
A Bug in the ICTCLAS Java Interface


The ICTCLAS Java interface exposes the following method:

/**
 * Segments a string of Chinese text into words.
*/ 
public synchronized native String paragraphProcess(String sParagraph);

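In most cases this method segments the text passed to it without trouble, but certain inputs containing special characters crash it. For example, the following string triggers the failure: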

    

String str="[1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][下一页]";
 

Passing it to paragraphProcess produces the following crash report:

         

# A fatal error has been detected by the Java Runtime Environment:
# 
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x3ae6c4e4, pid=2804, tid=2756 
# 
# JRE version: 6.0_22-b04 
# Java VM: Java HotSpot(TM) Client VM (17.1-b03 mixed mode windows-x86 ) 
# Problematic frame: 
# C  [ICTCLAS.dll+0xc4e4] 
# 
# An error report file with more information is saved as: 
# D:\yourproject\hs_err_pid2804.log 
# 
# If you would like to submit a bug report, please visit: 
#   http://java.sun.com/webapps/bugreport/crash.jsp 
# The crash happened outside the Java Virtual Machine in native code. 
# See problematic frame for where to report the bug. 
# [error occurred during error reporting , id 0xc0000005] 

 

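Cause: the fault (an access violation) occurs in native code inside ICTCLAS.dll, not in Java code. It is therefore not a Java exception, a try/catch block cannot intercept it, and the JVM (Java Virtual Machine) is forcibly terminated.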

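Workaround: before segmenting text with ICTCLAS, preprocess it first, for example by removing redundant whitespace and keeping each input from getting too long.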
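The preprocessing suggested above could be sketched roughly as follows. This is only an illustration under assumptions: the class `TextPreprocessor`, its methods, and the 500-character limit are hypothetical and not part of ICTCLAS, and nothing here is confirmed to avoid the specific crash shown above. It simply collapses runs of whitespace and cuts long text into shorter pieces that can be passed to paragraphProcess one at a time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for cleaning text before segmentation; not part of ICTCLAS. */
final class TextPreprocessor {

    /** Assumed upper bound on chunk length in chars; tune it for your own data. */
    static final int MAX_CHUNK_LENGTH = 500;

    private TextPreprocessor() {}

    /** Collapses each run of whitespace (including the full-width space U+3000) into one space. */
    static String normalize(String text) {
        if (text == null) {
            return "";
        }
        return text.replaceAll("[\\s\\u3000]+", " ").trim();
    }

    /** Normalizes the text, then cuts it into chunks of at most maxLength chars. */
    static List<String> split(String text, int maxLength) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        String normalized = normalize(text);
        List<String> chunks = new ArrayList<String>();
        int start = 0;
        while (start < normalized.length()) {
            int end = Math.min(start + maxLength, normalized.length());
            // Do not cut a surrogate pair in half when the chunk has room to shrink.
            if (end < normalized.length() && end - start > 1
                    && Character.isHighSurrogate(normalized.charAt(end - 1))) {
                end--;
            }
            chunks.add(normalized.substring(start, end));
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String raw = "ICTCLAS   的 java\t接口\n\n中有这个方法";
        for (String chunk : split(raw, MAX_CHUNK_LENGTH)) {
            // In real use, each chunk would be handed to paragraphProcess here.
            System.out.println(chunk);
        }
    }
}
```

Splitting at a fixed length can cut a word in two at a chunk boundary, so in practice it may be better to split at sentence punctuation instead. Since a crash in the DLL cannot be caught from Java, cleaning the input beforehand only reduces the risk and does not remove it.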
