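API of Articut Chinese word segmentation (with semantic part-of-speech tagging): Word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model. Relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a Recall above 96% on SIGHAN 2005.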
Added the autoBreakBOOL parameter to the parse() function. Its default value is True.
This feature automatically splits your input TEXT into sub-strings shorter than 5000 characters. Each sub-string ends with a punctuation mark, so none of your input TEXT is lost. As a result, Articut users no longer need to worry that a long text may cause a 408 Request Timeout error.
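The splitting behaviour described above can be illustrated with a minimal sketch. This is not Articut's actual implementation: the function name `split_at_punctuation`, the `PUNCTUATION` set, and the fallback to a hard cut when no punctuation is found are all assumptions made for illustration. Only the 5000-character limit and the idea of ending each sub-string at a punctuation mark come from the note above.

```python
# Hypothetical punctuation set (an assumption, not Articut's own list).
PUNCTUATION = "。!?;,、.!?;,\n"


def split_at_punctuation(text, max_len=5000, punctuation=PUNCTUATION):
    """Split text into chunks shorter than max_len characters.

    Each chunk ends at the last punctuation mark that fits. If a window
    contains no punctuation at all, the chunk is cut at the size limit.
    """
    if max_len < 2:
        raise ValueError("max_len must be at least 2")

    chunks = []
    start = 0
    # Keep splitting while the remaining text is not yet shorter than max_len.
    while len(text) - start >= max_len:
        # The window holds max_len - 1 characters, so every chunk taken
        # from it is strictly shorter than max_len.
        window = text[start:start + max_len - 1]
        # Index of the last punctuation mark inside the window, or -1.
        cut = max((window.rfind(p) for p in punctuation), default=-1)
        if cut == -1:
            cut = len(window) - 1  # no punctuation found: hard cut
        chunks.append(window[:cut + 1])
        start += cut + 1

    # Whatever is left is already shorter than max_len.
    if start < len(text):
        chunks.append(text[start:])
    return chunks


if __name__ == "__main__":
    long_text = "今天天氣很好。" * 2000  # 14000 characters
    parts = split_at_punctuation(long_text)
    print(len(parts), [len(p) for p in parts])
```

Joining the chunks back together reproduces the original text exactly, which is the sense in which no input is lost. With autoBreakBOOL left at its default, callers should not need to do this themselves.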