Subword Encoding in Lattice LSTM for Chinese Word Segmentation

Subword encoding for Chinese word segmentation using Lattice LSTM.

Models and results are described in our paper, Subword Encoding in Lattice LSTM for Chinese Word Segmentation.
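For readers new to the lattice idea, here is a minimal sketch of the matching step. Every span of the input sentence that appears in a subword (or word) lexicon becomes an extra cell in the lattice, linking the span's first and last characters. This is an illustration under stated assumptions, not the repository's code: the `Trie` class, the `build_lattice` function and its `min_len`/`max_len` limits are hypothetical. It is written for Python 3, although the repository itself targets Python 2.7.

```python
END = "<end>"  # marker key; it cannot collide with a single-character key


class Trie:
    """Character trie over a lexicon (hypothetical helper, not from the repo)."""

    def __init__(self):
        self.root = {}

    def insert(self, token):
        node = self.root
        for ch in token:
            node = node.setdefault(ch, {})
        node[END] = True  # a lexicon entry ends at this node

    def matches_from(self, chars, start, max_len):
        """Return every lexicon entry that starts at chars[start]."""
        found = []
        node = self.root
        for end in range(start, min(len(chars), start + max_len)):
            node = node.get(chars[end])
            if node is None:
                break  # no lexicon entry continues with this character
            if END in node:
                found.append(chars[start:end + 1])
        return found


def build_lattice(sentence, lexicon, min_len=2, max_len=10):
    """Return (start, end, token) spans, with end inclusive."""
    trie = Trie()
    for token in lexicon:
        trie.insert(token)
    spans = []
    for start in range(len(sentence)):
        for token in trie.matches_from(sentence, start, max_len):
            # Single characters already lie on the character path of the lattice.
            if len(token) >= min_len:
                spans.append((start, start + len(token) - 1, token))
    return spans


if __name__ == "__main__":
    toy_lexicon = ["中国", "最大", "氨纶", "氨纶丝", "生产", "基地", "生产基地"]
    print(build_lattice("中国最大氨纶丝生产基地", toy_lexicon))
```

With the toy lexicon above, the result contains both (4, 5, '氨纶') and (4, 6, '氨纶丝'): overlapping candidates coexist in the lattice, and the model, not the matcher, decides how much each one contributes.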

Requirements:

Python: 2.7
PyTorch: 0.3.0

Input format:

CoNLL format (BMES tag scheme preferred), with one character and its label per line. Sentences are separated by an empty line.

中 B-SEG
国 E-SEG
最 B-SEG
大 E-SEG
氨 B-SEG
纶 M-SEG
丝 E-SEG
生 B-SEG
产 E-SEG
基 B-SEG
地 E-SEG
在 S-SEG
连 B-SEG
云 M-SEG
港 E-SEG
建 B-SEG
成 E-SEG

新 B-SEG
华 M-SEG
社 E-SEG
北 B-SEG
京 E-SEG
十 B-SEG
二 M-SEG
月 E-SEG
二 B-SEG
十 M-SEG
六 M-SEG
日 E-SEG
电 S-SEG
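To sanity-check a data file in this format, a small reader can group the lines into sentences and rebuild the words from the B/M/E/S prefixes. This is a hypothetical Python 3 sketch; `read_bmes` and `bmes_to_words` are illustrative names, not functions from the repository, and the reader expects exactly two whitespace-separated columns per line.

```python
def read_bmes(lines):
    """Group 'char TAG' lines into sentences; an empty line ends a sentence."""
    sentences, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        char, tag = line.split()  # raises ValueError if a line is not two columns
        current.append((char, tag))
    if current:  # the file may not end with an empty line
        sentences.append(current)
    return sentences


def bmes_to_words(sentence):
    """Join characters into words using the B/M/E/S prefix of each tag."""
    words, buf = [], ""
    for char, tag in sentence:
        prefix = tag.split("-")[0]  # "B-SEG" -> "B"
        if prefix in ("B", "S") and buf:
            words.append(buf)  # tolerate an unterminated word before a new start
            buf = ""
        buf += char
        if prefix in ("E", "S"):
            words.append(buf)
            buf = ""
    if buf:
        words.append(buf)
    return words


if __name__ == "__main__":
    sample = ["中 B-SEG", "国 E-SEG", "在 S-SEG", "", "新 B-SEG", "华 M-SEG", "社 E-SEG"]
    for sent in read_bmes(sample):
        print(" ".join(bmes_to_words(sent)))
```

Applied to the first example sentence above, `bmes_to_words` yields 中国 / 最大 / 氨纶丝 / 生产 / 基地 / 在 / 连云港 / 建成.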

Pretrained Embeddings:

The pretrained character and word embeddings are the same as those used in the baseline of RichWordSegmentor.

Character embeddings (gigaword_chn.all.a2b.uni.ite50.vec): Google Drive or Baidu Pan
Character bigram embeddings (gigaword_chn.all.a2b.bi.ite50.vec): Google Drive or Baidu Pan
Word embeddings (ctb.50d.vec): Google Drive or Baidu Pan
Subword (BPE) embeddings: zh.wiki.bpe.op200000.d50.w2v.txt
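Before pointing main.py at these files, it can help to confirm that they parse and have the expected dimension. The sketch below assumes a word2vec-style text layout (one `token v1 ... vd` entry per line, possibly preceded by a `count dim` header line); that layout and the default of 50 dimensions (suggested by `d50`/`50d` in the BPE and word embedding file names) are assumptions to verify against the actual files. `load_vectors` and `load_vector_file` are hypothetical helpers, not part of the repository.

```python
def load_vectors(lines, expected_dim=50):
    """Parse 'token v1 ... vd' lines into {token: [floats]}.

    Lines whose field count is not expected_dim + 1 are skipped; this drops a
    word2vec-style header such as '200000 50' as well as malformed rows.
    """
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) != expected_dim + 1:
            continue
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def load_vector_file(path, expected_dim=50):
    """Read an embedding file from disk (assumes UTF-8 encoding)."""
    with open(path, encoding="utf-8") as handle:
        return load_vectors(handle, expected_dim)
```

If a file loads as an empty dictionary, the dimension passed as `expected_dim` probably does not match the file.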

How to run the code?

Download the character embeddings, character bigram embeddings, and BPE (or word) embeddings, and set their paths in main.py.
Modify run_seg.py by adding the paths of your train/dev/test files.
sh run_seg.py
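Segmented corpora are often distributed as one sentence per line with words separated by spaces. Assuming that layout, the hypothetical helpers below convert such text into the BMES input format described under "Input format", reusing the `-SEG` label suffix from the example there. The function names are illustrative and not taken from the repository.

```python
def words_to_bmes(words, suffix="-SEG"):
    """Tag every character of every word with B/M/E/S plus a label suffix."""
    lines = []
    for word in words:
        if len(word) == 1:
            lines.append(word + " S" + suffix)  # single-character word
            continue
        for i, char in enumerate(word):
            if i == 0:
                tag = "B"
            elif i == len(word) - 1:
                tag = "E"
            else:
                tag = "M"
            lines.append(char + " " + tag + suffix)
    return lines


def convert_corpus(segmented_lines, suffix="-SEG"):
    """Turn space-separated segmented sentences into CoNLL-style BMES text."""
    blocks = []
    for sent in segmented_lines:
        words = sent.split()
        if words:  # skip empty input lines
            blocks.append("\n".join(words_to_bmes(words, suffix)))
    # One empty line between sentences, as in the example above.
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Writing the string returned by `convert_corpus` to a UTF-8 file gives a train, dev, or test file in the expected layout.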

Cite:

Cite our paper as:

@inproceedings{yang2019subword,
 title={Subword Encoding in Lattice LSTM for Chinese Word Segmentation},
 author={Jie Yang and Yue Zhang and Shuailong Liang},
 booktitle={NAACL},
 year={2019}
}

Repository: jiesutd/SubwordEncoding-CWS