Bert Tokenization For Java


Project README

This is a Java version of the Chinese tokenization described in BERT, including both basic tokenization and WordPiece tokenization.

Motivation

In production, we usually deploy BERT-based models with TensorFlow Serving for high performance and flexibility. However, our applications may not be written in Python, while the tokenization module in the original BERT release is Python code. Hence, we have to rewrite the tokenization module, in this case in Java.

Usage

Run Demo.java to see the tokenization results. Both single sentences and sentence pairs are supported.
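For reference, a BERT input is typically laid out as `[CLS] A [SEP]` for a single sentence and `[CLS] A [SEP] B [SEP]` for a pair. Every token of the first segment gets segment id 0 and every token of the second gets 1. The following is a minimal sketch of that layout, assuming the sentences are already tokenized. It is not taken from Demo.java. The names `PairInputSketch`, `Encoded`, `build`, and `truncatePair` are hypothetical. The truncation rule, which trims the longer sentence one token at a time, mirrors the one in BERT's reference code. Vocabulary id lookup, padding, and the attention mask are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairInputSketch {

    // Parallel lists: tokens and their segment (token type) ids.
    static final class Encoded {
        final List<String> tokens = new ArrayList<>();
        final List<Integer> segmentIds = new ArrayList<>();

        void add(String token, int segment) {
            tokens.add(token);
            segmentIds.add(segment);
        }
    }

    // Drop the last token of the longer list until both fit in maxTotal
    // (on a tie the second list is trimmed).
    static void truncatePair(List<String> a, List<String> b, int maxTotal) {
        while (a.size() + b.size() > maxTotal) {
            List<String> longer = a.size() > b.size() ? a : b;
            longer.remove(longer.size() - 1);
        }
    }

    // b == null means single-sentence input. Assumes maxSeqLen >= 3.
    static Encoded build(List<String> a, List<String> b, int maxSeqLen) {
        List<String> first = new ArrayList<>(a);
        List<String> second = (b == null) ? null : new ArrayList<>(b);
        if (second == null) {
            // Reserve room for [CLS] and [SEP].
            while (first.size() > maxSeqLen - 2) {
                first.remove(first.size() - 1);
            }
        } else {
            // Reserve room for [CLS] and two [SEP] tokens.
            truncatePair(first, second, maxSeqLen - 3);
        }
        Encoded enc = new Encoded();
        enc.add("[CLS]", 0);
        for (String t : first) {
            enc.add(t, 0);
        }
        enc.add("[SEP]", 0);
        if (second != null) {
            for (String t : second) {
                enc.add(t, 1);
            }
            enc.add("[SEP]", 1);
        }
        return enc;
    }

    public static void main(String[] args) {
        Encoded enc = build(Arrays.asList("今", "天", "好"), Arrays.asList("是", "的"), 8);
        System.out.println(enc.tokens);
        System.out.println(enc.segmentIds);
    }
}
```

Trimming the longer sentence first keeps as much of the shorter one as possible, which tends to lose less information than cutting the pair at a fixed point.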

Moreover, for Chinese natural language processing, we add two normalization steps: converting full-width characters to half-width, and converting uppercase letters to lowercase.
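A common way to implement these two steps is sketched below. The project's exact rules are not stated here, so treat this as an assumption. The full-width ASCII variants U+FF01 to U+FF5E sit exactly 0xFEE0 above their half-width counterparts U+0021 to U+007E, and the ideographic space U+3000 maps to an ordinary space. Lowercasing uses `Locale.ROOT` so the result does not depend on the JVM's default locale. `NormalizeSketch`, `fullToHalf`, and `normalize` are hypothetical names.

```java
import java.util.Locale;

public class NormalizeSketch {

    // Map full-width ASCII variants and the ideographic space to half-width.
    static String fullToHalf(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\u3000') {
                sb.append(' ');                   // ideographic space -> space
            } else if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - 0xFEE0));   // e.g. U+FF21 'Ａ' -> 'A'
            } else {
                sb.append(c);                     // everything else unchanged
            }
        }
        return sb.toString();
    }

    // Full-width to half-width first, then locale-independent lowercasing.
    static String normalize(String s) {
        return fullToHalf(s).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("ＢＥＲＴ　模型，ＡＢＣ１２３"));
    }
}
```

Here `main` prints `bert 模型,abc123`: the full-width letters, digits, comma, and space are folded to ASCII, while the Chinese characters pass through untouched.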

Reporting issues

Please let me know if you encounter any problems.

README Source: zhongbin1/bert_tokenization_for_java