Offline handwritten mathematical expression regnition via stroke extraction and MyScript
The repository provides a proof-of-concept stroke extractor that can extract strokes from clean bitmap images. The stroke extractor can be used to recognize offline handwritten mathematical expression if an online recognizer is given. For example, when combined with MyScript, the resulting offline recognition system was ranked #3 in the offline task in CROHME 2019.
Dataset | Correct | Up to 1 error | Up to 2 errors | Structural correct |
---|---|---|---|---|
CROHME 2014 | 58.22% | 71.60% | 75.15% | 77.38% |
CROHME 2016 | 65.65% | 77.68% | 82.56% | 85.00% |
CROHME 2019 | 65.22% | 78.48% | 83.07% | 84.90% |
Although good accuracy is achieved on datasets of CROHME, the program may produce poor results on real world images. For example, the procedure does not work well on the following images:
In order to use the MyScript Cloud recognition engine, you need to create a account and create an application.
java -jar mathocr-myscript.jar
Image file
from the menu Recognize
Recognize
under stroke previewFirst add the jar file to classpath. If you are using Maven, add the following
to you pom.xml
:
<dependency>
<groupId>com.github.chungkwong</groupId>
<artifactId>mathocr-myscript</artifactId>
<version>1.1</version>
</dependency>
Then you can recognize images of mathematical expression by using code like:
String applicationKey="your application key for MyScript";
String hmacKey="hmac key of your Myscript account";
String grammarId="an uploaded grammar of your Myscript account";
int dpi=96;
MyscriptRecognizer myscriptRecognizer=new MyscriptRecognizer(applicationKey,hmacKey,grammarId,dpi);
Extractor extractor=new Extractor(myscriptRecognizer);
File file=new File("Path to file to be recognized");
EncodedExpression expression=extractor.recognize(ImageIO.read(file));
String latexCode=expression.getCodes(new LatexFormat());
The idea used is explained in the article Stroke extraction for offline handwritten mathematical expression recognition , which is available at arXiv. You can cite the article using the following BibTex code:
@misc{1905.06749,
Author = {Chungkwong Chan},
Title = {Stroke extraction for offline handwritten mathematical expression recognition},
Year = {2019},
Eprint = {arXiv:1905.06749},
}
本项目提供一个可从清晰的图片中还原笔划信息的程序原型。与联机手写数学公式识别结合的话, 可以打造出脱机数学公式识别系统。例如与MyScript结合时 在CROHME 2019的脱机任务中位列第3名。
数据集 | 正确 | 至多一处错误 | 至多两处错误 | 结构正确 |
---|---|---|---|---|
CROHME 2014 | 58.22% | 71.60% | 75.15% | 77.38% |
CROHME 2016 | 65.65% | 77.68% | 82.56% | 85.00% |
CROHME 2019 | 65.22% | 78.48% | 83.07% | 84.90% |
虽然在CROHME数据集上取得了良好的表现,本程序对现实世界中的图片表现仍然可能未如理想。 例如对以下类型的图片可能给出差劲的结果:
如果使用MyScript Cloud作为联机手写数学公式识别器,请注册一个帐号并创建一个应用。
java -jar mathocr-myscript.jar
运行JAR文件识别
中选择图片文件
识别
按钮(首次使用时需要输入你的MyScript Cloud应用标识和密钥)首先把JAR文件加到类路径。如果你使用Maven,把以下依赖加到pom.xml
中dependencies
下即可(其它构建系统类似):
<dependency>
<groupId>com.github.chungkwong</groupId>
<artifactId>mathocr-myscript</artifactId>
<version>1.1</version>
</dependency>
然后你可以使用以下样子的代码识别脱机手写数学公式:
String applicationKey="your application key for MyScript";
String hmacKey="hmac key of your Myscript account";
String grammarId="an uploaded grammar of your Myscript account";
int dpi=96;
MyscriptRecognizer myscriptRecognizer=new MyscriptRecognizer(applicationKey,hmacKey,grammarId,dpi);
Extractor extractor=new Extractor(myscriptRecognizer);
File file=new File("Path to file to be recognized");
EncodedExpression expression=extractor.recognize(ImageIO.read(file));
String latexCode=expression.getCodes(new LatexFormat());
本项目的描述参见文档 通过笔划提取识别脱机手写数学公式,它可从 arXiv下载。你可以使用以下BibTex代码引用该文:
@misc{1905.06749,
Author = {Chungkwong Chan},
Title = {Stroke extraction for offline handwritten mathematical expression recognition},
Year = {2019},
Eprint = {arXiv:1905.06749},
}