Finite-state script normalization and processing utilities
This is an interim pre-release of the data, compiled on an x86_64 Linux platform. The data consists of FST archives (FARs) in OpenFst format, which can be loaded and manipulated using Pynini.
Note: this pre-release does not contain the precompiled FSTs for natural romanization of Brahmic scripts. These will be included in the next release.
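As a rough illustration of manipulating a FAR with Pynini, the sketch below opens an archive for reading and reports the name and state count of each FST it holds. It assumes Pynini is installed and that the archive uses the default (standard) arc type. The function name `summarize_far` and the example path are hypothetical, and the names of the FSTs stored inside each archive are not listed in these notes, so the sketch iterates over the archive rather than looking up a specific key.

```python
def summarize_far(far_path):
    """Return a list of (fst_name, num_states) pairs for the FSTs in a FAR.

    Assumption: the archive was built with the standard arc type, which is
    what pynini.Far expects by default.
    """
    # Imported lazily so that the module can be loaded without Pynini.
    import pynini

    summary = []
    with pynini.Far(far_path, mode="r") as reader:
        # Iterating over a FAR opened for reading yields (key, fst) pairs.
        for name, fst in reader:
            summary.append((name, fst.num_states()))
    return summary


# Example call (hypothetical location of an extracted archive):
#   for name, states in summarize_far("x86_64/abjad_alphabet/nfc.far"):
#       print(name, states)
```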
For each script family, the tarball contains the FST archives (FARs) listed below, each preceded by its size in bytes.
abjad_alphabet_x86_64.tar.gz
: Perso-Arabic abjads:
28409 x86_64/abjad_alphabet/nfc.far
175758 x86_64/abjad_alphabet/reading_norm.far
1532106 x86_64/abjad_alphabet/reversible_roman.far
1530858718 x86_64/abjad_alphabet/visual_norm.far
brahmic_x86_64.tar.gz
: Brahmic abugidas:
60129 fixed.far
2254777 iso.far
668276 nfc.far
244509 reading_norm.far
16231686 visual_norm.far
86528 wellformed.far
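
To check a downloaded tarball against the sizes listed above, the following sketch uses only the Python standard library to list each `.far` member with its size in bytes, without extracting anything. The function name `list_far_members` is hypothetical; the tarball file names are the ones given above.

```python
import tarfile


def list_far_members(tarball_path):
    """Return (size_in_bytes, member_name) pairs for every .far file in a tarball."""
    members = []
    # Mode "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(tarball_path, "r:*") as tar:
        for info in tar.getmembers():
            if info.isfile() and info.name.endswith(".far"):
                members.append((info.size, info.name))
    return members


# Example call, assuming the tarball is in the current directory:
#   for size, name in list_far_members("abjad_alphabet_x86_64.tar.gz"):
#       print(size, name)
```

The output uses the same size-then-path layout as the listings above, so the two can be compared line by line.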