IndianNLP Transliteration (Abandoned)

Codebase for Indic transliteration using a Seq2Seq RNN. For the latest repository with Transformer-based models, see: https://github.com/AI4Bharat/IndicXlit

Project README

IndianNLP-Transliteration

Project Website | Demo UI | Python Library

The main goal of this project is to create open-source input tools for content creation in under-represented languages of India.
It started in collaboration with Story Weaver, a non-profit working towards foundational literacy education for children, supported by Google's AI for Social Good initiative.

Most languages in India have little digital presence because their language ecosystems are underdeveloped. One of the major bottlenecks in content creation and language adoption is the difficulty of typing text in many native Indian languages. The lack of stable input tools for underserved languages is a huge barrier to creating digital content and NLP datasets in these languages.

Supported Languages

Bengali - বাংলা
Gujarati - ગુજરાતી
Hindi - हिंदी
Kannada - ಕನ್ನಡ
Konkani Goan - कोंकणी
Maithili - मैथिली
Malayalam - മലയാളം
Marathi - मराठी
Panjabi Eastern - ਪੰਜਾਬੀ
Sindhi - سنڌي‎
Sinhala - සිංහල
Telugu - తెలుగు
Tamil - தமிழ்
Urdu - اُردُو

Repository Usage

For the attribution and contribution lists, check here.

Training Procedures

This repository is designed to make it easy to experiment with different network architectures and reformulated objectives with minimal effort. It is meant to be highly tinkerable rather than an off-the-shelf library.

A condensed, standalone version of simple model training, inference, and accuracy computation is available as a Jupyter notebook.
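As a rough illustration of the accuracy computation step, the sketch below scores word-level top-k accuracy: a source word counts as correct if any of its first k candidate transliterations exactly matches one of its reference spellings. This is an assumption about how accuracy is measured, not the notebook's actual code; the function name `topk_accuracy` and the dictionary-based inputs are hypothetical.

```python
from typing import Dict, Sequence


def topk_accuracy(
    predictions: Dict[str, Sequence[str]],
    references: Dict[str, Sequence[str]],
    k: int = 1,
) -> float:
    """Hypothetical helper: fraction of source words whose top-k candidates
    contain at least one reference transliteration (exact string match).
    The notebook's own metric may differ."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not references:
        return 0.0
    hits = 0
    for src, gold in references.items():
        # Words the model produced nothing for count as misses.
        candidates = list(predictions.get(src, []))[:k]
        if any(cand in gold for cand in candidates):
            hits += 1
    return hits / len(references)


if __name__ == "__main__":
    preds = {"namaste": ["नमस्ते", "नमस्टे"], "bharat": ["भरत", "भारत"]}
    refs = {"namaste": ["नमस्ते"], "bharat": ["भारत"]}
    print(topk_accuracy(preds, refs, k=1))  # 0.5
    print(topk_accuracy(preds, refs, k=2))  # 1.0
```

Exact string match is strict. Because a word often has several valid spellings, allowing multiple references per word keeps the metric from penalizing legitimate variants.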

Pythonic Library

The Python transliteration library is available on the Python Package Index (PyPI) and also under GitHub releases.
For usage, follow the apps README.

NeuralNet Models

Transliteration models for each language are made available as releases in an easily deployable form.

All the NN models (along with metadata) of Xlit - Transliteration are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Datasets

Datasets created as part of the project for Maithili, Konkani, and Hindi are available as JSON files under downloads.

Xlit - Transliteration Datasets by Story Weaver & AI4Bharat are licensed under a Creative Commons Attribution 4.0 International License.

Kindly attribute the datasets if you use them in your research or products.
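The layout of the JSON dataset files is not described here. Assuming, hypothetically, that each file maps a native-script word to a list of romanized spellings, a minimal loader that flattens it into (romanized, native) training pairs might look like the following. `load_pairs` is a hypothetical name, and the lowercasing of romanized input is an illustrative normalization choice, not something the datasets prescribe.

```python
import json
from typing import Dict, List, Tuple


def load_pairs(json_text: str) -> List[Tuple[str, str]]:
    """Hypothetical loader. Assumes the layout {native_word: [romanized, ...]}
    and returns (romanized, native) pairs for romanized-to-native training."""
    data: Dict[str, List[str]] = json.loads(json_text)
    pairs: List[Tuple[str, str]] = []
    for native, romans in data.items():
        for roman in romans:
            # Illustrative normalization: trim and lowercase the Latin side only.
            roman = roman.strip().lower()
            if roman:
                pairs.append((roman, native))
    return pairs


if __name__ == "__main__":
    sample = '{"नमस्ते": ["namaste", "namasthe"], "भारत": ["Bharat"]}'
    for roman, native in load_pairs(sample):
        print(roman, native)
```

To read a downloaded file, pass its contents, e.g. `load_pairs(Path(path).read_text(encoding="utf-8"))` with `pathlib.Path`; if the actual files use a different layout, only the loop needs to change.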

Contact

If you have benefited from our datasets, models, or services, or were motivated by our work, we would like to hear from you.

email: [email protected]

Open Source Agenda is not affiliated with "IndianNLP Transliteration" Project. README Source: AI4Bharat/IndicNLP-Transliteration

Last Commit: 2 years ago

License: Apache-2.0
