A collaborative catalog of NLP resources for Indic languages
A Collaborative Catalog of Resources for Indic Language NLP
The Indic NLP Catalog repository is an attempt to collaboratively build the most comprehensive catalog of NLP datasets, models and other resources for all languages of the Indian subcontinent.
Please suggest any other resources you may be aware of. Raise a pull request or an issue to add more resources to the catalog. Put the proposed entry in the following format:
[Wikipedia Dumps](https://dumps.wikimedia.org/)
Add a short, informative description of the dataset and provide links to any paper/article/site documenting the resource. Mention your name too. We would like to acknowledge your contribution to building this catalog in the CONTRIBUTORS list.
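For contributors who generate entries with a script, here is a minimal sketch of building and checking an entry in the link-plus-description format described above. The functions `format_entry` and `parse_entry` are hypothetical and are not part of this repository. The `": "` separator between the link and the description is also an assumption, so follow whatever layout the surrounding catalog section already uses.

```python
import re
from urllib.parse import urlparse

# Hypothetical helpers (not part of the catalog repo). Entry layout assumed:
#   [Resource Name](https://absolute/url): short description
ENTRY_RE = re.compile(r"^\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?P<desc>.*)$")


def format_entry(name: str, url: str, description: str) -> str:
    """Build a catalog entry: a Markdown link followed by a short description."""
    parsed = urlparse(url)
    # Require an absolute http(s) link so the entry renders as a working link.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if not name.strip() or not description.strip():
        raise ValueError("name and description must be non-empty")
    return f"[{name.strip()}]({url}): {description.strip()}"


def parse_entry(line: str) -> tuple[str, str, str]:
    """Split an entry line back into (name, url, description)."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a catalog entry: {line!r}")
    # Drop the assumed ": " separator before the description.
    return m["name"], m["url"], m["desc"].lstrip(": ").strip()


entry = format_entry(
    "Wikipedia Dumps",
    "https://dumps.wikimedia.org/",
    "Periodic dumps of Wikipedia content.",
)
print(entry)
# [Wikipedia Dumps](https://dumps.wikimedia.org/): Periodic dumps of Wikipedia content.
```

Validating the URL up front catches relative or malformed links before they reach a pull request. Round-tripping through `parse_entry` is a cheap way to confirm that a hand-edited line still matches the assumed layout.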
Indian language NLP has come a long way. We feature a few resources that illustrate recent trends along various axes and point to a bright future.
:raising_hand: Note: Many known resources have not yet been classified into the catalog; they can be found as open issues in the repo.
Benchmarks spanning multiple tasks.
Pointers to language-specific NLP resource catalogs