Awesome List of Tamil NLP & AI Resources
A curated catalog of open-source resources for Tamil NLP & AI.
The worldwide Tamil-speaking population is estimated at around 80-85 million, close to the population of Germany. It is therefore important to work on natural language processing for தமிழ் (Tamiḻ) and to develop tools that ensure the language is well represented digitally.
This list aims to serve as a catalog of resources related to Tamil NLP.
Note: Also check Ezhil Foundation's Awesome-Tamil for many more resources!
Note: You can also use the MTData library to automatically download parallel data from many of the above sources.
IndicGLUE Classification Benchmark
Offensive Language Identification in Dravidian Languages - {2020, Dataset}
Natural Language Inference
Dialogue
Information Extraction
(Also covers event extraction and entity extraction)
Misc
Reasoning
MorphAnalysis
Pure Tamil