NLP resource catalog for Indic languages
This repository is a collaborative catalog of Natural Language Processing (NLP) resources for Indic languages. It consolidates datasets, models, libraries, and evaluation benchmarks in one place for researchers, developers, and anyone else working on NLP for the languages of the Indian subcontinent.
How It Works
The catalog is structured by NLP task and resource type, providing links and descriptions for each entry. Contributions are encouraged via pull requests or issues, following a specified format to ensure consistency. It highlights significant advancements and emerging trends in Indic language NLP, such as the rise of large-scale corpora and models supporting a wide range of languages, including low-resource ones.
Quick Start & Requirements
This is a catalog, not a runnable software package. Each listed resource must be set up individually according to its own requirements.
Maintenance & Community
The project is a community effort, with contributions from various institutions and individuals, including AI4Bharat, BUET CSE NLP, and IIT Patna. Users can engage through GitHub issues and pull requests.
Licensing & Compatibility
The repository itself is open source, but the licenses of the individual resources it lists vary. Users must check the license of each dataset, model, or tool they intend to use.
Limitations & Caveats
Many resources are still tracked as open issues rather than finished entries, so the catalog is a work in progress. The usability and quality of each resource depend on its creators and are not managed by this repository.