cltk: Python NLP for pre-modern languages
The Classical Language Toolkit (CLTK) is a Python library providing Natural Language Processing (NLP) capabilities specifically for pre-modern languages. It addresses the unique characteristics and research goals of historical languages, which are often overlooked by general NLP frameworks. The CLTK offers a novel software architecture centered around a modular processing pipeline, balancing algorithmic diversity with user-friendly defaults, and currently supports pipelines for nearly 20 languages.
How It Works
CLTK's core design is a modular processing pipeline, enabling flexibility for diverse NLP tasks on historical texts. It adapts established NLP concepts to the specific needs of pre-modern languages, which differ significantly from modern, spoken languages. The toolkit facilitates integration with advanced AI models, offering optional backends for cloud-based GenAI via OpenAI and local Large Language Models (LLMs) through Ollama, enhancing annotation and analytical power.
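To make the pipeline idea concrete, here is a minimal sketch in plain Python. Doc, Process, Pipeline, WhitespaceTokenizer and LatinEncliticSplitter are hypothetical names chosen for illustration, not CLTK's actual classes or API. The sketch only shows the general pattern: each stage reads and updates a shared document object, so stages can be added, reordered, or swapped independently.

```python
from dataclasses import dataclass, field


# Hypothetical names throughout: these classes illustrate a modular pipeline
# in general and are NOT CLTK's real API.
@dataclass
class Doc:
    raw: str
    tokens: list[str] = field(default_factory=list)


class Process:
    """One pipeline stage: takes a Doc and returns the updated Doc."""

    def run(self, doc: Doc) -> Doc:
        raise NotImplementedError


class WhitespaceTokenizer(Process):
    def run(self, doc: Doc) -> Doc:
        # Split on whitespace, strip simple punctuation, drop empty tokens.
        stripped = (t.strip(".,;:") for t in doc.raw.split())
        doc.tokens = [t for t in stripped if t]
        return doc


class LatinEncliticSplitter(Process):
    """Toy rule: split the Latin enclitic -que (populusque -> populus, -que).

    The exception list is deliberately incomplete; a real splitter needs
    a much larger lexicon of words that merely end in -que.
    """

    NOT_ENCLITIC = {"atque", "quoque", "itaque", "neque", "denique", "usque"}

    def run(self, doc: Doc) -> Doc:
        out: list[str] = []
        for tok in doc.tokens:
            low = tok.lower()
            if low.endswith("que") and len(low) > 3 and low not in self.NOT_ENCLITIC:
                out.extend([tok[:-3], "-que"])
            else:
                out.append(tok)
        doc.tokens = out
        return doc


class Pipeline:
    def __init__(self, processes: list[Process]):
        self.processes = processes

    def analyze(self, text: str) -> Doc:
        doc = Doc(raw=text)
        for process in self.processes:  # run each stage in order
            doc = process.run(doc)
        return doc


pipeline = Pipeline([WhitespaceTokenizer(), LatinEncliticSplitter()])
print(pipeline.analyze("Senatus populusque Romanus.").tokens)
# → ['Senatus', 'populus', '-que', 'Romanus']
```

Replacing LatinEncliticSplitter with a different stage (for example, one that delegates annotation to an LLM backend) leaves the other stages untouched, which is the kind of flexibility a modular pipeline design is meant to provide.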
Quick Start & Requirements
Install: pip install cltk
Optional backends: pip install "cltk[openai]", pip install "cltk[stanza]", pip install "cltk[ollama]". Combinations are supported (e.g., cltk[openai,stanza,ollama]).
OpenAI backend: requires the OPENAI_API_KEY environment variable.
Ollama backend: runs local models (e.g., llama3.1:8b, qwen2.5:14b).
Pre-1.0 release: v0.1.x (pip install "cltk<1.0").
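Before choosing a backend it can help to check what the current environment actually provides. The sketch below assumes that the cltk[openai], cltk[stanza] and cltk[ollama] extras install packages importable as openai, stanza and ollama; that mapping is an assumption, and the helper name backend_status is hypothetical. It uses only the standard library and does not call CLTK itself.

```python
import importlib.util
import os

# Assumption: these are the import names of the packages pulled in by the
# cltk[openai], cltk[stanza] and cltk[ollama] extras.
OPTIONAL_BACKENDS = ("openai", "stanza", "ollama")


def backend_status() -> dict[str, bool]:
    """Report which optional backend packages are importable (hypothetical helper)."""
    status = {name: importlib.util.find_spec(name) is not None for name in OPTIONAL_BACKENDS}
    # The OpenAI backend also needs an API key in the environment.
    status["openai_key_set"] = bool(os.environ.get("OPENAI_API_KEY"))
    return status


if __name__ == "__main__":
    for name, ok in backend_status().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

An importable ollama client does not guarantee that a local Ollama server is running or that a model such as llama3.1:8b has been pulled; that still has to be checked separately.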
Maintenance & Community
No specific details regarding contributors, sponsorships, or community channels (like Discord/Slack) are provided in the README.
Licensing & Compatibility
Licensed under the MIT License. This license is generally permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
The pre-1.0 release line is maintained on a separate branch (v0.1.x) with its own documentation, so APIs and usage may differ between v0.1.x and the current main branch.
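Because the two release lines have separate documentation, a quick way to tell which one applies is to read the installed version. This is a minimal sketch using the standard library's importlib.metadata; the function names cltk_release_line and installed_cltk_release_line are hypothetical, and the labels simply follow the split described above (pre-1.0 on v0.1.x, everything else on the main branch).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def cltk_release_line(ver: str) -> str:
    """Map a version string to its release line (hypothetical helper)."""
    major = int(ver.split(".")[0])
    return "v0.1.x (pre-1.0)" if major < 1 else "1.0+ (main branch)"


def installed_cltk_release_line() -> Optional[str]:
    """Return the release line of the installed cltk, or None if it is absent."""
    try:
        return cltk_release_line(version("cltk"))
    except PackageNotFoundError:
        return None  # cltk is not installed in this environment


if __name__ == "__main__":
    print(installed_cltk_release_line())
```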