cltk by cltk

Python NLP for pre-modern languages

Created 12 years ago
881 stars

Top 40.9% on SourcePulse

Project Summary

The Classical Language Toolkit (CLTK) is a Python library that provides Natural Language Processing (NLP) for pre-modern languages. It targets the characteristics and research goals of historical languages, which general-purpose NLP frameworks often overlook. CLTK offers a novel software architecture centered on a modular processing pipeline that balances algorithmic diversity with user-friendly defaults, and it currently supports pipelines for nearly 20 languages.

How It Works

CLTK's core design is a modular processing pipeline, which keeps it flexible across diverse NLP tasks on historical texts. It adapts established NLP concepts to pre-modern languages, whose needs differ significantly from those of modern, spoken languages. Optional backends connect the toolkit to cloud-based generative AI through OpenAI and to local Large Language Models (LLMs) through Ollama, extending its annotation and analysis capabilities.
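
The modular-pipeline idea can be illustrated with a minimal Python sketch. Everything here is hypothetical (the `Doc`, `Pipeline`, `tokenize`, and `normalize_latin` names are invented for illustration) and none of it is CLTK's actual API. It only shows how independent stages can be chained behind a per-language default.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    """Holds the raw text plus the annotations each stage adds (hypothetical)."""
    text: str
    tokens: List[str] = field(default_factory=list)
    normalized: List[str] = field(default_factory=list)


# A stage is any callable that takes a Doc and returns it with more annotations.
Stage = Callable[[Doc], Doc]


def tokenize(doc: Doc) -> Doc:
    # Naive whitespace tokenizer that strips surrounding punctuation.
    stripped = [t.strip(".,;:!?") for t in doc.text.split()]
    doc.tokens = [t for t in stripped if t]
    return doc


def normalize_latin(doc: Doc) -> Doc:
    # Lowercase, then fold j -> i and v -> u, a common Latin normalization.
    table = str.maketrans({"j": "i", "v": "u"})
    doc.normalized = [t.lower().translate(table) for t in doc.tokens]
    return doc


@dataclass
class Pipeline:
    """Runs stages in order; swapping a stage changes one step only."""
    stages: List[Stage]

    def run(self, text: str) -> Doc:
        doc = Doc(text=text)
        for stage in self.stages:
            doc = stage(doc)
        return doc


# A "default" pipeline for one language, assembled from interchangeable stages.
latin_default = Pipeline(stages=[tokenize, normalize_latin])
doc = latin_default.run("Gallia est omnis divisa in partes tres.")
print(doc.normalized)
```

Because each stage only reads and writes the shared `Doc`, a different tokenizer or an LLM-backed annotator could replace one stage without touching the others, which is the kind of flexibility the pipeline design is meant to provide.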

Quick Start & Requirements

  • Installation: pip install cltk
  • Optional Extras: Install a specific backend with pip install "cltk[openai]", pip install "cltk[stanza]", or pip install "cltk[ollama]". Extras can be combined (e.g., cltk[openai,stanza,ollama]).
  • OpenAI Backend: Requires the OPENAI_API_KEY environment variable.
  • Ollama Backend: Requires a running local Ollama server. The model can be specified (e.g., llama3.1:8b, qwen2.5:14b).
  • Documentation: Available at https://docs.cltk.org.
  • Legacy Version: Pre-1.0 software is on branch v0.1.x (pip install "cltk<1.0").
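
The backend requirements above can be checked before running anything. The following is a rough preflight sketch using only the Python standard library. It does not call CLTK itself, the helper names (`ollama_is_up`, `check_backends`) are hypothetical, and the URL assumes Ollama is listening on its usual default local address.

```python
import http.client
import os
import urllib.request
from typing import Callable, Dict, Mapping

# Assumption: Ollama's usual default local address; adjust if configured otherwise.
OLLAMA_URL = "http://localhost:11434"


def ollama_is_up(url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP-level errors.
        return False


def check_backends(
    env: Mapping[str, str] = os.environ,
    probe: Callable[[], bool] = ollama_is_up,
) -> Dict[str, bool]:
    """Report which optional backends look usable in this environment."""
    return {
        "openai": bool(env.get("OPENAI_API_KEY")),  # key must be set and non-empty
        "ollama": probe(),                          # local server must respond
    }


# Deterministic example with injected values instead of the real environment.
status = check_backends(env={"OPENAI_API_KEY": "sk-example"}, probe=lambda: False)
print(status)
```

Calling `check_backends()` with no arguments checks the real environment and probes the local server. The check only confirms that the prerequisites are present, not that the key is valid or that a particular model has been pulled.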

Highlighted Details

  • Specialized NLP for pre-modern languages.
  • Modular pipeline architecture for flexibility.
  • Integration with OpenAI and local LLMs (Ollama).
  • Support for approximately 20 languages.

Maintenance & Community

No specific details regarding contributors, sponsorships, or community channels (like Discord/Slack) are provided in the README.

Licensing & Compatibility

Licensed under the MIT License, which permits commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

The pre-1.0 version of the software is maintained on a separate branch (v0.1.x) with its own documentation, so code and documentation written for the legacy version may not apply to the main branch, and vice versa.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 23
  • Issues (30d): 12
  • Star History: 2 stars in the last 30 days
