fast-langdetect by LlmKira

Fast language detection powered by FastText

Created 2 years ago

293 stars

Top 90.4% on SourcePulse

Project Summary

fast-langdetect is a language detection library built on Facebook's FastText. It targets developers who need efficient language identification, offering speedups of up to 80x over conventional methods and offline operation suited to high-throughput applications and resource-constrained environments.

How It Works

Using pre-trained FastText models, the library reports up to 95% accuracy. It provides a memory-friendly 'lite' model for offline use (~45-60 MB RSS) and a more accurate 'full' model (~170-210 MB RSS). An 'auto' mode falls back to 'lite' only when loading the full model raises MemoryError.
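
The 'auto' fallback can be pictured as a narrow try/except around model loading. The following is a minimal sketch of that pattern, not the library's actual code: load_with_fallback and the two loader callables are hypothetical names introduced here for illustration.

```python
from typing import Callable, Tuple, TypeVar

M = TypeVar("M")


def load_with_fallback(
    load_full: Callable[[], M],
    load_lite: Callable[[], M],
) -> Tuple[str, M]:
    """Try the full model first; fall back to lite only on MemoryError.

    Both loaders are hypothetical stand-ins for whatever actually loads
    a FastText model. Any other exception is deliberately not caught,
    so it propagates to the caller.
    """
    try:
        return "full", load_full()
    except MemoryError:
        return "lite", load_lite()
```

Catching only MemoryError keeps unrelated failures, such as a missing or corrupt model file, visible instead of silently downgrading to the smaller model.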

Quick Start & Requirements

Installation: pip install fast-langdetect
Python Support: 3.9 to 3.13.
Dependencies: No NumPy required.
Resource Footprint: Lite model: ~45-60 MB RSS; Full model: ~170-210 MB RSS. Models download to system temp by default, configurable via FTLANG_CACHE or LangDetectConfig(cache_dir=...).
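
The cache location rules can be sketched as a small resolver. This is an illustration under stated assumptions, not the library's implementation: resolve_cache_dir is a hypothetical helper, the precedence (explicit argument, then FTLANG_CACHE, then system temp) is assumed, and the "fasttext-langdetect" subfolder name is made up for the example.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def resolve_cache_dir(explicit: Optional[str] = None) -> Path:
    """Pick the model cache directory.

    Assumed precedence (not confirmed by the source): explicit argument,
    then the FTLANG_CACHE environment variable, then the system temp dir.
    """
    user_dir = explicit or os.environ.get("FTLANG_CACHE")
    if user_dir:
        path = Path(user_dir)
        # A user-provided directory must already exist; it is not created here.
        if not path.is_dir():
            raise FileNotFoundError(f"cache directory does not exist: {path}")
        return path
    # Hypothetical subfolder name, used only for illustration.
    return Path(tempfile.gettempdir()) / "fasttext-langdetect"
```

Because the system temp directory may be cleared between reboots, pointing the cache at a persistent directory avoids repeated model downloads.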

Highlighted Details

Up to 80x faster than conventional methods.
Up to 95% accuracy.
Offline detection via memory-friendly 'lite' model.
Utilities for mapping BCP-47 codes to display names (using langcodes/pycountry).
Supports loading custom FastText language identification models.

Maintenance & Community

Builds upon zafercavdar/fasttext-langdetect with packaging enhancements. Mentions contributions from @dalf and @JackyHe398. No specific community channels or active maintenance signals are detailed.

Licensing & Compatibility

Code License: MIT License.
Model License: CC BY-SA 3.0. Redistribution/modification of models requires CC BY-SA 3.0 compliance. Inference usage is unaffected.
Compatibility: MIT license is permissive for commercial use and closed-source linking.

Limitations & Caveats

Accuracy may decrease for very short or excessively long inputs. Input beyond max_input_length (80 characters by default) is truncated, and the truncation logs a warning. The 'auto' mode fallback is triggered only by MemoryError; other errors propagate. User-provided cache directories must already exist.
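
The truncation behavior can be approximated as follows. This is a minimal sketch, assuming simple character-based slicing: truncate_input is a hypothetical helper and the warning text is invented for the example. Only the max_input_length name and its default of 80 come from the description above.

```python
import logging

logger = logging.getLogger(__name__)


def truncate_input(text: str, max_input_length: int = 80) -> str:
    """Return text cut to max_input_length characters, warning if cut.

    Hypothetical helper: the real library may truncate differently.
    """
    if len(text) > max_input_length:
        logger.warning(
            "Input truncated from %d to %d characters",
            len(text),
            max_input_length,
        )
        return text[:max_input_length]
    return text
```

For long documents, detecting on a representative excerpt rather than relying on the default cut may give more stable results.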

