chardet
Python character encoding and language detection
Top 17.7% on SourcePulse
Summary
chardet 7 is a Python character encoding detector offering high accuracy and speed. It targets developers and users needing to process unknown text data, providing a drop-in replacement for older versions with significant performance gains. Its 0BSD license ensures broad applicability.
How It Works
The library runs a 13-stage detection pipeline whose stages include Byte Order Mark (BOM) detection, magic number identification, structural probing, byte validity filtering, and bigram statistical models. Optional mypyc compilation further accelerates processing. The project reports higher accuracy and speed than earlier chardet versions and competing detectors.
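Two of the stages named above, BOM detection and byte validity filtering, can be illustrated with the standard library alone. This is a minimal sketch of the general technique, not chardet's actual code: the function names, the candidate list, and the short-circuit order are all assumptions made for illustration, and the remaining stages (magic numbers, structural probing, bigram models) are not modeled.

```python
import codecs
from typing import List, Optional, Sequence

# Hypothetical BOM table. Longer BOMs come first because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return a codec name if the data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def valid_candidates(data: bytes, candidates: Sequence[str]) -> List[str]:
    """Keep only the candidate codecs that decode the data without error."""
    survivors = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # the bytes are invalid in this encoding, so drop it
        survivors.append(name)
    return survivors


def detect_sketch(
    data: bytes,
    candidates: Sequence[str] = ("ascii", "utf-8", "cp1252"),
) -> List[str]:
    """A BOM short-circuits the search; otherwise filter by byte validity."""
    bom = detect_bom(data)
    if bom is not None:
        return [bom]
    return valid_candidates(data, candidates)
```

Validity filtering alone often leaves several survivors, since many single-byte encodings accept almost any byte sequence. That is where a statistical stage, such as the bigram models mentioned above, would be needed to rank what remains.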
Quick Start & Requirements
pip install chardet
Highlighted Details
Maintenance & Community
chardet 7.x is a 2026 ground-up rewrite by Dan Blanchard, distinct from earlier codebases. Historical commits from the original author are preserved in a separate branch. No specific community channels are listed.
Licensing & Compatibility
Licensed under the permissive 0BSD license, allowing unrestricted commercial and closed-source use.
Limitations & Caveats
chardet 7.x is a complete rewrite, not a derivative of pre-version 7 code. While API-compatible, this architectural divergence may be a factor for users requiring strict code lineage.
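In practice, API compatibility means a call site written against the long-standing `chardet.detect` function, which takes bytes and returns a dict containing `encoding` and `confidence` keys, should not need changes. A minimal sketch of such a call site follows. The `decode_unknown` helper and its `detect` and `fallback` parameters are hypothetical and not part of chardet. The detector is injected so the helper can be exercised without the package installed.

```python
from typing import Callable, Dict


def _chardet_detect(data: bytes) -> Dict:
    # Imported lazily so this module loads even when chardet is absent.
    import chardet  # third-party: pip install chardet

    return chardet.detect(data)


def decode_unknown(
    data: bytes,
    detect: Callable[[bytes], Dict] = _chardet_detect,
    fallback: str = "utf-8",
) -> str:
    """Decode bytes of unknown encoding using a chardet-style detector."""
    result = detect(data)
    # The detector may report no encoding at all, so fall back in that case.
    encoding = result.get("encoding") or fallback
    # errors="replace" keeps a wrong guess from raising on stray bytes.
    return data.decode(encoding, errors="replace")
```

With chardet installed, `decode_unknown(raw_bytes)` uses the library's guess. Passing a stub detector, for example `lambda d: {"encoding": "latin-1"}`, makes the helper easy to test in isolation.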