chonkie by chonkie-inc

Chunking library for RAG applications

created 4 months ago
1,930 stars

Top 23.1% on sourcepulse

Project Summary

Chonkie is a Python library designed for efficient and lightweight text chunking, primarily for Retrieval-Augmented Generation (RAG) applications. It targets developers and researchers seeking a fast, easy-to-use, and resource-minimal solution for splitting text into manageable segments, offering a variety of chunking strategies and broad integration capabilities.

How It Works

Chonkie provides a suite of specialized chunkers, including TokenChunker, SentenceChunker, RecursiveChunker, and SemanticChunker, each employing different strategies for text segmentation. The library emphasizes flexibility by supporting multiple tokenizers (e.g., Hugging Face, Tiktoken) and embedding model providers (e.g., SentenceTransformers, OpenAI), allowing users to tailor the chunking process to their specific needs and existing toolchains. This modular approach aims to deliver high performance and low overhead.
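Token chunking is the simplest of these strategies: cut the token stream into fixed-size windows, optionally overlapping so context is not lost at the boundaries. The following is a minimal, dependency-free sketch of that idea. It is not Chonkie's implementation or API. The names `token_chunk`, `Chunk`, and `whitespace_tokenize` are hypothetical, and the whitespace tokenizer stands in for a Hugging Face or Tiktoken tokenizer. Passing the tokenizer in as a function mirrors, in spirit, the pluggable-tokenizer design described above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    """Hypothetical chunk record: text plus its token span in the source."""
    text: str
    start_token: int
    end_token: int  # exclusive


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer (assumption): splits on whitespace. A real setup
    # would plug in a Hugging Face or Tiktoken tokenizer here.
    return text.split()


def token_chunk(
    text: str,
    chunk_size: int,
    overlap: int = 0,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> List[Chunk]:
    """Split text into windows of at most chunk_size tokens; each window
    shares `overlap` tokens with the one before it."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    tokens = tokenize(text)
    step = chunk_size - overlap  # how far each window advances
    chunks: List[Chunk] = []
    for start in range(0, len(tokens), step):
        end = min(start + chunk_size, len(tokens))
        # Joining with spaces is only correct for the whitespace stand-in;
        # a subword tokenizer would need its own decode step.
        chunks.append(Chunk(" ".join(tokens[start:end]), start, end))
        if end == len(tokens):
            break  # window reached the end; avoid a redundant tail chunk
    return chunks


if __name__ == "__main__":
    doc = "one two three four five six seven eight nine ten"
    for c in token_chunk(doc, chunk_size=4, overlap=1):
        print(c.start_token, c.end_token, c.text)
```

With `chunk_size=4` and `overlap=1`, the ten-word example yields three windows covering tokens 0 to 4, 3 to 7, and 6 to 10, so each chunk repeats the last token of its predecessor.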

Quick Start & Requirements

  • Primary install: pip install chonkie
  • Install all extras: pip install "chonkie[all]" (quoted so shells such as zsh do not expand the brackets)
  • Dependencies: Python, with optional integrations for specific tokenizers and embedding models.
  • Documentation: https://docs.chonkie.ai

Highlighted Details

  • Default install size: 15MB, significantly lighter than alternatives.
  • Performance claims: up to 33x faster token chunking and 2.5x faster semantic chunking than competing chunking libraries.
  • Integrations: supports Hugging Face and Tiktoken tokenizers, and embedding providers such as SentenceTransformers, OpenAI, and Cohere.
  • Multilingual support: Out-of-the-box support for 5+ languages.
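Semantic chunking, where the embedding providers listed above come in, groups text by meaning rather than by length. One common approach, assumed here and not taken from Chonkie's source, embeds each sentence and starts a new chunk wherever the similarity between neighbouring sentences drops below a threshold. In this sketch, every name is hypothetical, the regex sentence splitter is deliberately naive, and the bag-of-words `embed` function is a toy stand-in for a real model such as SentenceTransformers or an OpenAI embedding endpoint.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence: str) -> Vector:
    # Toy embedding (assumption): lower-cased word counts. A real pipeline
    # would call an embedding model here instead.
    counts = Counter(re.findall(r"[a-z]+", sentence.lower()))
    return {word: float(n) for word, n in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either is empty.
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunk(
    text: str,
    threshold: float = 0.2,
    embed: Callable[[str], Vector] = bag_of_words,
) -> List[str]:
    """Group consecutive sentences, opening a new chunk whenever a sentence's
    similarity to the previous sentence falls below `threshold`."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    groups: List[List[str]] = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            groups.append([sentence])  # likely topic shift: start a new chunk
        else:
            groups[-1].append(sentence)
    return [" ".join(group) for group in groups]
```

For example, "Cats purr when they are happy. Happy cats also knead. Stock prices fell sharply today. Investors sold stock as prices fell." splits into two chunks, one about cats and one about stocks, because the second and third sentences share no words. The threshold is the main tuning knob: raising it yields smaller, more topically narrow chunks.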

Maintenance & Community

  • Community: Active Discord server available at https://discord.gg/rYYp6DC4cv.
  • Development: Open for contributions, with a CONTRIBUTING.md file available.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The library is positioned as "no-nonsense" and "ultra-light," suggesting a focus on core chunking functionality. Advanced features or extensive customization beyond the provided chunkers and integrations may require custom development. The "Chonkie Cloud" offering is mentioned but not detailed in the README.

Health Check

  • Last commit: 21 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 11
  • Issues (30d): 10
  • Star history: 1,606 stars in the last 90 days
