Research paper code for late chunking (chunked pooling) in embedding models
This repository provides the implementation for "Late Chunking," a technique designed to improve the performance of Retrieval Augmented Generation (RAG) systems by addressing the challenge of long-distance contextual dependencies in text. It is targeted at developers and researchers working with RAG and large language models who need to enhance information retrieval accuracy for documents that span multiple text chunks.
How It Works
Late Chunking leverages the extended context windows of modern embedding models (e.g., 8192 tokens) by first processing larger segments of text. Instead of chunking text before embedding, it embeds the entire text (or a large portion) to generate token-level vector representations. Then, it applies mean pooling to smaller segments of these token vectors to create chunk embeddings. This approach allows embeddings for smaller chunks to incorporate information from the entire document, significantly improving the retrieval of semantically related text, especially when anaphoric references are present across chunks.
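The pooling step described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the token-embedding matrix is a random stand-in for the output of a long-context model such as jina-embeddings-v2-small-en, and the function name is hypothetical.

```python
import numpy as np

def late_chunking_pool(token_embeddings: np.ndarray, spans: list) -> np.ndarray:
    """Mean-pool contextualized token embeddings over each chunk's token span.

    token_embeddings: (num_tokens, dim) array produced by encoding the FULL
    document in one forward pass, so every token vector already carries
    document-wide context. spans: [start, end) token indices per chunk.
    """
    return np.stack([token_embeddings[s:e].mean(axis=0) for s, e in spans])

# Toy example: 10 "tokens" with 4-dimensional embeddings (random stand-ins).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(10, 4))

# Two chunks covering token spans [0, 6) and [6, 10).
chunks = late_chunking_pool(tokens, [(0, 6), (6, 10)])
print(chunks.shape)  # (2, 4)
```

Because pooling happens after the full-document forward pass, each chunk vector reflects context from outside its own span; in the conventional chunk-then-embed pipeline, each chunk is encoded in isolation and that cross-chunk context is lost.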
Quick Start & Requirements
pip install .
python3 run_chunked_eval.py --task-name {TASK_NAME}
Available tasks include "SciFactChunked", "TRECCOVIDChunked", and others. By default, the evaluation runs with the jina-embeddings-v2-small-en model.
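Chunk boundaries are typically defined in characters, while pooling operates on token indices, so a mapping between the two is needed. A minimal sketch of that bookkeeping step follows; the function name and toy offsets are illustrative, not the repository's API (in practice the offsets would come from a Hugging Face fast tokenizer with `return_offsets_mapping=True`):

```python
def char_spans_to_token_spans(offsets, chunk_char_spans):
    """Map character-level chunk boundaries to token-index spans.

    offsets: per-token (char_start, char_end) pairs, e.g. from a tokenizer's
    offset mapping. chunk_char_spans: [start, end) character ranges per chunk.
    Returns [start, end) token-index ranges suitable for pooling.
    """
    token_spans = []
    for c_start, c_end in chunk_char_spans:
        # A token belongs to a chunk if its character range overlaps the chunk's.
        tok_ids = [i for i, (s, e) in enumerate(offsets) if s < c_end and e > c_start]
        token_spans.append((tok_ids[0], tok_ids[-1] + 1))
    return token_spans

# Toy offsets for the text "late chunking pools tokens", split on whitespace.
offsets = [(0, 4), (5, 13), (14, 19), (20, 26)]

# Two chunks: "late chunking" and "pools tokens".
spans = char_spans_to_token_spans(offsets, [(0, 13), (14, 26)])
print(spans)  # [(0, 2), (2, 4)]
```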
Licensing & Compatibility
The README does not specify a license, which is a critical factor for adoption, especially in commercial or closed-source environments.

Limitations & Caveats
While late chunking generally improves retrieval, the "no chunking" approach sometimes yields better results, particularly for datasets with shorter documents or when ranking individual chunks is not the primary goal.