RAG engine for unstructured data, excelling on dense text QA
dsRAG is a retrieval engine for complex question answering over unstructured text, aimed at users who need high accuracy on challenging datasets such as financial reports and legal documents. It significantly outperforms vanilla RAG baselines by giving each chunk document-level context and by returning longer, more coherent segments of text instead of isolated chunks.
How It Works
dsRAG improves retrieval accuracy through three core methods: Semantic Sectioning, which uses an LLM to break documents into semantically cohesive sections with descriptive titles; AutoContext, which prepends these section titles to text chunks to provide richer context to embedding and reranking models; and Relevant Segment Extraction (RSE), a query-time process that intelligently combines relevant chunks into longer segments for improved LLM comprehension. This layered approach aims to reduce irrelevant results and increase the precision of retrieved information.
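The two retrieval-side ideas above can be illustrated with a minimal sketch. This is not dsRAG's implementation: the names `Chunk`, `add_auto_context`, and `best_segment`, the `penalty` value, and the use of a maximum-sum contiguous run to pick a segment are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    section_title: str  # hypothetical: title produced by a sectioning step
    text: str


def add_auto_context(chunk: Chunk) -> str:
    """Prepend the section title so the embedded text carries its context."""
    return f"Section: {chunk.section_title}\n\n{chunk.text}"


def best_segment(relevance: list[float], penalty: float = 0.2) -> tuple[int, int]:
    """Pick the contiguous run of chunks with the highest total value.

    Each chunk's value is its relevance score minus a fixed penalty
    (an assumed scoring rule), so weak chunks cost something but can
    still be included when they sit between strong ones.
    Returns (start, end) with end exclusive; (0, 0) for empty input.
    """
    best_sum = float("-inf")
    best = (0, 0)
    cur_sum = 0.0
    cur_start = 0
    for i, score in enumerate(relevance):
        value = score - penalty
        if cur_sum <= 0:
            # A non-positive running total cannot help; restart here.
            cur_sum = value
            cur_start = i
        else:
            cur_sum += value
        if cur_sum > best_sum:
            best_sum = cur_sum
            best = (cur_start, i + 1)
    return best


if __name__ == "__main__":
    scores = [0.1, 0.7, 0.15, 0.8, 0.05]  # made-up per-chunk relevance
    print(best_segment(scores))
    print(add_auto_context(Chunk("Risk Factors", "The company faces...")))
```

In this sketch the low-scoring chunk at index 2 is kept because it joins two relevant neighbors, which is the intuition behind returning segments instead of isolated chunks.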
Quick Start & Requirements
Install: pip install dsRAG
Vector database extras: pip install dsRAG[faiss], pip install dsRAG[chroma], etc., or pip install dsRAG[all-vector-dbs] for all.
API keys (environment variables): OPENAI_API_KEY, CO_API_KEY
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats