ragit by baehyunsol

Git-like RAG pipeline for local knowledge-bases

Created 1 year ago

258 stars

Top 98.0% on SourcePulse

Project Summary

Ragit is a RAG (Retrieval-Augmented Generation) framework for creating and sharing knowledge bases built from local files. It targets developers and researchers who want a git-like workflow for managing and querying information. Ragit adds a title and a summary to each data chunk, which makes reranking easier, and it uses a hybrid search strategy that combines AI-generated keywords with TF-IDF scoring rather than relying on vector search alone. The result is quick knowledge-base setup and straightforward sharing.

How It Works

Ragit treats knowledge bases like Git repositories: users can clone and push them. Each data chunk is augmented with a title and a summary, which helps AI models rerank retrieved results. Instead of relying solely on vector similarity, Ragit uses a hybrid search: an AI model first extracts keywords from the user's query, and a TF-IDF search then runs on those keywords. The approach is designed for efficiency and may give better relevance in some scenarios. The framework also supports Markdown files, including images, and has experimental support for multi-turn conversations.
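The two-step search described above (extract keywords, then score with TF-IDF) can be sketched in a few lines of Python. This is an illustrative sketch only, not Ragit's actual implementation: the function names are hypothetical, the keyword step is a simple stopword filter standing in for the LLM call, and the smoothed IDF formula is one common variant chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(query):
    """Hypothetical stand-in for the LLM keyword-extraction step.

    A real pipeline would ask a language model; here we only drop stopwords.
    """
    stopwords = {"how", "do", "i", "the", "a", "an", "is", "what", "to", "of", "in"}
    return [t for t in tokenize(query) if t not in stopwords]


def tfidf_rank(keywords, chunks):
    """Return (chunk_index, score) pairs, best first, for chunks with score > 0."""
    tokenized = [tokenize(c) for c in chunks]
    n = len(chunks)

    # Document frequency: in how many chunks does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scored = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for kw in keywords:
            if tf[kw] == 0:
                continue
            # Smoothed IDF (an assumption of this sketch, not taken from Ragit).
            idf = math.log((1 + n) / (1 + df[kw])) + 1
            score += (tf[kw] / len(tokens)) * idf
        if score > 0:
            scored.append((idx, score))

    # Highest score first; ties broken by original chunk order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


chunks = [
    "Install ragit with cargo install ragit.",
    "Knowledge bases can be cloned and pushed like git repositories.",
    "Markdown files with images are supported.",
]
keywords = extract_keywords("How do I install ragit?")
ranked = tfidf_rank(keywords, chunks)
```

Because scoring needs only term counts, a search of this kind does not require an embedding model or a vector index, which is one plausible reading of the efficiency claim above.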

Quick Start & Requirements

Primary install: cargo install ragit
Prerequisites: An API key for a language model (e.g., GROQ_API_KEY for Groq's Llama models, or OPENAI_API_KEY for GPT-4o). The model can be configured via rag config --set model <model_name>.
Platform Support: Primarily tested and supported on Linux (x64) and macOS (aarch64). Windows is supported, but that support may be imperfect.
Links: Sample clone URL: https://ragit.baehyunsol.com/sample/ragit.

Highlighted Details

Git-like workflow for knowledge bases: clone, push, init, build.
Hybrid search: AI-generated keywords combined with TF-IDF, bypassing pure vector search.
Chunk enrichment: Automatic title and summary generation for improved reranking.
Markdown support: Handles markdown files, including images.
Experimental multi-turn query support.
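To illustrate why chunk titles and summaries help with reranking, the sketch below builds a compact prompt from enriched chunks. It is a hypothetical illustration, not Ragit's data model or prompt format: the `Chunk` class, its field names, and `build_rerank_prompt` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical enriched chunk: raw body plus a generated title and summary."""
    title: str
    summary: str
    body: str


def build_rerank_prompt(query, candidates):
    """Build a prompt that shows only titles and summaries to the reranking model.

    Omitting the full bodies keeps the prompt short even when many
    candidates are retrieved.
    """
    lines = [f"Query: {query}", "Pick the most relevant chunks by number:"]
    for number, chunk in enumerate(candidates, start=1):
        lines.append(f"{number}. {chunk.title}: {chunk.summary}")
    return "\n".join(lines)


candidates = [
    Chunk("Installation", "How to install the tool with cargo.", "LONG BODY TEXT A"),
    Chunk("Sharing", "Cloning and pushing knowledge bases.", "LONG BODY TEXT B"),
]
prompt = build_rerank_prompt("How do I install it?", candidates)
```

In this sketch the reranking model sees one short line per candidate, and the full body is read only for the chunks it selects.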

Maintenance & Community

No specific details on contributors, community channels (like Discord/Slack), or roadmaps were provided in the README excerpt.

Licensing & Compatibility

The license type and compatibility for commercial use were not specified in the provided README excerpt.

Limitations & Caveats

Windows support is explicitly stated as imperfect. Multi-turn query functionality is experimental and may not be fully stable. Because the framework needs an API key for an external LLM provider, that provider's availability and cost are factors.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days