VARAG by adithya-s-k

Vision-first RAG engine for multimodal document understanding

created 1 year ago
477 stars

Top 64.9% on sourcepulse

Project Summary

VARAG is a vision-first Retrieval-Augmented Generation (RAG) engine for users who need to retrieve information from both visual and textual data. It provides a flexible abstraction layer for experimenting with text, image, and multimodal document retrieval, which makes it easier to evaluate different RAG techniques against a given use case.

How It Works

VARAG uses vision-language models to embed visual and textual data into a shared vector space, which enables cross-modal similarity search. It supports four retrieval methods:

  • Simple RAG: OCR-based text retrieval for text-heavy documents.
  • Vision RAG: cross-modal embeddings that correlate text with images.
  • ColPali RAG: embeds document pages as images for visual-aware retrieval.
  • Hybrid ColPali RAG: combines image embeddings with ColPali's late interaction for re-ranking.

The modular design is inspired by Byaldi and uses LanceDB for vector storage, which keeps experimentation fast.
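To make the hybrid idea concrete, here is a minimal sketch in plain Python. It is not VARAG's API: the function names (`cosine`, `maxsim`, `hybrid_search`), the page dictionary layout, and the toy 2-D vectors are all hypothetical, and the two-stage structure (a coarse pass on one pooled vector per page, then a late-interaction re-rank) is one reading of the description above. The late-interaction score follows the MaxSim rule used by ColPali-style models: for each query token, take the best-matching page patch and sum those maxima.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity between two single vectors (0.0 if either is all zeros).
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (norm_a * norm_b) if norm_a and norm_b else 0.0


def maxsim(query_tokens: List[Vector], page_patches: List[Vector]) -> float:
    # Late interaction: best patch match per query token, summed over tokens.
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def hybrid_search(query_vec: Vector, query_tokens: List[Vector],
                  pages: List[Dict], coarse_k: int = 2, final_k: int = 1) -> List[str]:
    # Stage 1 (assumed): shortlist pages by cosine similarity on pooled vectors.
    shortlist = sorted(pages, key=lambda p: cosine(query_vec, p["pooled"]),
                       reverse=True)[:coarse_k]
    # Stage 2: re-rank only the shortlist with the more expensive MaxSim score.
    reranked = sorted(shortlist, key=lambda p: maxsim(query_tokens, p["patches"]),
                      reverse=True)
    return [p["id"] for p in reranked[:final_k]]


# Toy data standing in for real model embeddings (hypothetical values).
pages = [
    {"id": "page-1", "pooled": [1.0, 0.0], "patches": [[1.0, 0.0], [0.9, 0.1]]},
    {"id": "page-2", "pooled": [0.8, 0.6], "patches": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": "page-3", "pooled": [0.0, 1.0], "patches": [[0.0, 1.0], [0.1, 0.9]]},
]
query_vec = [1.0, 0.2]
query_tokens = [[1.0, 0.0], [0.0, 1.0]]

print(hybrid_search(query_vec, query_tokens, pages, coarse_k=2, final_k=2))
```

In this toy setup the coarse pass ranks page-1 first, but the re-rank promotes page-2 because it has a strong patch match for both query tokens. In a real pipeline the vectors would come from a vision-language model and the shortlist from a vector store such as LanceDB.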

Quick Start & Requirements

  • Install: Clone the repository, create a Conda environment (conda create -n varag-venv python=3.10), activate it (conda activate varag-venv), and install dependencies (pip install -e . or poetry install). OCR dependencies can be installed with pip install -e ".[ocr]" (quoting the whole argument keeps the shell from interpreting the brackets).
  • Demo: Run python demo.py --share for an interactive playground.
  • Prerequisites: Python 3.10, Conda.

Highlighted Details

  • Supports text, image, and multimodal retrieval.
  • Implements ColPali RAG, leveraging PaliGemma for image-based document page embedding and late interaction.
  • Integrates OCR via Docling for scanned documents.
  • Uses LanceDB as the vector store for ease of use and customizability.

Maintenance & Community

The project is open to contributions. The maintainer can be reached by email at adithyaskolavi@gmail.com.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and modification.

Limitations & Caveats

The project is presented as an experimental framework for evaluating RAG techniques, not as a production-ready library. The README provides no performance benchmarks or detailed comparisons between the implemented techniques.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 4
  • Issues (30d): 1
  • Star history: 30 stars in the last 90 days
