layra by liweiphys

Visual RAG system for enterprise document understanding

created 3 months ago
785 stars

Top 45.5% on sourcepulse

Project Summary

LAYRA is a visual-first Retrieval-Augmented Generation (RAG) system that works on whole document pages, preserving layout, semantics, and graphical elements. It targets researchers and enterprises that need to connect unstructured document understanding with multimodal AI, and it positions itself as an alternative to traditional OCR-based RAG.

How It Works

LAYRA processes documents using pure visual embeddings, treating each page as a visual artifact rather than a sequence of tokens. This approach, powered by the ColPali project and its ColQwen2.5 model, captures layout structure, tabular integrity, and embedded visuals such as plots and diagrams. The visual embeddings are stored in Milvus for efficient retrieval, which enables layout-aware question answering. The backend is async-first and built on FastAPI. Qwen2.5-VL is the supported multimodal LLM, with support for GPT-4o, Claude, and Gemini planned.
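To make the retrieval idea concrete, here is a minimal sketch of late-interaction (MaxSim) scoring, the scoring scheme ColPali-style models are generally used with: each query vector is matched against its most similar page-patch vector, and the maxima are summed. The function names, the toy 2-D vectors, and the in-memory `pages` dictionary are all illustrative assumptions. The summary does not say whether LAYRA computes this score inside Milvus or in application code, and real embeddings come from the ColQwen2.5 model rather than hand-written lists.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: for each query vector take the best-matching
    page vector, then sum those maxima. Assumes both lists are non-empty."""
    q = [normalize(v) for v in query_vecs]
    p = [normalize(v) for v in page_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, pv)) for pv in p)
        for qv in q
    )


def rank_pages(query_vecs, pages):
    """Return page ids sorted from most to least relevant (hypothetical helper)."""
    scores = {pid: maxsim_score(query_vecs, vecs) for pid, vecs in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: two query vectors and two pages with two patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[1.0, 0.0], [0.0, 1.0]],  # covers both query directions
    "page_2": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first
}
ranking = rank_pages(query, pages)
print(ranking)
```

Because every query vector looks for its own best match on the page, a page that covers all parts of the question outranks one that matches only part of it, which is the property that makes page-level visual retrieval workable without OCR text.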

Quick Start & Requirements

  • Install/Run: (1) clone the repository; (2) set up the environment files (.env, .env.local, gunicorn_config.py); (3) launch the service dependencies with Docker Compose (milvus-standalone-docker-compose.yml, docker-compose.yml); (4) install Python 3.10.6; (5) install the system dependency poppler-utils; (6) install the Python dependencies (pip install -r requirements.txt); (7) download the ColQwen2.5 model weights; (8) initialize MySQL; (9) start the backend with gunicorn; (10) start the embedding model server (python model_server.py). For frontend development, run npm install followed by npm run dev (or build and start for production).
  • Prerequisites: Python 3.10.6, Git LFS, Milvus, Redis, MongoDB, Kafka, MinIO (via Docker Compose), poppler-utils.
  • Setup Time: Significant setup involving cloning, environment configuration, Docker Compose setup, model downloads, and database initialization.
  • Links: GitHub repo (github.com/liweiphys/layra)
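Since setup spans several tools, a small preflight check can catch missing prerequisites before the longer steps begin. The sketch below is not part of LAYRA. The binary names (`git`, `git-lfs`, `docker`, `pdftoppm`) are assumptions about how the listed prerequisites appear on PATH (`pdftoppm` is one of the programs shipped with poppler-utils), and the version check only compares major.minor against the stated Python 3.10.6.

```python
import shutil
import sys

# Assumed command names for the prerequisites listed above (hypothetical mapping).
REQUIRED_TOOLS = {
    "git": "repository clone",
    "git-lfs": "Git LFS",
    "docker": "Docker Compose services",
    "pdftoppm": "poppler-utils",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that `which` cannot find on PATH."""
    return [name for name in tools if which(name) is None]


def python_ok(version=sys.version_info, required=(3, 10)):
    """True if the interpreter's major.minor version equals `required`."""
    return tuple(version[:2]) == required


def preflight_report(which=shutil.which, version=sys.version_info):
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = [
        f"missing tool: {name} ({REQUIRED_TOOLS[name]})"
        for name in missing_tools(which=which)
    ]
    if not python_ok(version):
        problems.append(f"Python 3.10 expected, found {version[0]}.{version[1]}")
    return problems


# Print whatever is missing on this machine (prints nothing if all is present).
for line in preflight_report():
    print(line)
```

The `which` and `version` parameters exist only so the checks can be exercised without touching the real environment; in normal use, call `preflight_report()` with no arguments.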

Highlighted Details

  • Visual-first RAG without OCR, preserving layout and visual content.
  • Modern frontend (Next.js, TypeScript, TailwindCSS) and async backend (FastAPI, Redis, MySQL, MongoDB, MinIO).
  • Uses the ColPali project's ColQwen2.5 model for visual embeddings, which are stored in Milvus.
  • Supports Qwen2.5-VL, with planned support for GPT-4o, Claude, and Gemini.
  • Currently supports PDF documents; future releases will include Word, PPT, Excel, and images.

Maintenance & Community

  • Project is under active development with a first trial version available.
  • Contact: liweiphys (email: liweixmu@foxmail.com), GitHub: github.com/liweiphys/layra.
  • Roadmap available.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and closed-source linking.

Limitations & Caveats

  • Currently in active development and supports only PDF documents.
  • Requires significant setup with multiple dependencies and model downloads.
  • Future releases are planned for broader document format support and additional LLM integrations.

Health Check

  • Last commit: 6 days ago.
  • Responsiveness: Inactive.
  • Pull requests (30d): 1.
  • Issues (30d): 1.
  • Star history: 204 stars in the last 90 days.
