StarTrail-org
Visual RAG for documents and web pages
Top 13.3% on SourcePulse
PixelRAG addresses the limitations of traditional text-based retrieval systems by searching and retrieving directly from the visual content of documents. It targets engineers, researchers, and power users who need to extract information from visually rich sources such as web pages and PDFs, and it aims to give more accurate, context-aware answers by preserving visual structure.
How It Works
PixelRAG's core innovation lies in rendering documents into screenshot tiles rather than parsing them into text chunks. This approach preserves visual elements such as tables, charts, and layout. A Qwen3-VL-Embedding model, fine-tuned on screenshot data, then embeds these images into a vector space, allowing for retrieval based on visual similarity and content. This method ensures that information embedded in visual structures is not lost during the retrieval process.
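The tile-then-embed-then-retrieve flow described above can be illustrated with a minimal sketch. It is not PixelRAG's API: the function names `tile_boxes` and `top_k` are hypothetical, the tile height is an arbitrary choice, and the short toy vectors stand in for the embeddings a vision-language model would produce for each screenshot tile.

```python
import math

def tile_boxes(page_width, page_height, tile_height, overlap=0):
    """Return (left, top, right, bottom) crop boxes covering a rendered page.

    Hypothetical helper: the real tiling strategy is not described in the source.
    """
    if tile_height <= 0 or not 0 <= overlap < tile_height:
        raise ValueError("tile_height must be positive and overlap smaller than it")
    boxes = []
    top = 0
    step = tile_height - overlap
    while top < page_height:
        bottom = min(top + tile_height, page_height)
        boxes.append((0, top, page_width, bottom))
        if bottom == page_height:
            break  # the last tile reached the end of the page
        top += step
    return boxes

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def top_k(query_vec, tile_vectors, k=3):
    """Rank tiles by cosine similarity to the query embedding.

    tile_vectors maps a tile id to its embedding (toy values here, an assumption).
    """
    q = normalize(query_vec)
    scored = []
    for tile_id, vec in tile_vectors.items():
        v = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), tile_id))
    scored.sort(reverse=True)
    return [(tile_id, score) for score, tile_id in scored[:k]]

# Toy usage: a 2500 px tall page cut into 1000 px tiles, then a 2-D "embedding" search.
boxes = tile_boxes(1280, 2500, 1000)
toy_index = {"table_tile": [1.0, 0.0], "chart_tile": [0.0, 1.0], "mixed_tile": [1.0, 1.0]}
results = top_k([1.0, 0.1], toy_index, k=2)
```

In a real pipeline each box would be cropped from the rendered screenshot and passed through the embedding model, and the brute-force loop would typically be replaced by a vector index.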
Quick Start & Requirements
Install: pip install pixelrag (includes Playwright/CDP for rendering).
Optional extras: pip install 'pixelrag[embed]', pip install 'pixelrag[serve]', pip install 'pixelrag[index]'.
Training: pinned dependencies (torch==2.9.1+cu129, transformers==4.57.1, cuDNN 9.20) managed within the train/ directory using uv. GPU is recommended for embedding and training.
Hosted API: https://api.pixelrag.ai
Claude plugin: claude plugin marketplace add StarTrail-org/PixelRAG
Pre-built index download: huggingface-cli download StarTrail-org/pixelrag-faiss-indexes --repo-type dataset --include "search_index_normed_v2/*" --local-dir ./index
Highlighted Details
Hosted API (https://api.pixelrag.ai) serving a pre-built index of 8.28 million Wikipedia pages, requiring no setup.
Claude plugin (pixelbrowse skill), allowing Claude to screenshot pages and interpret visual content directly.
Maintenance & Community
Developed by Berkeley SkyLab, BAIR, and the Berkeley NLP Group. Notable contributors include Rulin Shao. Support was provided by Claude Code and OpenAI Codex. No specific community channels (e.g., Discord, Slack) or roadmap links are detailed in the provided README.
Licensing & Compatibility
The project is licensed under Apache-2.0, a permissive license that allows commercial use and integration into closed-source applications and imposes no copyleft obligations.
Limitations & Caveats
The training environment is managed as a separate uv project within the train/ directory and requires specific, pinned dependencies, potentially complicating setup. The reproducibility status for data curation visualization is marked as "TBD". The Claude plugin executes the pixelshot command locally on the user's machine.