BBC-Esq
RAG system for multimodal document Q&A
Top 78.3% on SourcePulse
This project provides a plugin for querying diverse document types, including audio and video files, using retrieval-augmented generation (RAG). It lets users obtain more reliable responses from large language models (LLMs) by grounding them in the provided data. The tool is aimed at users who want advanced document analysis and LLM interaction capabilities.
How It Works
The system ingests a wide array of file formats (PDFs, Office docs, images, audio, etc.), extracting text, generating image descriptions, and transcribing audio. This processed content is embedded and stored in a vector database. When a question is posed (via text or voice), relevant data chunks are retrieved from the database and fed to a chosen LLM (local, Kobold, LM Studio, or ChatGPT) to generate a contextually grounded answer. An optional text-to-speech feature can vocalize the response.
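The retrieve-then-generate flow described above can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration and is not the project's actual code: the names `embed`, `cosine`, `ToyVectorStore`, and `build_prompt` are hypothetical, the word-count "embedding" stands in for a real embedding model, and the in-memory list stands in for a real vector database. The final prompt is what would be handed to the chosen LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store (stand-in for a real vector database)."""

    def __init__(self) -> None:
        self.items = []  # list of (chunk_text, vector) pairs

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, question: str, k: int = 2) -> list:
        """Return the k chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list) -> str:
    """Assemble a grounded prompt from the retrieved chunks."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Chunks as they might look after text extraction, transcription,
# and image description (made-up sample data).
store = ToyVectorStore()
for chunk in [
    "The invoice total is 420 dollars and is due in March.",
    "The meeting recording discusses the hiring plan.",
    "The photo shows a whiteboard with a network diagram.",
]:
    store.add(chunk)

question = "When is the invoice due?"
top_chunks = store.search(question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real deployment the word-count vectors would be replaced by dense embeddings and the list by a persistent vector index, but the control flow (embed, store, retrieve, prompt) stays the same.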
Quick Start & Requirements
src, run python setup_windows.py.
winget install Microsoft.VisualStudio.2022.BuildTools --silent --accept-source-agreements --accept-package-agreements --override "--wait --quiet --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.Windows11SDK.22621"
Test-Path "C:\Program Files\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC"
Highlighted Details
.pdf, .docx, .txt, .html, .csv, .xls, .xlsx, .rtf, .odt, .png, .jpg, .jpeg, .bmp, .gif, .tif, .tiff, .mp3, .wav, .m4a, .ogg, .wma, .flac.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
1 week ago
Inactive