VectorDB-Plugin by BBC-Esq

RAG system for multimodal document Q&A

Created 2 years ago

369 stars

Top 76.3% on SourcePulse

Project Summary

This project provides a plugin for querying diverse document types, including audio and video files, using retrieval-augmented generation (RAG). Grounding large language model (LLM) responses in user-provided data makes them more reliable. The tool is aimed at users who want to analyze their own documents through an LLM.

How It Works

The system ingests a wide array of file formats (PDFs, Office docs, images, audio, etc.), extracting text, generating image descriptions, and transcribing audio. This processed content is embedded and stored in a vector database. When a question is posed (via text or voice), relevant data chunks are retrieved from the database and fed to a chosen LLM (local, Kobold, LM Studio, or ChatGPT) to generate a contextually grounded answer. An optional text-to-speech feature can vocalize the response.
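The retrieve-then-generate flow described above can be illustrated with a minimal, standard-library-only sketch. It is not the project's implementation: the word-count "embedding", the ToyVectorStore class, and the build_prompt helper are all hypothetical stand-ins for a real embedding model, vector database, and LLM call.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; a real system would use a vector database."""

    def __init__(self):
        self.entries = []  # list of (chunk_text, embedding) pairs

    def add(self, chunk):
        self.entries.append((chunk, embed(chunk)))

    def search(self, question, k=2):
        """Return the k chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Assemble the grounded prompt that would be sent to the chosen LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In use, each processed chunk (extracted text, an image description, or an audio transcript) would be passed to add(), and the string returned by build_prompt() would be handed to whichever LLM backend is selected.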

Quick Start & Requirements

Operating System: Microsoft Windows only.
Dependencies: Python 3.11–3.13, Git, Git LFS, Pandoc, Visual C++ Build Tools.
Installation: Download a release, extract it, navigate to the src folder, and run python setup_windows.py.
Build Tools Install (PowerShell): winget install Microsoft.VisualStudio.2022.BuildTools --silent --accept-source-agreements --accept-package-agreements --override "--wait --quiet --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.Windows11SDK.22621"
Verification: Test-Path "C:\Program Files\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC"
Links: Python 3.11–3.13, Git, Git LFS, Pandoc Releases, Visual C++ Build Tools
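Before running the installer, it may help to confirm that the prerequisites listed above are in place. The following is a small sketch under stated assumptions: the helper names are hypothetical, and it assumes the tools are exposed on PATH under the executable names git, git-lfs, and pandoc. It does not check for the Visual C++ Build Tools.

```python
import shutil
import sys

# Assumed executable names for the command-line prerequisites.
REQUIRED_TOOLS = ["git", "git-lfs", "pandoc"]


def python_version_ok(version=None):
    """True if the interpreter version falls in the supported 3.11-3.13 range."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and 11 <= minor <= 13


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    print("Python version supported:", python_version_ok())
    print("Missing tools:", missing_tools() or "none")
```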

Highlighted Details

Supports extensive file types: .pdf, .docx, .txt, .html, .csv, .xls, .xlsx, .rtf, .odt, .png, .jpg, .jpeg, .bmp, .gif, .tif, .tiff, .mp3, .wav, .m4a, .ogg, .wma, .flac.
Enables querying of multimedia content through image description and audio transcription.
Flexible LLM integration options, including local models, KoboldAI, LM Studio, and ChatGPT.
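As an illustration of how the supported extensions might map onto the three processing steps (text extraction, image description, audio transcription), here is a hedged sketch. The grouping of extensions and the step names are assumptions made for illustration, not taken from the project's code.

```python
from pathlib import Path

# Assumed grouping of the extensions listed above.
DOCUMENT_EXTS = {".pdf", ".docx", ".txt", ".html", ".csv", ".xls", ".xlsx", ".rtf", ".odt"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".ogg", ".wma", ".flac"}


def processing_step(filename):
    """Return a hypothetical step name for the file, or None if unsupported."""
    ext = Path(filename).suffix.lower()  # case-insensitive extension match
    if ext in DOCUMENT_EXTS:
        return "extract_text"
    if ext in IMAGE_EXTS:
        return "describe_image"
    if ext in AUDIO_EXTS:
        return "transcribe_audio"
    return None
```

Returning None for unknown extensions lets a caller skip or report unsupported files instead of failing partway through ingestion.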

Maintenance & Community

Direct contact available via email (bbc@chintellalaw.com) and Discord (moniker vic49).
Bug reports and feature requests are managed through GitHub issues.

Licensing & Compatibility

License: Not explicitly stated in the README.
Compatibility: Limited to Windows. Commercial use is uncertain due to the unstated license.

Limitations & Caveats

The project runs only on Microsoft Windows; other operating systems are not supported.
The absence of a specified license poses a significant adoption blocker for commercial or closed-source projects.

Health Check

Last Commit: 3 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 0
Star History: 1 star in the last 30 days
