amscotti: Local LLM inference with RAG for document Q&A
Top 99.8% on SourcePulse
Summary
This project offers an experimental sandbox for deploying local Large Language Models (LLMs) via Ollama, implementing Retrieval-Augmented Generation (RAG) for question answering against user documents. It targets developers exploring private, on-premises LLM applications, featuring a Streamlit UI for intuitive interaction. The core benefit is enabling RAG workflows locally, enhancing data privacy and control.
How It Works
The system orchestrates a local RAG pipeline using Ollama for LLM inference and embedding generation (nomic-embed-text). Langchain manages the data flow: documents (PDFs, Markdown) are ingested, embedded, and stored in ChromaDB. User queries trigger retrieval of semantically similar document chunks from Chroma. These chunks, combined with the query, are fed to the local LLM for contextually relevant answers. A Streamlit application provides a graphical interface for model selection and document directory management.
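The flow described above can be summarized in a minimal sketch (not the project's actual code), assuming the langchain-community Ollama and Chroma integrations; the directory name, chunk sizes, retrieval depth, and prompt are illustrative assumptions.

```python
# Minimal sketch of the described pipeline: ingest PDFs, embed with
# nomic-embed-text via Ollama, index in ChromaDB, retrieve similar chunks,
# and answer with a local LLM. Not the project's actual code.
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.chat_models import ChatOllama
from langchain.text_splitter import RecursiveCharacterTextSplitter

# 1. Ingest and chunk documents (directory name taken from the project's default).
docs = PyPDFDirectoryLoader("Research").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks with Ollama's nomic-embed-text and store them in Chroma.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
db = Chroma.from_documents(chunks, embedding=embeddings)

# 3. Retrieve semantically similar chunks and feed them, with the query, to the LLM.
query = "What are the key findings in these documents?"
context = "\n\n".join(d.page_content for d in db.similarity_search(query, k=4))
llm = ChatOllama(model="mistral")
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
print(answer.content)
```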
Quick Start & Requirements
- Install dependencies: uv sync
- Run the app: uv run app.py -m <model_name> -p <path_to_documents> (defaults: mistral model, Research directory; see the sketch below)
- Optionally choose an embedding model: -e <embedding_model_name> (default: nomic-embed-text)
- Launch the Streamlit UI: uv run streamlit run ui.py
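To make the flag defaults above concrete, here is a hypothetical argparse sketch; the project's actual app.py may name and wire its arguments differently.

```python
# Hypothetical illustration of the CLI flags and defaults listed above.
import argparse

parser = argparse.ArgumentParser(description="Local RAG Q&A over documents")
parser.add_argument("-m", "--model", default="mistral",
                    help="Ollama model used for answering (default: mistral)")
parser.add_argument("-p", "--path", default="Research",
                    help="Directory of documents to index (default: Research)")
parser.add_argument("-e", "--embedding-model", default="nomic-embed-text",
                    help="Ollama embedding model (default: nomic-embed-text)")
args = parser.parse_args()

print(f"Answering with {args.model}, embedding with {args.embedding_model}, "
      f"indexing documents under {args.path}")
```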
Maintenance & Community
The README provides no specific details on maintainers, community support channels (e.g., Discord, Slack), or a public roadmap.
Licensing & Compatibility
The project's license type and compatibility restrictions for commercial use or integration with closed-source projects are not specified in the README.
Limitations & Caveats
This repository is explicitly labeled an "experimental sandbox." Embeddings are reloaded on every application run, an inefficiency the README acknowledges and accepts only for testing purposes.
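For context on that caveat, a minimal sketch (an assumption, not the project's code) of how a persistent Chroma index could be reused across runs instead of re-embedding every time; the persist_directory, sample document, and embedding model choice are illustrative.

```python
# Sketch only: persist the Chroma index so embeddings are not rebuilt each run.
# persist_directory and the sample document are assumptions for illustration.
import os

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

embeddings = OllamaEmbeddings(model="nomic-embed-text")
persist_dir = "chroma_index"  # hypothetical on-disk location

if os.path.isdir(persist_dir):
    # Later runs: open the saved index instead of re-embedding documents.
    db = Chroma(persist_directory=persist_dir, embedding_function=embeddings)
else:
    # First run: embed once and write the index to disk.
    docs = [Document(page_content="Example chunk of a research note.")]
    db = Chroma.from_documents(docs, embedding=embeddings, persist_directory=persist_dir)

print(db.similarity_search("research", k=1)[0].page_content)
```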
Last updated: 8 months ago (inactive)