Local LLM and chatbot setup for consumer hardware
This project provides a framework for setting up and running local Large Language Models (LLMs) on consumer-grade hardware, offering a ChatGPT-like web interface. It targets users who want to experiment with LLMs locally without requiring high-end GPUs, enabling features like web summarization and news aggregation.
How It Works
TinyLLM acts as an orchestrator, letting users choose among three popular LLM inference servers: Ollama, vLLM, and llama-cpp-python. Each server exposes an OpenAI-compatible API, which the project's FastAPI-based chatbot then talks to. The chatbot supports Retrieval Augmented Generation (RAG) features, enabling it to summarize URLs, fetch current news, retrieve stock prices, and get weather information.
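For example, once one of these backends is running, any OpenAI-style client can talk to it. The sketch below uses the openai Python package; the base URL, port, and model name are assumptions that depend on which server you start and which model you load.

# Minimal sketch: query a local OpenAI-compatible server (URL, port, and model name are assumptions).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # local servers usually ignore the key
response = client.chat.completions.create(
    model="local-model",  # replace with the model your backend actually serves
    messages=[{"role": "user", "content": "Summarize what TinyLLM does in one sentence."}],
)
print(response.choices[0].message.content)

Because all three backends expose the same API shape, swapping one for another should only change the base URL and model name in a snippet like this.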
Quick Start & Requirements
git clone https://github.com/jasonacox/TinyLLM.git
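The repository's README covers starting a backend; as a quick sanity check (not an official project step), you can confirm an OpenAI-compatible server is reachable before launching the chatbot. The endpoint and port below are assumptions for a locally running server.

# Hedged sketch: list the models exposed by a local OpenAI-compatible server (URL/port assumed).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
    models = json.load(resp)

for entry in models.get("data", []):
    print(entry.get("id"))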
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The Ollama and llama-cpp-python backends currently support only one session/prompt at a time. vLLM requires more VRAM because it typically runs non-quantized models, though AWQ-quantized models are available. Some models, such as Mistrallite, are noted as potentially glitchy.