TinyLLM by jasonacox

Local LLM and chatbot setup for consumer hardware

created 1 year ago
273 stars

Top 95.3% on sourcepulse

Project Summary

This project provides a framework for setting up and running local Large Language Models (LLMs) on consumer-grade hardware, offering a ChatGPT-like web interface. It targets users who want to experiment with LLMs locally without requiring high-end GPUs, enabling features like web summarization and news aggregation.

How It Works

TinyLLM acts as an orchestrator, letting users choose from three popular LLM inference servers: Ollama, vLLM, or llama-cpp-python. Each of these servers exposes an OpenAI-compatible API, which the project's FastAPI-based chatbot consumes. The chatbot supports Retrieval Augmented Generation (RAG) features, enabling it to summarize URLs, fetch current news, retrieve stock prices, and report the weather.
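
Because all three backends expose the same OpenAI-compatible API, any standard OpenAI client can talk to the local server. The snippet below is a minimal sketch, not part of TinyLLM itself: the base URL, API key, and model name are assumptions to adjust for your own setup.

```python
# Minimal sketch: query a local OpenAI-compatible inference server
# (vLLM, llama-cpp-python, or Ollama's OpenAI endpoint) with the standard
# openai client. Host/port and model name are assumptions -- adjust them
# to match your server configuration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="mistral",  # hypothetical model name; use whatever your server loaded
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what TinyLLM does in one sentence."},
    ],
    max_tokens=128,
)

print(response.choices[0].message.content)
```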

Quick Start & Requirements

  • Install: Clone the repository (git clone https://github.com/jasonacox/TinyLLM.git).
  • Prerequisites: Python 3, CUDA 12.2 (for NVIDIA), 8GB+ RAM, 128GB+ SSD. Recommended GPU: NVIDIA GTX 1060 6GB or better, or Apple M1/M2.
  • Setup: Requires setting up an inference server (Ollama, vLLM, or llama-cpp-python) and then running the chatbot interface; a quick server check is sketched after this list.
  • Links: Research
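
Before launching the chatbot, it can help to verify that the inference server's OpenAI-compatible endpoint is reachable. The check below is a generic sketch, not a TinyLLM command; the address assumes a server on localhost:8000, which is common for llama-cpp-python and vLLM.

```python
# Minimal sketch: confirm a local OpenAI-compatible inference server is up
# before starting the chatbot. The URL is an assumption -- adjust it to the
# port your backend actually uses.
import requests

BASE_URL = "http://localhost:8000/v1"  # assumed server address

try:
    resp = requests.get(f"{BASE_URL}/models", timeout=5)
    resp.raise_for_status()
    models = [m["id"] for m in resp.json().get("data", [])]
    print("Server is up. Loaded models:", models or "(none reported)")
except requests.RequestException as exc:
    print("Server not reachable:", exc)
```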

Highlighted Details

  • Supports multiple LLM backends (Ollama, vLLM, llama-cpp-python) for flexibility.
  • Chatbot includes RAG features for summarizing URLs, news, stocks, and weather (the general pattern is sketched after this list).
  • Offers an OpenAI API-compatible web service for easy integration.
  • Provides detailed instructions for running various LLM models (Mistral, Llama-2, Mixtral, Phi-3) with different quantization levels.
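
As an illustration of the URL-summarization feature's general pattern (fetch a page, reduce it to text, ask the local model for a summary), here is a rough sketch. It is not TinyLLM's actual implementation; the server address, model name, and crude HTML stripping are assumptions for demonstration only.

```python
# Rough sketch of a URL-summarization (RAG) flow against a local
# OpenAI-compatible server. Not TinyLLM's actual code; server address and
# model name are assumptions.
import re
import requests
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def summarize_url(url: str) -> str:
    html = requests.get(url, timeout=10).text
    text = re.sub(r"<[^>]+>", " ", html)     # crude tag stripping for illustration
    text = re.sub(r"\s+", " ", text)[:4000]  # keep the prompt within a small context
    response = client.chat.completions.create(
        model="mistral",  # hypothetical model name
        messages=[
            {"role": "system", "content": "Summarize the provided web page text."},
            {"role": "user", "content": text},
        ],
        max_tokens=200,
    )
    return response.choices[0].message.content

print(summarize_url("https://example.com"))
```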

Maintenance & Community

  • The project is actively maintained by jasonacox.
  • References include popular LLM projects like llama.cpp, llama-cpp-python, and vLLM.

Licensing & Compatibility

  • The project itself does not explicitly state a license in the README.
  • LLM models listed have varying licenses (Apache 2.0, MIT, Meta). Compatibility for commercial use depends on the chosen LLM's license.

Limitations & Caveats

The Ollama and llama-cpp-python backends currently support only one session/prompt at a time. vLLM generally needs more VRAM because it typically serves non-quantized models, although AWQ-quantized models are available. Some models, such as MistralLite, are noted as potentially glitchy.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 32 stars in the last 90 days
