FastAPI service for semantic text search using precomputed embeddings
This project provides a FastAPI service for semantic text search and LLM operations, targeting developers and researchers who need to integrate local LLMs into applications. It offers a unified API for text embedding, completion, and advanced similarity measures, simplifying complex LLM workflows.
How It Works
The service leverages llama_cpp for LLM inference and textract for broad file-format support, including OCR for scanned documents and Whisper for audio transcription. Embeddings are cached in SQLite for efficiency. A Rust library, fast_vector_similarity, provides advanced similarity metrics beyond standard cosine similarity, enabling a two-step search process: an initial FAISS filtering pass followed by more nuanced similarity calculations. Multiple embedding pooling methods are supported for creating fixed-length vectors from token embeddings.
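The two-step search described above can be sketched in plain NumPy. This is a stand-in illustration only: NumPy plays the role of the FAISS index in step 1, and Pearson correlation stands in for the richer metrics the fast_vector_similarity crate provides in step 2; all function and parameter names here are invented for the example.

```python
import numpy as np

def two_step_search(query_vec, corpus, top_k=50, final_k=5):
    """Illustrative two-step semantic search: a cheap cosine pre-filter
    (the role FAISS plays in the service) followed by a more expensive
    re-ranking metric (the role of fast_vector_similarity)."""
    # Step 1: coarse filter -- dot product of normalized vectors == cosine.
    corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    candidates = np.argsort(-(corpus_norm @ q))[:top_k]
    # Step 2: re-rank the survivors with a richer metric; Pearson correlation
    # here stands in for the crate's measures (Spearman's rho, etc.).
    scores = [float(np.corrcoef(query_vec, corpus[i])[0, 1]) for i in candidates]
    order = np.argsort(scores)[::-1][:final_k]
    return [(int(candidates[i]), scores[i]) for i in order]
```

The point of the split is cost: the coarse pass touches every stored vector with a cheap operation, while the expensive metric only runs on the small candidate set.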
Quick Start & Requirements
Run sudo ./setup_dockerized_app_on_fresh_machine.sh (installs Docker and the app).

System package prerequisites: build-essential, libxml2-dev, libxslt1-dev, antiword, unrtf, poppler-utils, pstotext, tesseract-ocr, flac, ffmpeg, lame, libmad0, libsox-fmt-mp3, sox, libjpeg-dev, swig, redis-server, libpoppler-cpp-dev, pkg-config.

requirements.txt lists the Python dependencies, including llama-cpp-python, fastapi, faster-whisper, faiss-cpu, fast_vector_similarity, and redis.

Once running, the interactive API documentation (Swagger UI) is available at http://localhost:8089.
Highlighted Details
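Once the container is up, a client call against the service might look like the sketch below. The endpoint path, model name, and payload field names are assumptions for illustration only; the authoritative schema is the Swagger UI served at http://localhost:8089.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8089"

def build_embedding_request(text: str, model_name: str = "llama2_7b") -> request.Request:
    # NOTE: the endpoint path and field names below are illustrative
    # assumptions; check the Swagger UI for the service's real schema.
    payload = json.dumps({"text": text, "llm_model_name": model_name}).encode()
    return request.Request(
        f"{BASE_URL}/get_embedding_vector_for_string",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it (requires the service to be running):
#     with request.urlopen(build_embedding_request("hello world")) as resp:
#         embedding = json.loads(resp.read())
```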
Maintenance & Community
The project is actively maintained by the author. Community contributions are encouraged. Links to commercial web apps by the author are provided.
Licensing & Compatibility
Limitations & Caveats
The setup script installs Docker via apt, which might not be ideal for all environments. RAM-disk setup requires specific sudo permissions and configuration. The project relies heavily on external dependencies, including system libraries and specific Python packages.