FastAPI service for semantic text search using precomputed embeddings
This project provides a FastAPI service for semantic text search and LLM operations, targeting developers and researchers who need to integrate local LLMs into applications. It offers a unified API for text embedding, completion, and advanced similarity measures, simplifying complex LLM workflows.
How It Works
The service leverages llama_cpp for LLM inference and textract for broad file-format support, including OCR for scanned documents and Whisper for audio transcription. Embeddings are cached in SQLite for efficiency. A Rust library, fast_vector_similarity, provides advanced similarity metrics beyond standard cosine similarity, enabling a two-step search process: an initial FAISS filtering pass followed by more nuanced similarity calculations. Multiple embedding pooling methods are supported for creating fixed-length vectors from token embeddings.
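The two-step search described above can be sketched in plain NumPy. This is a stand-in illustration only: NumPy plays the role of the FAISS index in step 1, and Pearson correlation stands in for the richer metrics the fast_vector_similarity crate provides in step 2; all function and parameter names here are invented for the example.

```python
import numpy as np

def two_step_search(query_vec, corpus, top_k=50, final_k=5):
    """Illustrative two-step semantic search: a cheap cosine pre-filter
    (the role FAISS plays in the service) followed by a more expensive
    re-ranking metric (the role of fast_vector_similarity)."""
    # Step 1: coarse filter -- dot product of normalized vectors == cosine.
    corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    candidates = np.argsort(-(corpus_norm @ q))[:top_k]
    # Step 2: re-rank the survivors with a richer metric; Pearson correlation
    # here stands in for the crate's measures (Spearman's rho, etc.).
    scores = [float(np.corrcoef(query_vec, corpus[i])[0, 1]) for i in candidates]
    order = np.argsort(scores)[::-1][:final_k]
    return [(int(candidates[i]), scores[i]) for i in order]
```

The point of the split is cost: the coarse pass touches every stored vector with a cheap operation, while the expensive metric only runs on the small candidate set.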
Quick Start & Requirements
Run sudo ./setup_dockerized_app_on_fresh_machine.sh (installs Docker and the app).

System package prerequisites: build-essential, libxml2-dev, libxslt1-dev, antiword, unrtf, poppler-utils, pstotext, tesseract-ocr, flac, ffmpeg, lame, libmad0, libsox-fmt-mp3, sox, libjpeg-dev, swig, redis-server, libpoppler-cpp-dev, pkg-config.

requirements.txt lists the Python dependencies, including llama-cpp-python, fastapi, faster-whisper, faiss-cpu, fast_vector_similarity, and redis.

Once running, the interactive API documentation (Swagger UI) is available at http://localhost:8089.
Highlighted Details
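Once the container is up, a client call against the service might look like the sketch below. The endpoint path, model name, and payload field names are assumptions for illustration only; the authoritative schema is the Swagger UI served at http://localhost:8089.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8089"

def build_embedding_request(text: str, model_name: str = "llama2_7b") -> request.Request:
    # NOTE: the endpoint path and field names below are illustrative
    # assumptions; check the Swagger UI for the service's real schema.
    payload = json.dumps({"text": text, "llm_model_name": model_name}).encode()
    return request.Request(
        f"{BASE_URL}/get_embedding_vector_for_string",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it (requires the service to be running):
#     with request.urlopen(build_embedding_request("hello world")) as resp:
#         embedding = json.loads(resp.read())
```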
Maintenance & Community
The project is actively maintained by the author. Community contributions are encouraged. Links to commercial web apps by the author are provided.
Licensing & Compatibility
Limitations & Caveats
The setup script installs Docker via apt, which might not be ideal for all environments. RAM-disk setup requires specific sudo permissions and configuration. The project relies heavily on external dependencies, including system libraries and specific Python packages.