viberary by veekaybee

Semantic search engine for book recommendations

Created 3 years ago
429 stars

Top 69.0% on SourcePulse

Project Summary

Viberary approaches book recommendation without relying on traditional genre- or title-based filtering. It is a semantic search engine that recommends books matching a descriptive "vibe" supplied by the user, using embeddings learned from Goodreads data. This suits readers who want to discover books by mood or feel rather than by a known title or genre.

How It Works

Viberary uses a two-tower semantic retrieval model in which Sentence Transformers' MSMarco model encodes both queries and book data. Embeddings are learned from Goodreads book metadata, with training data generated locally in DuckDB. The model is converted to ONNX format for efficient, low-latency inference. Corpus embeddings are generated on AWS P3 instances and stored in Redis, where the Redis Search module retrieves them using the HNSW algorithm. Results are served through a Flask API running on Gunicorn, with a Bootstrap front-end, and the whole application is deployed in Docker.
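To make the retrieval step concrete, here is a minimal sketch of the two-tower idea: one encoder embeds the query, the same kind of encoder embeds each book description, and results are ranked by cosine similarity. Everything in it is an assumption for illustration. The bag-of-words `embed` function stands in for the Sentence Transformers model, the brute-force `search` stands in for Redis Search with HNSW, and the book entries are hypothetical placeholders rather than Goodreads data.

```python
import math
from collections import Counter

# Hypothetical corpus; the real project embeds Goodreads metadata.
BOOKS = {
    "Book A": "cozy mystery in a small seaside village",
    "Book B": "epic space opera with warring empires",
    "Book C": "quiet seaside romance in a small village",
}

# Fixed vocabulary built from the corpus so every vector has the same layout.
VOCAB = sorted({tok for desc in BOOKS.values() for tok in desc.lower().split()})
INDEX = {tok: i for i, tok in enumerate(VOCAB)}


def embed(text):
    """Toy encoder (stand-in for a learned model): L2-normalised word counts."""
    vec = [0.0] * len(VOCAB)
    for tok, count in Counter(text.lower().split()).items():
        if tok in INDEX:  # words outside the vocabulary are ignored
            vec[INDEX[tok]] += count
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Dot product of two already-normalised vectors equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# "Document tower": embed the corpus once, ahead of query time.
CORPUS_VECTORS = {title: embed(desc) for title, desc in BOOKS.items()}


def search(query, k=2):
    """"Query tower": embed the query, then rank all books by similarity."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, vec), title) for title, vec in CORPUS_VECTORS.items()),
        reverse=True,
    )
    return [title for _, title in scored[:k]]


if __name__ == "__main__":
    print(search("space opera", k=1))
```

The exhaustive scan in `search` is only practical for a tiny corpus. An approximate index such as HNSW serves the same role at scale by avoiding a comparison against every stored vector.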

Quick Start & Requirements

To run Viberary:

  • Fork/clone the repository.
  • Download the model artifacts from https://github.com/veekaybee/viberary_model and the corpus embeddings file.
  • If starting from training data, generate the ONNX model artifact using make onnx.
  • Build the Docker image with make build.
  • Launch the application using Docker Compose with make up-arm (for ARM systems) or make up-intel (for Intel systems).
  • Index the embeddings with make embed.

The web server is accessible at localhost:8000. Key prerequisites include Docker, the model artifacts, and corpus embeddings.
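The order of the make targets above can be sketched as a small helper that picks the ARM or Intel launch target from the host architecture. This is an illustrative sketch only: the function name `viberary_setup_commands` and the architecture check are assumptions and not part of the repository, and the helper only lists commands; it does not run them or download any artifacts.

```python
import platform


def viberary_setup_commands(machine=None, from_training_data=False):
    """Return the make commands from the quick start, in order.

    `machine` defaults to the local architecture string; the ARM check
    below is a heuristic assumption, not something the project defines.
    """
    machine = (machine or platform.machine()).lower()
    is_arm = machine.startswith(("arm", "aarch64"))

    commands = []
    if from_training_data:
        # Only needed when the ONNX artifact has not been downloaded.
        commands.append("make onnx")
    commands.append("make build")
    commands.append("make up-arm" if is_arm else "make up-intel")
    commands.append("make embed")
    return commands


if __name__ == "__main__":
    for command in viberary_setup_commands():
        print(command)
```

Run from a shell, the script prints the sequence for the current machine, which can then be executed manually from the repository root once the model artifacts and corpus embeddings are in place.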

Highlighted Details

  • Vibe-based book recommendations via semantic search.
  • Two-tower retrieval architecture using Sentence Transformers.
  • ONNX conversion for low-latency inference.
  • Redis Search with HNSW for efficient vector indexing and retrieval.
  • An exploration notebook details an end-to-end workflow using DuckDB, Word2Vec, and Redis Search.

Maintenance & Community

Viberary is currently in maintenance mode. Further details are available on the project's website (viberary.pizza). No specific community channels or contributor information are detailed in the provided README.

Licensing & Compatibility

The provided README does not specify a software license.

Limitations & Caveats

The project is in maintenance mode, indicating limited active development. The README notes that the API is slated for a future rewrite in Go to improve performance. Setup requires manual downloading of model artifacts and corpus embeddings, along with executing multiple make commands and managing Docker containers.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
