veekaybee/viberary: Semantic search engine for book recommendations
Viberary addresses the challenge of book recommendation by moving beyond traditional genre- and title-based filtering: it is a semantic search engine that recommends books matching the descriptive "vibes" a user supplies, using embeddings learned from Goodreads data. This gives readers a more nuanced path to book discovery than conventional methods.
How It Works
Viberary employs a two-tower semantic retrieval model, using Sentence Transformers' MSMarco model to encode both queries and book data. Embeddings are learned from Goodreads book metadata, with training data generated locally in DuckDB. For efficient, low-latency inference, the model is converted to ONNX format. Corpus embeddings are generated on AWS P3 instances and stored in Redis, where retrieval runs through the Redis Search module using the HNSW algorithm. Results are served by a Flask API running under Gunicorn, with a Bootstrap front end, and the whole application ships as a Docker container.
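In concrete terms, serving a query reduces to encoding the user's text and running a KNN search against the Redis index. The sketch below is illustrative only: the index name (books), vector field (embedding), and return field (title) are assumptions, and it loads a Sentence Transformers MSMarco checkpoint directly rather than the ONNX-converted model the project actually serves.

```python
# Minimal sketch of the retrieval path, under assumed names:
# a Redis index "books" with a vector field "embedding" and a
# text field "title". Not the project's exact implementation.
import numpy as np
import redis
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer

# Encode the user's free-text "vibe" into the same embedding
# space as the book corpus.
model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v3")
query_vec = model.encode("quiet, melancholy novels set by the sea")

r = redis.Redis(host="localhost", port=6379)

# KNN query executed by the Redis Search module against the
# HNSW index built over the corpus embeddings.
q = (
    Query("*=>[KNN 10 @embedding $vec AS score]")
    .sort_by("score")
    .return_fields("title", "score")
    .dialect(2)
)
results = r.ft("books").search(
    q, query_params={"vec": np.asarray(query_vec, dtype=np.float32).tobytes()}
)
for doc in results.docs:
    print(doc.title, doc.score)
```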
Quick Start & Requirements
To run Viberary:

1. Fork or clone the repository.
2. Download the model artifacts from https://github.com/veekaybee/viberary_model, along with the corpus embeddings file. If starting from training data instead, generate the ONNX model artifact with make onnx.
3. Build the Docker image with make build.
4. Launch the application with Docker Compose: make up-arm on ARM systems or make up-intel on Intel systems.
5. Index the embeddings with make embed.

The web server is then accessible at localhost:8000. Key prerequisites are Docker, the model artifacts, and the corpus embeddings.
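Once the containers are up and make embed has finished, a short request against the local server confirms the stack is wired together. The endpoint path and query parameter below are assumptions for illustration; the README does not document the routes.

```python
# Hypothetical smoke test against a locally running Viberary instance.
# The /search path and "query" parameter are assumptions; adjust them
# to match the routes the Flask app actually exposes.
import requests

resp = requests.get(
    "http://localhost:8000/search",
    params={"query": "dreamy magical realism"},
    timeout=10,
)
resp.raise_for_status()
print(resp.text[:500])  # show the start of the response
```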
Maintenance & Community
Viberary is currently in maintenance mode. Further details are available on the project's website, viberary.pizza. No community channels or contributor information is detailed in the provided README.
Licensing & Compatibility
The provided README does not specify a software license.
Limitations & Caveats
The project is in maintenance mode, indicating limited active development. The README notes that the API is slated for a future rewrite in Go to improve performance. Setup requires manually downloading model artifacts and corpus embeddings, running several make commands, and managing Docker containers.