lancedb by lancedb

Embedded retrieval engine for multimodal AI

Created 2 years ago

8,457 stars

Top 6.1% on SourcePulse


13 Experts Love This Project

Chang She

Cofounder of LanceDB

Carol Willing

Core Contributor to CPython, Jupyter

Ji Yichao

Cofounder of Manus

Luis Capelo

Cofounder of Lightning AI

and 9 more!

Project Summary

LanceDB is an open-source, embedded retrieval engine for multimodal AI applications that simplifies storing and querying embeddings. It targets developers building AI-powered applications who need efficient vector search, filtering, and data management without running separate server infrastructure. Its main benefit is serverless vector search that scales to production workloads, handles diverse data types, and integrates with popular AI and data frameworks.

How It Works

LanceDB is built on Lance, a Rust-based columnar format optimized for ML workloads. This architecture enables zero-copy data versioning, so users can manage data snapshots without additional infrastructure. It supports vector similarity search, full-text search, and SQL queries, which gives flexibility in how data is retrieved. The engine can also use a GPU to accelerate vector index building, which helps with large datasets.
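To make the "vector similarity search plus SQL filter" idea concrete, the sketch below computes what such a query returns in the simplest possible way: keep the rows that pass a filter, then rank them by Euclidean distance to the query vector and return the top k. This is exact brute-force search written with the standard library only; it is not LanceDB's implementation, which uses on-disk Lance data and optional (approximate) vector indexes. The function name `filtered_knn` and the row layout are hypothetical.

```python
import math
from typing import Callable

Row = dict  # hypothetical row shape: {"vector": [...], plus scalar columns}


def filtered_knn(
    rows: list[Row],
    query: list[float],
    k: int,
    predicate: Callable[[Row], bool],
) -> list[Row]:
    """Exact k-nearest-neighbour search over the rows that satisfy `predicate`."""
    # Apply the scalar filter first (analogous to an SQL WHERE clause).
    candidates = [r for r in rows if predicate(r)]
    # Score each remaining row by Euclidean distance to the query vector.
    scored = [(math.dist(r["vector"], query), r) for r in candidates]
    # Sort by distance only, so rows themselves are never compared.
    scored.sort(key=lambda pair: pair[0])
    return [r for _, r in scored[:k]]


rows = [
    {"id": 1, "vector": [0.0, 0.0], "price": 5},
    {"id": 2, "vector": [1.0, 1.0], "price": 50},
    {"id": 3, "vector": [2.0, 2.0], "price": 8},
]
top = filtered_knn(rows, [1.1, 1.1], k=2, predicate=lambda r: r["price"] < 10)
print([r["id"] for r in top])
```

Row 2 is closest to the query but is excluded by the price filter, so the result is drawn only from rows 1 and 3. A real engine avoids scanning every row by consulting an index, trading a little recall for much lower latency.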

Quick Start & Requirements

Install: pip install lancedb (Python), npm install @lancedb/lancedb (JavaScript/TypeScript).
Prerequisites: Python 3.x, Node.js (for JS/TS). GPU support is mentioned for index building but not strictly required for basic operation.
Resources: Local storage for the database files.
Docs: https://lancedb.com/

Highlighted Details

Embedded, serverless vector database.
Supports vector search, full-text search, and SQL filtering.
Native Python and JavaScript/TypeScript APIs.
Zero-copy data versioning.
Integrates with LangChain, LlamaIndex, Pandas, Polars, DuckDB.
Core written in Rust using the Lance columnar format.

Maintenance & Community

Active development with community engagement via Discord.
Links to blog, Twitter, and a "Guru" Q&A platform are provided.

Licensing & Compatibility

Apache 2.0 License.
Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The README mentions GPU support for building vector indexes but does not quantify CPU-only performance. For large datasets, building indexes without a GPU may be noticeably slower; no benchmarks for non-GPU scenarios are given.

Health Check

Last Commit: 1 day ago

Responsiveness: 1 day

Star History: 272 stars in the last 30 days