chroma by chroma-core

Open-source embedding database for building LLM apps with memory

Created 3 years ago

25,404 stars

Top 1.5% on SourcePulse

38 Experts Love This Project

Tobi Lutke, Cofounder of Shopify
Paul Klein, Founder of Browserbase
Jason Huggins, Creator of Selenium
Paul Copplestone, Cofounder of Supabase

and 34 more!

Project Summary

Chroma is an open-source embedding database designed for developers building AI-native applications, particularly those leveraging Large Language Models (LLMs). It simplifies the process of adding memory and context to applications by efficiently storing, indexing, and querying text embeddings, enabling features like "chat with your data."

How It Works

Chroma is a vector database: it stores embeddings, which are numerical vector representations of text or other data. By default it computes embeddings automatically with Sentence Transformers, and it also accepts custom embedding functions (e.g., OpenAI, Cohere). Users work through a small API to create collections, add documents with associated metadata, and query for semantically similar content using natural language. Its main selling points are ease of use and direct integration into Python and JavaScript LLM workflows.
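
To make the idea concrete, here is a toy sketch, using only the Python standard library, of what an embedding store does conceptually: embed each document, keep the vector next to its metadata, and rank stored documents by similarity to an embedded query. It is not Chroma's implementation or API. The `embed` function is a crude word-count stand-in for a real model such as Sentence Transformers, and `ToyCollection` is a hypothetical name chosen for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector (not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyCollection:
    """Hypothetical in-memory store: (id, document, metadata, embedding) rows."""

    def __init__(self):
        self._rows = []

    def add(self, ids, documents, metadatas):
        for id_, doc, meta in zip(ids, documents, metadatas):
            self._rows.append((id_, doc, meta, embed(doc)))

    def query(self, text, n_results=1, where=None):
        q = embed(text)
        # Optional exact-match metadata filter, applied before ranking.
        candidates = [
            row for row in self._rows
            if not where or all(row[2].get(k) == v for k, v in where.items())
        ]
        ranked = sorted(candidates, key=lambda row: cosine(q, row[3]), reverse=True)
        return [row[0] for row in ranked[:n_results]]


col = ToyCollection()
col.add(
    ids=["d1", "d2", "d3"],
    documents=[
        "the cat sat on the mat",
        "stock prices fell sharply",
        "a cat chased the mouse",
    ],
    metadatas=[{"source": "pets"}, {"source": "news"}, {"source": "pets"}],
)

print(col.query("where is the cat", n_results=2))              # ['d1', 'd3']
print(col.query("prices", n_results=1, where={"source": "news"}))  # ['d2']
```

A real embedding model replaces the word counts with dense vectors that capture meaning rather than shared words, and a real vector database replaces the linear scan with an index, but the add-then-query shape stays the same.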

Quick Start & Requirements

Install Python client: pip install chromadb
Install JavaScript client: npm install chromadb
For client-server mode: chroma run --path /chroma_db_path
Official Docs: https://docs.trychroma.com/
Templates: https://replit.com/@chromadb/chroma

Highlighted Details

Fully typed, tested, and documented API.
Integrates with LangChain (Python/JS) and LlamaIndex.
Supports filtering by metadata and document content.
Offers options for in-memory or persistent storage.
Can use default Sentence Transformers, OpenAI, or custom embedding functions.

Maintenance & Community

Active development with weekly releases (Mondays for tagged versions, hotfixes as needed).
Community engagement via Discord (#contributing channel).
Public roadmap available for contribution ideas.

Licensing & Compatibility

Licensed under Apache 2.0.
Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The README lists a row-based API as "coming soon," so it is not yet available. Chroma supports client-server mode, but the README does not detail scaling or distributed deployment configurations.

Health Check

Last Commit: 15 hours ago
Responsiveness: 1 day
Pull Requests (30d): 123

Issues (30d)

Star History

580 stars in the last 30 days