chroma by chroma-core

Open-source embedding database for building LLM apps with memory

created 2 years ago
21,399 stars

Top 2.1% on sourcepulse

Project Summary

Chroma is an open-source embedding database designed for developers building AI-native applications, particularly those leveraging Large Language Models (LLMs). It simplifies the process of adding memory and context to applications by efficiently storing, indexing, and querying text embeddings, enabling features like "chat with your data."

How It Works

Chroma acts as a vector database: it stores numerical representations (embeddings) of text or other data. By default it computes embeddings automatically with Sentence Transformers, but it also supports custom embedding functions (e.g., OpenAI, Cohere). Users work with a small API to create collections, add documents with associated metadata, and query for semantically similar content using natural-language text. Its main advantage is ease of use: it plugs directly into Python and JavaScript LLM workflows.
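The add-then-query workflow described above can be illustrated without Chroma itself. The snippet below is a minimal, stdlib-only sketch, not Chroma's implementation or API: `ToyCollection`, `make_bow_embedder`, and the bag-of-words "embedding" are hypothetical stand-ins (a real setup would use a learned model such as Sentence Transformers), and ranking is a brute-force cosine-similarity scan rather than an index.

```python
import math
import re
from typing import Callable, Dict, List, Optional

EmbeddingFn = Callable[[List[str]], List[List[float]]]


def make_bow_embedder(vocabulary: List[str]) -> EmbeddingFn:
    """Hypothetical embedding function: word counts over a fixed vocabulary."""
    index = {word: i for i, word in enumerate(vocabulary)}

    def embed(texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * len(vocabulary)
            for word in re.findall(r"[a-z]+", text.lower()):
                if word in index:
                    vec[index[word]] += 1.0
            vectors.append(vec)
        return vectors

    return embed


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyCollection:
    """Illustrative in-memory collection of documents, metadata, and embeddings."""

    def __init__(self, name: str, embedding_function: EmbeddingFn):
        self.name = name
        self.embedding_function = embedding_function
        self.records: Dict[str, dict] = {}

    def add(self, ids: List[str], documents: List[str],
            metadatas: Optional[List[dict]] = None) -> None:
        metadatas = metadatas or [{} for _ in ids]
        # Embed on insert, so callers only pass raw text.
        embeddings = self.embedding_function(documents)
        for id_, doc, meta, emb in zip(ids, documents, metadatas, embeddings):
            self.records[id_] = {"document": doc, "metadata": meta, "embedding": emb}

    def query(self, query_text: str, n_results: int = 1) -> List[str]:
        query_emb = self.embedding_function([query_text])[0]
        # Brute-force scan over every record, highest similarity first.
        ranked = sorted(
            self.records.items(),
            key=lambda item: cosine_similarity(query_emb, item[1]["embedding"]),
            reverse=True,
        )
        return [id_ for id_, _ in ranked[:n_results]]


vocab = ["cat", "dog", "pet", "stock", "market", "price"]
notes = ToyCollection("notes", make_bow_embedder(vocab))
notes.add(
    ids=["a", "b", "c"],
    documents=["My cat is a lazy pet", "The stock market fell", "Dog food price went up"],
    metadatas=[{"topic": "pets"}, {"topic": "finance"}, {"topic": "pets"}],
)
print(notes.query("which pet is lazy, the cat?", n_results=1))  # ['a']
```

Swapping `make_bow_embedder` for any other callable with the same texts-in, vectors-out shape is the toy analogue of plugging in a custom embedding function.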

Quick Start & Requirements

Highlighted Details

  • Fully typed, tested, and documented API.
  • Integrates with LangChain (Python/JS) and LlamaIndex.
  • Supports filtering by metadata and document content.
  • Offers options for in-memory or persistent storage.
  • Can use default Sentence Transformers, OpenAI, or custom embedding functions.
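Chroma expresses filters as dictionaries, for example a `where` filter on metadata and a `where_document` filter using a `$contains` operator. The stdlib-only sketch below mimics that dictionary shape to show what metadata and document-content filtering do to a candidate set. It is a hypothetical illustration, not Chroma code: `filter_records` and its helpers are invented names, and they support only exact-match metadata filters and a single case-sensitive `$contains` substring test.

```python
from typing import Dict, List, Optional

# Hypothetical records: id -> document text plus metadata.
RECORDS: Dict[str, dict] = {
    "a": {"document": "My cat is a lazy pet", "metadata": {"topic": "pets", "year": 2023}},
    "b": {"document": "The stock market fell", "metadata": {"topic": "finance", "year": 2024}},
    "c": {"document": "Dog food price went up", "metadata": {"topic": "pets", "year": 2024}},
}


def matches_where(metadata: dict, where: Optional[dict]) -> bool:
    """Exact-match metadata filter: every key in `where` must equal the stored value."""
    if not where:
        return True
    return all(metadata.get(key) == value for key, value in where.items())


def matches_where_document(document: str, where_document: Optional[dict]) -> bool:
    """Document-content filter; only the `$contains` substring test is supported."""
    if not where_document:
        return True
    needle = where_document.get("$contains")
    return needle is None or needle in document


def filter_records(records: Dict[str, dict],
                   where: Optional[dict] = None,
                   where_document: Optional[dict] = None) -> List[str]:
    # In this sketch, filtering narrows the candidates before any similarity ranking.
    return [
        id_ for id_, rec in records.items()
        if matches_where(rec["metadata"], where)
        and matches_where_document(rec["document"], where_document)
    ]


print(filter_records(RECORDS, where={"topic": "pets"}))  # ['a', 'c']
print(filter_records(RECORDS, where={"topic": "pets"},
                     where_document={"$contains": "price"}))  # ['c']
```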

Maintenance & Community

  • Active development with weekly releases (Mondays for tagged versions, hotfixes as needed).
  • Community engagement via Discord (#contributing channel).
  • Public roadmap available for contribution ideas.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The README mentions a "Row-based API coming soon," so that feature was not yet available. Chroma supports a client-server mode, but the README does not describe scaling or distributed deployment configurations in detail.

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 150
  • Issues (30d): 22
  • Star history: 1,921 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Elie Bursztein (Cybersecurity Lead at Google DeepMind).

LightRAG by HKUDS

RAG framework for fast, simple retrieval-augmented generation
19k stars, top 1.0%
created 10 months ago, updated 14 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Anton Troynikov (Cofounder of Chroma), and 20 more.

llama_index by run-llama

Data framework for building LLM-powered agents
43k stars, top 0.3%
created 2 years ago, updated 15 hours ago