NeumAI by NeumTry

Data platform for retrieval-augmented generation (RAG)

created 1 year ago
858 stars

Top 42.6% on sourcepulse

Project Summary

Neum AI is a data platform designed to streamline the creation and synchronization of vector embeddings for Retrieval Augmented Generation (RAG) at scale. It targets developers building LLM applications, aiming to reduce integration complexity for data connectors, embedding models, and vector databases, thereby accelerating RAG implementation.

How It Works

Neum AI employs a high-throughput, distributed architecture to process vast datasets into vector embeddings. Its core functionality revolves around configurable "pipelines" that ingest data from various sources, process it using specified loaders and chunkers, vectorize it with chosen embedding models, and store the results in vector databases. This modular pipeline approach, combined with built-in connectors and real-time synchronization, facilitates efficient and up-to-date RAG data management.
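The pipeline stages described above (source, chunker, embedding model, vector store) can be illustrated with a minimal, self-contained sketch. This is not the neumai API: every name here (Chunk, chunk_text, toy_embed, InMemorySink, run_pipeline) is hypothetical, the "embedding" is a hash-derived stand-in rather than a real model, and the sink is an in-memory dict rather than a vector database.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Chunk:
    """One piece of a source document plus metadata about where it came from."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(doc_id: str, text: str, size: int = 40) -> list[Chunk]:
    """Split text into fixed-size character windows (a deliberately naive chunker)."""
    return [
        Chunk(doc_id, text[i:i + size], {"offset": i})
        for i in range(0, len(text), size)
    ]


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Stand-in for an embedding model: deterministic floats derived from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dims]]


class InMemorySink:
    """Stand-in for a vector database: stores (vector, chunk) pairs by key."""

    def __init__(self) -> None:
        self.rows: dict[str, tuple[list[float], Chunk]] = {}

    def upsert(self, key: str, vector: list[float], chunk: Chunk) -> None:
        self.rows[key] = (vector, chunk)


def run_pipeline(
    source: Iterable[tuple[str, str]],
    embed: Callable[[str], list[float]],
    sink: InMemorySink,
    chunk_size: int = 40,
) -> int:
    """Ingest (doc_id, text) pairs, chunk, embed, and store. Returns chunks written."""
    written = 0
    for doc_id, text in source:
        for chunk in chunk_text(doc_id, text, chunk_size):
            key = f"{doc_id}:{chunk.metadata['offset']}"
            sink.upsert(key, embed(chunk.text), chunk)
            written += 1
    return written


# Hypothetical usage: two tiny "documents" flow through every stage.
docs = [("a", "x" * 100), ("b", "hello world")]
sink = InMemorySink()
count = run_pipeline(docs, toy_embed, sink)
```

Passing the embedder and sink in as arguments mirrors the modular idea in the text: each stage could be swapped for a different implementation without touching the others.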

Quick Start & Requirements

  • Install via pip: pip install neumai
  • Requires API keys for embedding services (e.g., OpenAI) and vector databases (e.g., Weaviate).
  • See Quickstart for detailed examples.
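Since the setup depends on API keys, a small pre-flight check can catch missing credentials before a pipeline runs. The sketch below assumes the keys are supplied as environment variables; the variable names are assumptions made for illustration (WEAVIATE_API_KEY in particular), and missing_keys is a hypothetical helper, not part of neumai.

```python
import os

# Assumed variable names; adjust to whatever your services actually expect.
REQUIRED_KEYS = ("OPENAI_API_KEY", "WEAVIATE_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names in `required` that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Hypothetical usage against the real process environment.
problems = missing_keys(os.environ)
if problems:
    print("Missing credentials:", ", ".join(problems))
```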

Highlighted Details

  • Supports billions of data points with a distributed architecture.
  • Offers built-in connectors for common data sources (Postgres, Websites, S3, Azure Blob, SharePoint, SingleStore, Supabase Storage), embedding services (OpenAI, Azure OpenAI), and vector stores (Supabase, Weaviate, Qdrant, Pinecone, SingleStore).
  • Features real-time data synchronization and customizable data pre-processing (loading, chunking, selection).
  • Manages metadata for hybrid retrieval and provides a local development environment.
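To make the synchronization idea concrete, here is one common way to keep an index up to date: compare a content fingerprint of each source document with the fingerprint recorded at the last sync, then re-embed only what changed. The summary does not say how Neum AI implements synchronization, so treat this as a generic sketch; fingerprint and plan_sync are hypothetical names.

```python
import hashlib


def fingerprint(text):
    """Content hash used to detect whether a document changed since the last sync."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(source_docs, indexed):
    """Compare current source documents with previously indexed fingerprints.

    source_docs: dict of doc_id -> current text
    indexed:     dict of doc_id -> fingerprint stored at the last sync
    Returns (to_upsert, to_delete) as sorted lists of doc ids.
    """
    to_upsert = sorted(
        doc_id
        for doc_id, text in source_docs.items()
        if indexed.get(doc_id) != fingerprint(text)
    )
    to_delete = sorted(doc_id for doc_id in indexed if doc_id not in source_docs)
    return to_upsert, to_delete


# Hypothetical state: "b" was edited, "c" was removed, "d" is new.
indexed = {
    "a": fingerprint("alpha"),
    "b": fingerprint("beta"),
    "c": fingerprint("gamma"),
}
source = {"a": "alpha", "b": "beta v2", "d": "delta"}
to_upsert, to_delete = plan_sync(source, indexed)
```

Only the documents in to_upsert need to be re-chunked and re-embedded, which keeps embedding costs proportional to the amount of change rather than the size of the dataset.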

Maintenance & Community

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as being in active development, but the last commit was about a year ago. Several features are marked as planned or experimental on the roadmap, including additional connectors (MySQL, GitHub, Google Drive) and advanced search capabilities. The license is not clearly stated, which may impact commercial adoption.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days
