NeumAI by NeumTry

Data platform for retrieval-augmented generation (RAG)

Created 2 years ago

866 stars

Top 41.2% on SourcePulse

4 Experts Love This Project

Elie Bursztein

Cybersecurity Lead at Google DeepMind

Gabriel Almeida

Cofounder of Langflow

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

Neum AI is a data platform for creating and synchronizing vector embeddings for retrieval-augmented generation (RAG) at scale. It targets developers building LLM applications and aims to reduce the work of integrating data connectors, embedding models, and vector databases, so that RAG systems can be implemented faster.

How It Works

Neum AI employs a high-throughput, distributed architecture to process vast datasets into vector embeddings. Its core functionality revolves around configurable "pipelines" that ingest data from various sources, process it using specified loaders and chunkers, vectorize it with chosen embedding models, and store the results in vector databases. This modular pipeline approach, combined with built-in connectors and real-time synchronization, facilitates efficient and up-to-date RAG data management.
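The pipeline stages described above (load, chunk, embed, store) can be illustrated with a minimal, self-contained Python sketch. This is not the Neum AI API: every name here (Chunk, load, chunk, embed, run_pipeline) is hypothetical, the embedding is a toy hash-based stand-in for a real model, and the "vector database" is a plain dict.

```python
from dataclasses import dataclass, field
import hashlib
import math


@dataclass
class Chunk:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def load(source):
    """Loader stage: turn a raw source (here, a dict) into (id, text) pairs."""
    return sorted(source.items())


def chunk(doc_id, text, size=40):
    """Chunker stage: split text into fixed-size character windows."""
    return [
        Chunk(doc_id, text[i:i + size], {"offset": i})
        for i in range(0, len(text), size)
    ]


def embed(text, dim=8):
    """Embedding stage: a toy, deterministic stand-in for a real model."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def run_pipeline(source, store):
    """Run load -> chunk -> embed -> store; return the number of vectors written."""
    written = 0
    for doc_id, text in load(source):
        for c in chunk(doc_id, text):
            key = f"{doc_id}:{c.metadata['offset']}"
            store[key] = (embed(c.text), c)  # in-memory stand-in for a vector store
            written += 1
    return written


store = {}
count = run_pipeline({"a.txt": "x" * 100, "b.txt": "hello"}, store)
print(count)  # 4: three 40-character windows for a.txt, one chunk for b.txt
```

Keeping each stage as a separate function mirrors the modular design the summary describes: a loader, chunker, embedding model, or store can be swapped without touching the others.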

Quick Start & Requirements

Install via pip: pip install neumai
Requires API keys for embedding services (e.g., OpenAI) and vector databases (e.g., Weaviate).
See Quickstart for detailed examples.
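
Because the pipeline depends on external services, it can help to verify credentials before running anything. The sketch below is a generic pre-flight check, not part of Neum AI; the environment variable names are assumptions chosen for illustration, so check each provider's documentation for the real ones.

```python
import os

# Hypothetical variable names, used only for illustration.
REQUIRED_KEYS = ["OPENAI_API_KEY", "WEAVIATE_API_KEY", "WEAVIATE_URL"]


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in env."""
    return [k for k in required if not env.get(k)]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing configuration:", ", ".join(missing))
    else:
        print("All keys present.")
```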

Highlighted Details

Supports billions of data points with a distributed architecture.
Offers built-in connectors for common data sources (Postgres, Websites, S3, Azure Blob, SharePoint, SingleStore, Supabase Storage), embedding services (OpenAI, Azure OpenAI), and vector stores (Supabase, Weaviate, Qdrant, Pinecone, SingleStore).
Features real-time data synchronization and customizable data pre-processing (loading, chunking, selection).
Manages metadata for hybrid retrieval and provides a local development environment.
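
One common reading of "hybrid retrieval" is a vector similarity search restricted by metadata filters. The sketch below shows that idea in plain Python under that assumption. It does not use the Neum AI API or any vector database client; the record layout and the hybrid_search function are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(records, query_vec, filters, top_k=2):
    """Keep records whose metadata matches every filter, then rank by similarity."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine(r["vector"], query_vec),
        reverse=True,
    )
    return [r["id"] for r in ranked[:top_k]]


records = [
    {"id": "doc1", "vector": [1.0, 0.0], "metadata": {"source": "s3"}},
    {"id": "doc2", "vector": [0.0, 1.0], "metadata": {"source": "s3"}},
    {"id": "doc3", "vector": [1.0, 0.0], "metadata": {"source": "postgres"}},
]

result = hybrid_search(records, [0.9, 0.1], {"source": "s3"})
print(result)  # ['doc1', 'doc2']: doc3 is excluded by the metadata filter
```

Storing metadata alongside each vector at ingestion time is what makes this kind of filtered query possible later.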

Maintenance & Community

The README describes active development and links to a roadmap, but the Health Check below shows the last commit was 2 years ago.
Community support via Discord and Twitter.
Contact available via email (founders@tryneum.com) and scheduled calls.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The roadmap marks several features as planned or experimental, including additional connectors (MySQL, GitHub, Google Drive) and advanced search capabilities. With the last commit 2 years ago, it is unclear whether these will be delivered. The license is not clearly stated, which may impact commercial adoption.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days