NeumAI by NeumTry

Data platform for retrieval-augmented generation (RAG)

Created 2 years ago
860 stars

Top 41.7% on SourcePulse

Project Summary

Neum AI is a data platform designed to streamline the creation and synchronization of vector embeddings for Retrieval Augmented Generation (RAG) at scale. It targets developers building LLM applications, aiming to reduce integration complexity for data connectors, embedding models, and vector databases, thereby accelerating RAG implementation.

How It Works

Neum AI employs a high-throughput, distributed architecture to process vast datasets into vector embeddings. Its core functionality revolves around configurable "pipelines" that ingest data from various sources, process it using specified loaders and chunkers, vectorize it with chosen embedding models, and store the results in vector databases. This modular pipeline approach, combined with built-in connectors and real-time synchronization, facilitates efficient and up-to-date RAG data management.
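The pipeline flow described above (source, chunker, embedding model, vector store) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the neumai package API: `Record`, `chunk_text`, `toy_embed`, `InMemorySink`, and `run_pipeline` are hypothetical names, the embedding is a deterministic placeholder rather than a real model, and the sink is an in-memory dictionary standing in for a vector database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# All names below are hypothetical; this is NOT the neumai package API.


@dataclass
class Record:
    id: str
    text: str
    vector: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)


def chunk_text(text: str, size: int = 40) -> List[str]:
    """Split text into fixed-size character chunks (a stand-in chunker)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(chunk: str, dims: int = 4) -> List[float]:
    """Deterministic placeholder embedding; a real pipeline would call a model."""
    vec = [0.0] * dims
    for i, ch in enumerate(chunk):
        vec[i % dims] += ord(ch) / 1000.0
    return vec


class InMemorySink:
    """Stand-in for a vector database; upserts records keyed by id."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def upsert(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.records[record.id] = record
            count += 1
        return count


def run_pipeline(
    documents: Dict[str, str],
    chunker: Callable[[str], List[str]],
    embed: Callable[[str], List[float]],
    sink: InMemorySink,
) -> int:
    """Source -> chunk -> embed -> store; returns the number of stored records."""
    batch: List[Record] = []
    for doc_id, text in documents.items():
        for n, chunk in enumerate(chunker(text)):
            batch.append(
                Record(
                    id=f"{doc_id}:{n}",
                    text=chunk,
                    vector=embed(chunk),
                    metadata={"source": doc_id},
                )
            )
    return sink.upsert(batch)


docs = {"a.txt": "x" * 100, "b.txt": "short"}
sink = InMemorySink()
stored = run_pipeline(docs, chunk_text, toy_embed, sink)
```

Because each stage is passed in as a plain callable, swapping the chunker or embedding function does not require touching the rest of the flow, which is the kind of modularity the pipeline description implies.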

Quick Start & Requirements

  • Install via pip: pip install neumai
  • Requires API keys for embedding services (e.g., OpenAI) and vector databases (e.g., Weaviate).
  • See Quickstart for detailed examples.
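Since the requirements above call for API keys, a small pre-flight check can catch a missing key before a pipeline run starts. The sketch below is illustrative only: the environment variable names `OPENAI_API_KEY` and `WEAVIATE_API_KEY` and the helper `missing_keys` are assumptions, so substitute whatever names your embedding service and vector database actually document.

```python
import os
from typing import List, Mapping, Sequence

# Assumed variable names; adjust to the services you actually use.
REQUIRED_KEYS = ("OPENAI_API_KEY", "WEAVIATE_API_KEY")


def missing_keys(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


# Check the real process environment and report anything that is missing.
missing = missing_keys(os.environ)
if missing:
    print("Missing API keys:", ", ".join(missing))
```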

Highlighted Details

  • Supports billions of data points with a distributed architecture.
  • Offers built-in connectors for common data sources (Postgres, Websites, S3, Azure Blob, SharePoint, SingleStore, Supabase Storage), embedding services (OpenAI, Azure OpenAI), and vector stores (Supabase, Weaviate, Qdrant, Pinecone, SingleStore).
  • Features real-time data synchronization and customizable data pre-processing (loading, chunking, selection).
  • Manages metadata for hybrid retrieval and provides a local development environment.
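One common reading of metadata-assisted ("hybrid") retrieval is to filter candidates by metadata first and then rank the survivors by vector similarity. The sketch below illustrates that idea with plain Python; the `search` and `cosine` helpers and the sample items are hypothetical and are not taken from Neum AI, whose actual retrieval behavior may differ.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# (id, vector, metadata); a hypothetical in-memory stand-in for stored records.
Item = Tuple[str, List[float], Dict[str, str]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    items: Sequence[Item],
    query: Sequence[float],
    where: Optional[Dict[str, str]] = None,
    top_k: int = 2,
) -> List[str]:
    """Keep items whose metadata matches every key in where, then rank by similarity."""
    where = where or {}
    candidates = [
        item for item in items
        if all(item[2].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda item: cosine(item[1], query), reverse=True)
    return [item[0] for item in ranked[:top_k]]


items: List[Item] = [
    ("doc-1", [1.0, 0.0], {"team": "sales"}),
    ("doc-2", [0.9, 0.1], {"team": "legal"}),
    ("doc-3", [0.0, 1.0], {"team": "sales"}),
]

unfiltered = search(items, [1.0, 0.0])
sales_only = search(items, [1.0, 0.0], where={"team": "sales"})
```

Without a filter the two nearest vectors win regardless of team; with `where={"team": "sales"}` the legal document is excluded before ranking, so a less similar sales document takes its place.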

Maintenance & Community

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The roadmap lists several features as planned or experimental, including additional connectors (MySQL, GitHub, Google Drive) and advanced search capabilities. The license is not clearly stated, which may impact commercial adoption.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
