antfly  by antflydb

Distributed search engine for multimodal RAG and knowledge graphs

Created 3 weeks ago

329 stars

Top 83.2% on SourcePulse

Project Summary

Antfly is a distributed search engine for multimodal data that integrates full-text, vector, and graph search. It targets developers and power users who need a single platform for complex data retrieval and AI-powered applications, and it offers automated data enrichment and built-in agents for retrieval-augmented generation (RAG).

How It Works

Antfly uses a multi-Raft architecture with separate consensus groups for metadata and for storage shards, a design aimed at high availability and fault tolerance. It natively combines BM25 for full-text search, dense and sparse vectors (e.g., SPLADE) for semantic similarity, and graph traversal, across text, images, audio, and video. Ingesting data automatically triggers embedding generation, chunking, and graph edge extraction, and the results feed a unified query engine. Built-in RAG agents use these capabilities for retrieval-augmented generation and support streaming, multi-turn chat, and tool calling.
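
To make the idea of combining BM25, dense, and sparse results concrete, here is a minimal sketch of reciprocal rank fusion (RRF), a common way to merge ranked lists. This summary does not say which fusion method Antfly uses, so RRF is an assumption chosen for illustration. The function name, the document IDs, and the constant k=60 are likewise hypothetical and not part of Antfly's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. This is generic RRF, not Antfly's documented
    scoring method.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from three retrievers (illustrative IDs only).
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]
sparse_hits = ["doc_b", "doc_a", "doc_e"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in several lists (here doc_b and doc_a) rise to the top, which is the general motivation for hybrid search: no single retriever has to be right on its own.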

Quick Start & Requirements

To start a single-node cluster with built-in ML inference, run go run ./cmd/antfly swarm from a source checkout (this requires Go), or use Docker: docker run -p 8080:8080 ghcr.io/antflydb/antfly:omni. The Antfarm dashboard is then served at http://localhost:8080, with playgrounds for various features.
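
After starting the node, it can be handy to wait until the dashboard answers before opening it or running scripts against it. The sketch below polls the dashboard URL using only the Python standard library. It assumes nothing about Antfly's API beyond the http://localhost:8080 address mentioned above; the function name wait_for_dashboard and its parameters are hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_for_dashboard(url="http://localhost:8080", timeout=60.0, interval=1.0):
    """Poll url until it answers an HTTP request or timeout seconds pass.

    Returns True as soon as the server responds (even with an error status),
    and False if nothing answers before the deadline.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is listening.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: wait and try again.
            time.sleep(interval)
    return False
```

Calling wait_for_dashboard() right after the docker run command should return True once the container is accepting connections, and False if nothing is listening on the port within the timeout.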

Highlighted Details

  • Hybrid Search: Unified querying across BM25, dense vectors, and sparse vectors (SPLADE).
  • RAG Agents: Integrated retrieval-augmented generation with streaming, multi-turn chat, and tool calling.
  • Multimodal Support: Indexing and searching images, audio, and video using models like CLIP and CLAP.
  • Graph Capabilities: Automatic relationship extraction and graph traversal queries.
  • Extensibility: Bring-your-own-model support (Ollama, OpenAI, Bedrock, etc.) and a PostgreSQL extension (pgaf) for seamless integration.
  • Performance: SIMD/SME acceleration for vector operations on x86 and ARM.

Maintenance & Community

The project encourages community involvement via a Discord server for support and discussion. Contribution guidelines are available in CONTRIBUTING.md.

Licensing & Compatibility

The core Antfly server is licensed under the Elastic License 2.0 (ELv2), which permits use, modification, and self-hosting but prohibits offering Antfly as a managed service. All other components, including SDKs, React components, and ML tools, are licensed under the permissive Apache 2.0 license.

Limitations & Caveats

The ELv2 license prohibits offering Antfly to third parties as a managed service. The architecture is designed for distributed resilience, but no benchmark figures are given here, so performance characteristics and scalability limits would need to be measured against a specific workload.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 40
  • Issues (30d): 2
  • Star history: 329 stars in the last 27 days
