cosdata  by cosdata

Advanced AI data platform for next-gen search pipelines

Created 1 year ago
347 stars

Top 80.0% on SourcePulse

GitHubView on GitHub
Project Summary

Cosdata is a next-generation retrieval infrastructure designed for AI-native applications, addressing the need for relevance beyond simple vector similarity. It targets developers building advanced search pipelines, offering a relevance-first architecture that combines multiple search modalities to enhance AI project data management and retrieval quality while reducing compute requirements.

How It Works

Cosdata employs a relevance-first architecture that unifies multiple search modalities: BM25 full-text search, HNSW dense vectors, and SPLADE learned sparse embeddings. This multi-modal approach, combined with context-aware capabilities like geofencing and hierarchical organization, aims to optimize for actual user satisfaction rather than just mathematical proximity. Its enterprise-grade design features colocated storage, streaming ingestion, and transactional versioning for robust AI workloads.

Quick Start & Requirements

  • Installation:
    • Linux: curl -sL https://cosdata.io/install.sh | bash
    • macOS & Windows: Docker (v20.10+) - docker pull cosdataio/cosdata:latest then docker run ...
  • Build from Source: Requires Git (v2.0+), Rust (v1.81.0+), and a C++ compiler (GCC ≥ 4.8 or Clang ≥ 3.4).
  • Testing: Requires Python 3.8+ and the uv CLI.
  • HTTPS: Detailed instructions are provided for generating self-signed certificates and configuring TLS.
  • Documentation: Client SDK documentation links are provided.

Highlighted Details

  • Performance: Claims 60-120% reduction in compute requirements and 20-50% improvement in retrieval quality (NDCG@10). Offers up to 12x faster indexing than Elasticsearch and sub-100ms response times.
  • Multi-Modal Search: Seamlessly integrates BM25, HNSW dense vectors, and SPLADE sparse embeddings.
  • Enterprise-Grade: Features colocated storage, transactional versioning ("time travel"), streaming ingestion, and production-ready security (end-to-end encryption, RBAC).
  • Scalability: Engineered for unbounded scalability with predictable, near-linear query performance.
  • Optimization: Supports advanced vector quantization (scalar, product) and auto-configuration of hyperparameters.

Maintenance & Community

Contributions are welcomed via issues and pull requests, with guidelines in CONTRIBUTING.md. Community engagement is encouraged through Discord, email (contact@cosdata.io), and GitHub issues/discussions.

Licensing & Compatibility

The provided README does not explicitly state the software's license. This omission requires further investigation for compatibility, especially for commercial use or closed-source integration.

Limitations & Caveats

Documentation for testing with real-world datasets is incomplete, with placeholders for download links and configuration instructions. The default HTTP mode is insecure and not recommended for production environments.

Health Check
Last Commit

4 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
0 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.