databend by databendlabs

AI-Native Data Warehouse for Multimodal Analytics

Created 5 years ago
8,925 stars

Top 5.7% on SourcePulse

Project Summary

Databend is an AI-native, open-source data warehouse designed as a Snowflake alternative. It targets enterprises and developers who need to unify and analyze structured, semi-structured, and unstructured data. It claims near-100% SQL compatibility and offers native AI capabilities, with the stated goals of lower cost and high performance at petabyte scale.

How It Works

Databend uses a Rust-based Massively Parallel Processing (MPP) architecture with S3-native storage. Because data lives in object storage, compute and storage are fully separated: each can scale independently, and storage costs stay low. The unified data model includes a VECTOR data type with HNSW indexing for multimodal AI workloads, alongside standard SQL types and a VARIANT type for JSON.
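To make the MPP idea concrete, here is a minimal, hypothetical sketch of the partial-aggregate/merge pattern that MPP engines commonly use for a query such as `SELECT k, COUNT(*), SUM(v) ... GROUP BY k`. The function names (`partial_aggregate`, `merge_partials`, `run_query`) are illustrative assumptions, not Databend APIs. Threads stand in for compute nodes, and in-memory lists stand in for data that a real engine would read from object storage.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row shape: (group_key, value). In a real MPP deployment each
# partition would be a set of files in object storage read by a separate node.
Row = tuple[str, int]
Agg = dict[str, tuple[int, int]]  # key -> (COUNT, SUM)


def partial_aggregate(partition: list[Row]) -> Agg:
    """Stage 1: each worker computes COUNT and SUM per key over its own partition."""
    acc: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for key, value in partition:
        acc[key][0] += 1
        acc[key][1] += value
    return {k: (c, s) for k, (c, s) in acc.items()}


def merge_partials(partials: list[Agg]) -> Agg:
    """Stage 2: a coordinator merges the small per-worker results."""
    merged: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for part in partials:
        for key, (count, total) in part.items():
            merged[key][0] += count
            merged[key][1] += total
    return {k: (c, s) for k, (c, s) in sorted(merged.items())}


def run_query(partitions: list[list[Row]]) -> Agg:
    # One thread per partition simulates independent compute nodes in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    return merge_partials(partials)


if __name__ == "__main__":
    partitions = [
        [("a", 1), ("b", 2)],
        [("a", 3), ("c", 4)],
        [("b", 5), ("a", 6)],
    ]
    print(run_query(partitions))  # {'a': (3, 10), 'b': (2, 7), 'c': (1, 4)}
```

Because COUNT and SUM merge by simple addition, each node ships only one small row per key to the coordinator rather than its raw data. An average would be derived as SUM / COUNT after the merge.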

Quick Start & Requirements

  • Cloud: Databend Cloud (beta) advertises a production-ready environment in about 60 seconds.
  • Self-Hosted: An installation guide covers deployment on AWS, Azure, GCP, or on-premises.
  • CLI: BendSQL is the command-line client for connecting to Databend and running queries.
  • Dependencies: No specific hardware or software prerequisites are listed for a basic setup beyond a standard cloud or OS environment.

Highlighted Details

  • Claims 10x faster performance through vectorized execution and SIMD optimizations.
  • Aims for a 90% cost reduction through S3-native storage.
  • Supports multimodal analytics, unifying structured data, JSON, and vector embeddings.
  • Reported to be production-proven at petabyte scale, handling 800+ PB of data and 100M+ queries per day.
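Analytics over vector embeddings usually reduces to nearest-neighbor search. The sketch below shows the exact, brute-force version using cosine similarity in plain Python; an approximate index such as the HNSW index described for Databend's VECTOR type exists precisely to avoid scoring every row. The names `cosine_similarity` and `top_k_cosine` and the toy embeddings are hypothetical and do not reflect Databend's SQL functions or internals.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_cosine(
    query: list[float], rows: list[tuple[str, list[float]]], k: int
) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbor search: score every (row_id, embedding), keep the k best.

    Cost is O(n * d) per query; approximate indexes like HNSW trade a little
    recall for not scanning all n rows.
    """
    scored = ((cosine_similarity(query, emb), row_id) for row_id, emb in rows)
    best = heapq.nlargest(k, scored)
    return [(row_id, score) for score, row_id in best]


if __name__ == "__main__":
    rows = [("doc1", [1.0, 0.0]), ("doc2", [0.0, 1.0]), ("doc3", [0.7, 0.7])]
    print([row_id for row_id, _ in top_k_cosine([1.0, 0.1], rows, 2)])  # ['doc1', 'doc3']
```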

Maintenance & Community

  • Active community with Slack and GitHub channels for discussion and support.
  • Roadmap available, indicating ongoing development.
  • Contributions are welcome, with a documented process for getting code merged.

Licensing & Compatibility

  • Licensed under Apache License 2.0 and Elastic License 2.0.
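  • The Apache 2.0 license generally permits commercial use and linking with closed-source applications. The Elastic License 2.0 is more restrictive (for example, it prohibits offering the software to third parties as a hosted or managed service), so check which license covers the components you plan to use.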

Limitations & Caveats

  • Databend Cloud is currently in beta.
  • Although Databend targets Snowflake compatibility, specific edge cases and advanced features should be validated before migrating.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 92
  • Issues (30d): 26
  • Star History: 110 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Alexander Wettig (Coauthor of SWE-bench, SWE-agent), and 5 more.

data-juicer by modelscope

0.8%
5k
Data-Juicer: Data processing system for foundation models
Created 2 years ago
Updated 1 day ago
Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 26 more.

datasets by huggingface

0.1%
21k
Access and process large AI datasets efficiently
Created 5 years ago
Updated 1 day ago