Daft by Eventual-Inc

High-performance data engine for AI and multimodal workloads

Created 3 years ago
4,869 stars

Top 10.2% on SourcePulse

Project Summary

Daft is a high-performance data engine for AI and multimodal workloads at scale. It targets AI engineers and researchers who need to process images, audio, video, and structured data efficiently in a single framework, with built-in AI operations and distributed scaling.

How It Works

Daft is a Python-native data engine, leveraging Rust for high-performance execution. Its core design emphasizes native multimodal data processing, allowing images, audio, video, and embeddings to be handled alongside structured data. It integrates built-in AI operations, such as LLM prompting and embedding generation, and supports distributed execution across Ray, Kubernetes, or Daft Cloud.

Quick Start & Requirements

Highlighted Details

  • Native multimodal processing for images, audio, video, embeddings, and structured data in a single framework.
  • Built-in AI operations (LLM prompts, embeddings, classification) supporting OpenAI, Transformers, or custom models.
  • Python-native interface with a Rust backend for optimized performance.
  • Distributed execution on clusters via Ray, Kubernetes, or Daft Cloud.
  • Universal data connectivity, supporting sources like S3, GCS, Iceberg, Delta Lake, Hugging Face, and Unity Catalog.

Maintenance & Community

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: The Apache 2.0 license permits commercial use and integration into closed-source projects.

Limitations & Caveats

The project collects telemetry data for improvement; users can opt out. The quickstart documentation link points to /latest/ rather than a pinned version, so its content may change as the project evolves.
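As a sketch of what opting out of telemetry could look like, the snippet below sets environment variables before Daft is imported. The variable names `DAFT_ANALYTICS_ENABLED` and `DO_NOT_TRACK` are assumptions recalled from Daft's documentation, not taken from this page, so check the project's current docs before relying on them.

```python
import os

# Assumption: Daft reads these variables at import time, so they must be
# set before `import daft` runs. Verify the names against the Daft docs.
os.environ["DAFT_ANALYTICS_ENABLED"] = "0"
os.environ["DO_NOT_TRACK"] = "true"

# Record what was set, for logging or debugging.
telemetry_settings = {
    name: os.environ[name]
    for name in ("DAFT_ANALYTICS_ENABLED", "DO_NOT_TRACK")
}
print(telemetry_settings)
```

Setting the variables in the shell or deployment configuration instead of in code has the same effect and avoids ordering problems with imports.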

Health Check

  • Last Commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 159
  • Issues (30d): 150

Star History

  • 230 stars in the last 30 days

Explore Similar Projects

Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 26 more.

datasets by huggingface

0.1%
21k
Access and process large AI datasets efficiently
Created 5 years ago
Updated 3 days ago