OpenDCAI
Data preparation and LLM training system
Top 28.4% on SourcePulse
DataFlow is a data-centric AI system designed for comprehensive data preparation and LLM training. It targets researchers and developers working with large language models, offering a modular framework to improve LLM performance in specific domains through targeted data processing and pipeline assembly.
How It Works
DataFlow employs a modular operator design, allowing users to build flexible data processing pipelines by combining various operators. These operators, categorized into Generic, Domain-Specific, and Evaluation types, handle tasks from text processing to domain-specific data manipulation. An intelligent DataFlow-agent can dynamically assemble new pipelines by recombining existing operators, enabling automated data workflow orchestration.
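The operator-composition idea above can be illustrated with a minimal sketch. This is not DataFlow's actual API: the names `Operator`, `Pipeline`, `strip_whitespace`, `drop_short`, and `score_length` are hypothetical and chosen only for illustration, as is the assumption that each record is a dict with a `text` field. It shows one generic operator, one filtering operator standing in for a domain-specific step, and one evaluation operator chained into a pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record type: a dict with at least a "text" field.
Record = Dict[str, object]


@dataclass
class Operator:
    """A named processing step (illustrative only, not DataFlow's class)."""
    name: str
    kind: str  # "generic", "domain", or "evaluation"
    fn: Callable[[List[Record]], List[Record]]

    def run(self, records: List[Record]) -> List[Record]:
        return self.fn(records)


def strip_whitespace(records: List[Record]) -> List[Record]:
    # Generic operator: trim surrounding whitespace from each text.
    return [{**r, "text": str(r["text"]).strip()} for r in records]


def drop_short(records: List[Record], min_chars: int = 10) -> List[Record]:
    # Stand-in for a domain-specific filter: drop very short texts.
    return [r for r in records if len(str(r["text"])) >= min_chars]


def score_length(records: List[Record]) -> List[Record]:
    # Evaluation operator: attach a toy score (the text length).
    return [{**r, "score": len(str(r["text"]))} for r in records]


class Pipeline:
    """Runs operators in order, feeding each one's output to the next."""

    def __init__(self, operators: List[Operator]) -> None:
        self.operators = list(operators)

    def run(self, records: List[Record]) -> List[Record]:
        for op in self.operators:
            records = op.run(records)
        return records


pipeline = Pipeline([
    Operator("strip", "generic", strip_whitespace),
    Operator("drop_short", "domain", drop_short),
    Operator("score", "evaluation", score_length),
])

sample = [{"text": "  hello world, this is data  "}, {"text": " hi "}]
result = pipeline.run(sample)
```

Because every operator shares the same list-in, list-out signature, a new pipeline is just a different ordering or subset of the same operators, which is the property an agent would rely on when recombining them.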
Quick Start & Requirements
pip install open-dataflow
pip install open-dataflow[vllm]
dataflow webui and dataflow webui agent.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is newly released (June 2025) and may still be undergoing rapid development. Specific performance benchmarks are detailed in the documentation, but real-world performance may vary.