cube-studio by data-infra

Unified cloud-native AI platform for end-to-end ML workflows

Created 1 year ago
1,795 stars

Top 24.0% on SourcePulse

Project Summary

Cube Studio is an open-source, cloud-native, one-stop MLOps platform that covers the full lifecycle of machine learning, deep learning, and large-model AI projects. It targets data scientists, ML engineers, and researchers, providing an integrated environment for data management, model development, training, deployment, and monitoring, with the aim of streamlining AI workflows and speeding up model delivery to production.

How It Works

Cube Studio is built on a Kubernetes-native architecture and offers a suite of modules: data management (ETL, labeling, data maps), development environments (JupyterLab, VSCode, MATLAB, RStudio), a drag-and-drop pipeline orchestrator, and model serving. It supports a range of distributed training frameworks (PyTorch, TensorFlow, Horovod, DeepSpeed, Paddle, ColossalAI) and hardware accelerators (NVIDIA GPUs, Ascend NPUs, DCUs, vGPUs), with a focus on multi-node, multi-GPU training and inference for large models.

Quick Start & Requirements

  • Installation: Typically deployed onto a Kubernetes cluster (Helm charts or manifests).
  • Prerequisites: Kubernetes cluster, Docker, kubectl. External dependencies include databases (MySQL/PostgreSQL) for metadata.
  • Resources: Requires significant compute and storage resources, especially for distributed training and large model deployments.
  • Documentation: https://github.com/data-infra/cube-studio/wiki
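
Before attempting a deployment, it can help to confirm that the command-line prerequisites listed above are present. The following is a minimal sketch, not part of Cube Studio: the function name `find_missing_tools` and the assumption that the prerequisites are exposed as `kubectl` and `docker` executables on PATH are illustrative only. It does not check cluster connectivity or the metadata database.

```python
import shutil

# Assumed executable names for the prerequisites listed above;
# adjust them if your environment uses different binaries.
REQUIRED_TOOLS = ("kubectl", "docker")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("Required CLIs found on PATH; next, confirm cluster access.")
```

An empty result only means the executables were found; the actual installation steps are described in the project wiki linked above.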

Highlighted Details

  • Extensive support for diverse hardware, including China-made ("domestic") CPUs/GPUs/NPUs (Ascend, DCU, MLU) and RDMA networking.
  • Comprehensive large model support, including fine-tuning (SFT, PPO), distributed inference (vLLM, Ollama), and private knowledge base integration.
  • Features a drag-and-drop pipeline editor, online IDEs (Jupyter, VSCode), automated labeling, and a model application market.
  • Supports multi-cluster management, edge computing deployments, and serverless cluster modes.

Maintenance & Community

The project is actively maintained and welcomes community contributions. Further details on community engagement and roadmaps can be found via links provided in the repository.

Licensing & Compatibility

The project is released under an open-source license, with specific details available in the repository. It is designed for integration into various enterprise environments.

Limitations & Caveats

The platform is extensive and complex: it requires a robust Kubernetes infrastructure and significant expertise to set up and manage. Some advanced features, such as automated labeling, may require additional services or purchases.

Health Check
Last Commit: 3 days ago
Responsiveness: 1 week
Pull Requests (30d): 2
Issues (30d): 3
Star History
26 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI

0.3%
4k
AI inference pipeline framework
Created 1 year ago
Updated 1 day ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 20 more.

alpa by alpa-projects

0.0%
3k
Auto-parallelization framework for large-scale neural network training and serving
Created 4 years ago
Updated 1 year ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech

0.1%
41k
AI system for large-scale parallel training
Created 3 years ago
Updated 13 hours ago