cube-studio  by tencentmusic

AI platform for cloud-native ML/DL, supporting the full model lifecycle

Created 4 years ago
4,576 stars

Top 10.8% on SourcePulse

GitHubView on GitHub
Project Summary

Cube Studio is an open-source, cloud-native, unified MLOps platform designed for machine learning, deep learning, and large model AI. It caters to AI engineers and researchers by providing a comprehensive workflow from data management and development to distributed training, model deployment, and inference, supporting a wide range of hardware and frameworks.

How It Works

Cube Studio leverages a Kubernetes-native architecture, enabling multi-cluster management and resource isolation for different project groups. It offers a drag-and-drop pipeline orchestration for end-to-end ML workflows, supports various distributed training frameworks (PyTorch, TensorFlow, DeepSpeed, Horovod, etc.), and integrates features like automated hyperparameter tuning, model versioning, and zero-code inference service deployment. Its extensibility is highlighted by custom operator development and a model marketplace (AIHub).

Quick Start & Requirements

  • Installation: Typically deployed via Kubernetes (e.g., Helm charts).
  • Prerequisites: Kubernetes cluster, Docker, potentially distributed storage (NFS, Ceph, S3, etc.), and specific hardware like GPUs (NVIDIA T4/V100/A100, AMD DCU, NPUs) and RDMA-enabled network interfaces for advanced distributed training.
  • Resources: Requires significant compute and storage resources, especially for large-scale distributed training and inference.
  • Documentation: https://github.com/tencentmusic/cube-studio/wiki

Highlighted Details

  • Supports a broad spectrum of hardware, including domestic Chinese CPUs/GPUs/NPUs (e.g., Ascend, DCU, MLU) and RDMA.
  • Offers extensive support for large model fine-tuning (SFT, PPO) and distributed inference (vLLM, Ollama) with features like private knowledge bases and OpenAI-compatible APIs.
  • Includes a data labeling platform with automated labeling capabilities and integrates with various data processing engines (Spark, Hive, Presto).
  • Provides a model application market (AIHub) with 400+ pre-trained models and one-click deployment/fine-tuning options.

Maintenance & Community

  • Actively developed by Tencent Music.
  • Community resources are available via the GitHub wiki.

Licensing & Compatibility

  • The project appears to be primarily licensed under Apache 2.0, but specific components or integrations might have different licenses. Compatibility for commercial use is generally good under Apache 2.0.

Limitations & Caveats

  • The platform is extensive and complex, requiring a strong understanding of Kubernetes and distributed systems for effective deployment and management.
  • Some advanced features like automated labeling may require separate AIHub purchases.
Health Check
Last Commit

3 days ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
1
Star History
63 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Travis Fischer Travis Fischer(Founder of Agentic), and
2 more.

modelscope by modelscope

0.2%
8k
Model-as-a-Service library for model inference, training, and evaluation
Created 3 years ago
Updated 1 day ago
Feedback? Help us improve.