cube-studio  by tencentmusic

AI platform for cloud-native ML/DL, supporting the full model lifecycle

created 4 years ago
4,476 stars

Top 11.1% on sourcepulse

GitHubView on GitHub
Project Summary

Cube Studio is an open-source, cloud-native, unified MLOps platform designed for machine learning, deep learning, and large model AI. It caters to AI engineers and researchers by providing a comprehensive workflow from data management and development to distributed training, model deployment, and inference, supporting a wide range of hardware and frameworks.

How It Works

Cube Studio leverages a Kubernetes-native architecture, enabling multi-cluster management and resource isolation for different project groups. It offers a drag-and-drop pipeline orchestration for end-to-end ML workflows, supports various distributed training frameworks (PyTorch, TensorFlow, DeepSpeed, Horovod, etc.), and integrates features like automated hyperparameter tuning, model versioning, and zero-code inference service deployment. Its extensibility is highlighted by custom operator development and a model marketplace (AIHub).

Quick Start & Requirements

  • Installation: Typically deployed via Kubernetes (e.g., Helm charts).
  • Prerequisites: Kubernetes cluster, Docker, potentially distributed storage (NFS, Ceph, S3, etc.), and specific hardware like GPUs (NVIDIA T4/V100/A100, AMD DCU, NPUs) and RDMA-enabled network interfaces for advanced distributed training.
  • Resources: Requires significant compute and storage resources, especially for large-scale distributed training and inference.
  • Documentation: https://github.com/tencentmusic/cube-studio/wiki

Highlighted Details

  • Supports a broad spectrum of hardware, including domestic Chinese CPUs/GPUs/NPUs (e.g., Ascend, DCU, MLU) and RDMA.
  • Offers extensive support for large model fine-tuning (SFT, PPO) and distributed inference (vLLM, Ollama) with features like private knowledge bases and OpenAI-compatible APIs.
  • Includes a data labeling platform with automated labeling capabilities and integrates with various data processing engines (Spark, Hive, Presto).
  • Provides a model application market (AIHub) with 400+ pre-trained models and one-click deployment/fine-tuning options.

Maintenance & Community

  • Actively developed by Tencent Music.
  • Community resources are available via the GitHub wiki.

Licensing & Compatibility

  • The project appears to be primarily licensed under Apache 2.0, but specific components or integrations might have different licenses. Compatibility for commercial use is generally good under Apache 2.0.

Limitations & Caveats

  • The platform is extensive and complex, requiring a strong understanding of Kubernetes and distributed systems for effective deployment and management.
  • Some advanced features like automated labeling may require separate AIHub purchases.
Health Check
Last commit

1 month ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
287 stars in the last 90 days

Explore Similar Projects

Starred by Eugene Yan Eugene Yan(AI Scientist at AWS), Jared Palmer Jared Palmer(Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and
3 more.

seldon-core by SeldonIO

0.1%
5k
MLOps framework for production model deployment on Kubernetes
created 7 years ago
updated 1 day ago
Feedback? Help us improve.