cube-studio by data-infra

Unified cloud-native AI platform for end-to-end ML workflows

created 1 year ago
1,752 stars

Top 25.0% on sourcepulse

Project Summary

Cube Studio is an open-source, cloud-native MLOps platform that covers the full lifecycle of machine learning, deep learning, and large-model projects. It gives data scientists, ML engineers, and researchers a single integrated environment for data management, model development, training, deployment, and monitoring, with the aim of streamlining AI workflows and getting models into production faster.

How It Works

Cube Studio is built on a Kubernetes-native architecture. Its modules cover data management (ETL, labeling, data maps), development environments (JupyterLab, VSCode, MATLAB, RStudio), a drag-and-drop pipeline orchestrator, and model serving. It supports distributed training frameworks (PyTorch, TensorFlow, Horovod, DeepSpeed, Paddle, ColossalAI) and hardware accelerators (NVIDIA GPUs, Ascend NPUs, DCUs, vGPUs), with a focus on multi-node, multi-GPU training and inference for large models.
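
As a rough illustration of what a pipeline orchestrator has to resolve, the sketch below orders the tasks of a small dependency graph so that each task runs only after its upstream tasks. It is not Cube Studio's API: the task names and the `execution_order` helper are hypothetical, and the ordering uses Python's standard-library `graphlib` rather than anything from the project.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
# Task names are illustrative only and are not taken from Cube Studio.
pipeline = {
    "etl": set(),
    "labeling": {"etl"},
    "train": {"labeling"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def execution_order(dag):
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    try:
        print(execution_order(pipeline))
    except CycleError as exc:
        print("Pipeline has a dependency cycle:", exc)
```

A real orchestrator would also schedule independent tasks in parallel and map each task to a container; this sketch only shows the dependency ordering.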

Quick Start & Requirements

  • Installation: Typically deployed via Kubernetes (Helm charts or manifests).
  • Prerequisites: Kubernetes cluster, Docker, kubectl. External dependencies include databases (MySQL/PostgreSQL) for metadata.
  • Resources: Requires significant compute and storage resources, especially for distributed training and large model deployments.
  • Documentation: https://github.com/data-infra/cube-studio/wiki
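
Before attempting a deployment, it can help to confirm that the command-line tools named in the prerequisites are installed. The sketch below checks only that `kubectl` and `docker` are on `PATH`. The tool list and the `missing_tools` helper are assumptions for illustration, not an official Cube Studio check, and the script does not verify cluster access or the metadata database.

```python
import shutil

# CLIs named in the prerequisites above. Treating them as a local preflight
# list is an assumption, not an official requirement check.
REQUIRED_TOOLS = ("kubectl", "docker")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All prerequisite CLIs found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal use calls `missing_tools()` with no arguments.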

Highlighted Details

  • Extensive support for diverse hardware, including domestic CPUs/GPUs/NPUs (Ascend, DCU, MLU) and RDMA.
  • Comprehensive large model support, including fine-tuning (SFT, PPO), distributed inference (vLLM, Ollama), and private knowledge base integration.
  • Features a drag-and-drop pipeline editor, online IDEs (Jupyter, VSCode), automated labeling, and a model application market.
  • Supports multi-cluster management, edge computing deployments, and serverless cluster modes.

Maintenance & Community

The project is actively maintained and welcomes community contributions. Further details on community engagement and roadmaps can be found via links provided in the repository.

Licensing & Compatibility

The project is released under an open-source license, with specific details available in the repository. It is designed for integration into various enterprise environments.

Limitations & Caveats

The platform is extensive and complex, requiring a robust Kubernetes infrastructure and significant expertise for setup and management. Some advanced features like automated labeling may require additional services or purchases.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

99 stars in the last 90 days

Explore Similar Projects

Starred by Eugene Yan (AI Scientist at AWS), Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

seldon-core by SeldonIO

0.1%
5k
MLOps framework for production model deployment on Kubernetes
created 7 years ago
updated 1 day ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ben Firshman (Cofounder of Replicate), and 6 more.

Made-With-ML by GokuMohandas

0.4%
41k
ML course for production-grade applications
created 6 years ago
updated 11 months ago