backend.ai  by lablup

Container-based computing cluster platform

Created 9 years ago
606 stars

Top 54.1% on SourcePulse

Project Summary

Backend.AI is a container-based computing cluster platform designed for multi-tenant, on-demand computation sessions. It supports a wide range of programming languages and ML frameworks, with pluggable heterogeneous accelerator support including GPUs (CUDA, ROCm), TPUs, and NPUs. The platform is ideal for research institutions, data science teams, and organizations needing scalable, isolated compute environments.

How It Works

Backend.AI utilizes a distributed architecture comprising a central Manager for routing and scaling, and Agents running on compute nodes to manage containers. It employs Sokovan as its orchestrator. Sessions are exposed via REST and GraphQL APIs, with direct WebSocket tunneling for in-container applications like Jupyter, VSCode, and SSH. Its storage abstraction layer (vfolders) provides unified access to network storage, with customizable access controls.
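The Manager/Agent split described above can be illustrated with a toy model. This is a minimal sketch, not Backend.AI's code or Sokovan's actual scheduling policy: the `Agent` and `Manager` classes, the slot names, and the "most free capacity" heuristic are all hypothetical and chosen only to show a central component placing sessions on compute nodes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """Hypothetical stand-in for an Agent running on one compute node."""
    name: str
    free_slots: dict[str, int]  # e.g. {"cpu": 8, "cuda": 2}
    sessions: list[str] = field(default_factory=list)

    def can_host(self, request: dict[str, int]) -> bool:
        # A node qualifies only if every requested resource fits.
        return all(self.free_slots.get(k, 0) >= v for k, v in request.items())

    def start(self, session_id: str, request: dict[str, int]) -> None:
        # Reserve the requested slots and record the session.
        for k, v in request.items():
            self.free_slots[k] -= v
        self.sessions.append(session_id)


class Manager:
    """Hypothetical central component that places sessions on agents."""

    def __init__(self, agents: list[Agent]) -> None:
        self.agents = list(agents)

    def schedule(self, session_id: str, request: dict[str, int]) -> Optional[str]:
        candidates = [a for a in self.agents if a.can_host(request)]
        if not candidates:
            return None  # no node can satisfy the request right now
        # Toy heuristic (an assumption): prefer the node with the most
        # free capacity in the requested resource types.
        chosen = max(
            candidates,
            key=lambda a: sum(a.free_slots.get(k, 0) for k in request),
        )
        chosen.start(session_id, request)
        return chosen.name


manager = Manager([
    Agent("node-a", {"cpu": 4, "cuda": 0}),
    Agent("node-b", {"cpu": 8, "cuda": 2}),
])
first = manager.schedule("sess-1", {"cpu": 2, "cuda": 1})
second = manager.schedule("sess-2", {"cpu": 2, "cuda": 2})
print(first, second)
```

In this sketch the first request lands on the only node with a free accelerator slot, and the second is rejected because no single node has two accelerator slots left. A real orchestrator would also queue pending requests and handle multi-node sessions, which are omitted here.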

Quick Start & Requirements

Highlighted Details

  • Pluggable heterogeneous accelerator support (CUDA and ROCm GPUs, TPUs, IPUs, NPUs).
  • Integrated support for Jupyter, VSCode Server, and SSH within compute sessions.
  • vfolders for unified, permission-controlled network storage access.
  • REST and GraphQL API endpoints for programmatic control.
  • SCIE-based installer for self-contained executables.
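To make the idea of permission-controlled storage folders concrete, here is a minimal sketch of an access check. It does not use Backend.AI's API: the `VFolder` class, the `Permission` levels, and the method names are hypothetical, and the actual permission model is defined in the project's own documentation.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Hypothetical permission levels, ordered from weakest to strongest."""
    NONE = 0
    READ_ONLY = 1
    READ_WRITE = 2


class VFolder:
    """Hypothetical model of a shared storage folder with per-user grants."""

    def __init__(self, name: str, owner: str) -> None:
        self.name = name
        self.owner = owner
        # The owner starts with full access (an assumption of this sketch).
        self._grants: dict[str, Permission] = {owner: Permission.READ_WRITE}

    def share(self, user: str, permission: Permission) -> None:
        self._grants[user] = permission

    def permission_for(self, user: str) -> Permission:
        # Users without an explicit grant get no access.
        return self._grants.get(user, Permission.NONE)

    def allows(self, user: str, needed: Permission) -> bool:
        return self.permission_for(user) >= needed


datasets = VFolder("datasets", owner="alice")
datasets.share("bob", Permission.READ_ONLY)

print(datasets.allows("alice", Permission.READ_WRITE))
print(datasets.allows("bob", Permission.READ_ONLY))
print(datasets.allows("bob", Permission.READ_WRITE))
print(datasets.allows("carol", Permission.READ_ONLY))
```

Ordering the levels as integers keeps the check to a single comparison: any grant at or above the needed level passes.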

Maintenance & Community

  • Active development with clear versioning and migration guides.
  • Client SDKs available for Python, Java, and JavaScript.
  • Links to legacy per-package repositories are provided for historical context.

Licensing & Compatibility

  • Server-side components: LGPLv3.
  • Shared libraries and client SDKs: MIT License.
  • Commercial consulting and licensing options are available via contact@lablup.com.

Limitations & Caveats

The README mentions an "enterprise edition" with additional features, implying some functionality may be proprietary. While a single-node development script is provided, multi-node production setup details are deferred to external documentation.

Health Check

  • Last commit: 9 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 286
  • Issues (30d): 466

Star History

  • 14 stars in the last 30 days
