FedML by FedML-AI

ML library for distributed training, model serving, and federated learning

Created 5 years ago
3,961 stars

Top 12.3% on SourcePulse

Project Summary

FedML is a unified, scalable machine learning library designed for distributed training, model serving, and federated learning across diverse hardware environments. It targets developers and researchers needing to run AI jobs efficiently on any GPU cloud or on-premise cluster, with TensorOpera AI offering a complementary platform for generative AI and LLMs.

How It Works

FedML provides a unified MLOps layer with Studio for accessing and fine-tuning foundational models, and a Job Store for pre-built AI tasks. Its scheduler, TensorOpera Launch, optimizes GPU resource allocation and automates job execution across various compute topologies. The compute layer includes platforms for scalable model serving (Deploy), large-scale distributed training (Train), and federated learning (Federate), leveraging FedML's core library for cross-device and cross-cloud operations.
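
To make the federated learning piece concrete, here is a minimal sketch of federated averaging (FedAvg), a common aggregation scheme in federated learning. It is plain Python and does not use FedML's API. The listing does not say which algorithm Federate runs, so the function names (`local_update`, `federated_average`, `run_rounds`), the linear model, and the toy data are all assumptions chosen for illustration.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """One client's local training: gradient descent on y = w*x + b (MSE loss)."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights, client_sizes):
    """Average client models, weighting each by its number of local samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(wts[i] * n for wts, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_rounds(clients, rounds=200):
    """Server loop: broadcast the global model, train locally, aggregate."""
    global_weights = [0.0, 0.0]
    sizes = [len(data) for data in clients]
    for _ in range(rounds):
        # Raw data never leaves a client; only the updated weights are shared.
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    # Toy data (an assumption): two clients hold disjoint samples of y = 2x + 1.
    clients = [
        [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
        [(1.5, 4.0), (2.0, 5.0)],
    ]
    w, b = run_rounds(clients)
    print(f"w={w:.3f}, b={b:.3f}")  # should approach w=2, b=1
```

The point of the sketch is the division of labor: each client trains on data it keeps locally, and the server only sees and combines model weights. A production system adds client sampling, secure communication, and failure handling on top of this loop.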

Quick Start & Requirements

  • Installation: pip install fedml
  • Prerequisites: Python 3.7+, PyTorch or TensorFlow. GPU and CUDA recommended for performance.
  • Documentation: https://docs.TensorOpera.ai
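
The prerequisites above can be checked before installing. The following is a minimal sketch that uses only the Python standard library. The helper name `check_prerequisites` is hypothetical and not part of FedML. It reports whether the interpreter meets the 3.7 minimum and whether `fedml`, `torch`, or `tensorflow` are importable, without actually importing them.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the prerequisites


def check_prerequisites(version_info=None):
    """Hypothetical helper (not part of FedML): report which prerequisites are met."""
    if version_info is None:
        version_info = sys.version_info
    report = {"python_ok": tuple(version_info[:2]) >= MIN_PYTHON}
    # find_spec returns None when a top-level package is not installed.
    for package in ("fedml", "torch", "tensorflow"):
        report[package] = importlib.util.find_spec(package) is not None
    # The listing asks for PyTorch or TensorFlow, so either one is enough.
    report["framework_ok"] = report["torch"] or report["tensorflow"]
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

This check does not cover GPU or CUDA availability, which the listing recommends but does not require.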

Highlighted Details

  • Unified library for distributed training, model serving, and federated learning.
  • TensorOpera Launch acts as a cross-cloud scheduler for efficient GPU resource utilization.
  • Supports on-device training on smartphones and cross-cloud GPU servers via federated learning.
  • Offers pre-built jobs and foundational models for generative AI and LLMs.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: The permissive license allows commercial use and integration with closed-source projects.

Limitations & Caveats

The project is tightly integrated with the TensorOpera AI platform, so advanced features may depend on that ecosystem and carry some risk of vendor lock-in. The README mentions "world’s first FLOps", which may indicate that parts of the federated learning component are early-stage or experimental.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

24 stars in the last 30 days
