FedML by FedML-AI

ML library for distributed training, model serving, and federated learning

created 5 years ago
3,912 stars

Top 12.7% on sourcepulse

Project Summary

FedML is a unified, scalable machine learning library for distributed training, model serving, and federated learning across diverse hardware environments. It targets developers and researchers who need to run AI jobs efficiently on any GPU cloud or on-premise cluster. TensorOpera AI offers a complementary platform for generative AI and LLMs.

How It Works

FedML provides a unified MLOps layer with Studio for accessing and fine-tuning foundational models, and a Job Store for pre-built AI tasks. Its scheduler, TensorOpera Launch, optimizes GPU resource allocation and automates job execution across various compute topologies. The compute layer includes platforms for scalable model serving (Deploy), large-scale distributed training (Train), and federated learning (Federate), leveraging FedML's core library for cross-device and cross-cloud operations.
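To make the federated learning piece more concrete, here is a minimal sketch of federated averaging, a common aggregation step in which a server combines model weights trained on separate clients. This is an illustration only: the summary does not say which aggregation algorithms FedML uses, and `federated_average` and the `Weights` type are hypothetical names, not part of the FedML API.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its local sample count."""
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros shaped like the first client's parameters.
    first, _ = updates[0]
    averaged = {name: [0.0] * len(values) for name, values in first.items()}

    # Each client contributes in proportion to how much data it trained on.
    for weights, n in updates:
        share = n / total
        for name, values in weights.items():
            for i, v in enumerate(values):
                averaged[name][i] += share * v
    return averaged


# Two clients: the second holds three times as much data as the first.
clients = [({"w": [1.0, 2.0]}, 1), ({"w": [3.0, 4.0]}, 3)]
print(federated_average(clients))  # {'w': [2.5, 3.5]}
```

Only the weights and sample counts leave each client in this sketch; the raw training data stays local, which is the property that makes on-device training practical.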

Quick Start & Requirements

  • Installation: pip install fedml
  • Prerequisites: Python 3.7+, PyTorch or TensorFlow. GPU and CUDA recommended for performance.
  • Documentation: https://docs.TensorOpera.ai
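Before installing, it can help to confirm that the environment meets the prerequisites listed above. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and it assumes PyTorch and TensorFlow are importable under their usual package names `torch` and `tensorflow`.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum version from the prerequisites above
BACKENDS = ("torch", "tensorflow")  # usual import names for PyTorch and TensorFlow


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def available_backends(candidates=BACKENDS):
    """Return the candidate packages that can be located, without importing them."""
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


def check_environment():
    """Return a list of problems; an empty list means the prerequisites look satisfied."""
    problems = []
    if not python_ok():
        problems.append("Python %d.%d+ is required" % MIN_PYTHON)
    if not available_backends():
        problems.append("neither PyTorch nor TensorFlow is installed")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "; ".join(issues))
```

The check locates packages with `importlib.util.find_spec` rather than importing them, so it stays fast even when a large framework is installed. It does not verify GPU or CUDA availability.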

Highlighted Details

  • Unified library for distributed training, model serving, and federated learning.
  • TensorOpera Launch acts as a cross-cloud scheduler for efficient GPU resource utilization.
  • Supports on-device training on smartphones and cross-cloud GPU servers via federated learning.
  • Offers pre-built jobs and foundational models for generative AI and LLMs.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: The permissive license allows commercial use and integration with closed-source projects.

Limitations & Caveats

The project is tightly integrated with the TensorOpera AI platform, which suggests potential vendor lock-in or a dependence on that ecosystem for advanced features. The README mentions the "world’s first FLOps", a claim that may indicate early-stage or experimental features in the federated learning component.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 68 stars in the last 90 days
