higgsfield by higgsfield-ai

ML framework for large model training and GPU orchestration

created 7 years ago
3,443 stars

Top 14.4% on sourcepulse

Project Summary

Higgsfield is an open-source framework designed for orchestrating GPU workloads and training massive machine learning models, particularly LLMs with trillions of parameters. It targets researchers and engineers dealing with distributed training complexities, offering fault tolerance, scalability, and simplified environment management.

How It Works

Higgsfield acts as a GPU workload manager: it allocates compute resources and supports advanced sharding techniques such as DeepSpeed ZeRO-3 and PyTorch Fully Sharded Data Parallel (FSDP). These techniques make trillion-parameter training feasible by partitioning model parameters, gradients, and optimizer states across multiple GPUs and nodes instead of replicating them on every device. Higgsfield also integrates with CI/CD pipelines (GitHub Actions) to automate the deployment and execution of training experiments.
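To make the memory argument concrete, here is a back-of-the-envelope sketch of per-GPU memory for model states with and without full sharding. It assumes mixed-precision Adam with the 16-bytes-per-parameter accounting from the ZeRO paper (fp16 weights, fp16 gradients, fp32 master weights plus two Adam moments) and ignores activations, temporary buffers, and communication overhead. The function name is hypothetical; this is not Higgsfield code.

```python
# Assumed bytes per parameter for mixed-precision Adam (ZeRO-paper accounting):
# fp16 weights (2) + fp16 gradients (2) + fp32 optimizer states (12 =
# fp32 master weights 4 + momentum 4 + variance 4).
FP16_PARAM_BYTES = 2
FP16_GRAD_BYTES = 2
FP32_OPTIMIZER_BYTES = 12


def model_state_bytes_per_gpu(num_params: int, num_gpus: int, zero3: bool = True) -> float:
    """Approximate per-GPU bytes for model states (activations are NOT included)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    total = num_params * (FP16_PARAM_BYTES + FP16_GRAD_BYTES + FP32_OPTIMIZER_BYTES)
    # Full sharding (ZeRO-3 / FSDP) splits parameters, gradients, and optimizer
    # states evenly across ranks; plain data parallelism keeps a full copy per GPU.
    return total / num_gpus if zero3 else float(total)


if __name__ == "__main__":
    params = 10**12  # one trillion parameters
    for gpus in (1, 512, 1024):
        gib = model_state_bytes_per_gpu(params, gpus) / 2**30
        print(f"{gpus:>5} GPUs -> {gib:,.1f} GiB of model states per GPU")
```

Under these assumptions a trillion-parameter model needs about 16 TB of model states in total, which is why no single GPU can hold it and sharding across hundreds of devices is required before activations are even counted.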

Quick Start & Requirements

  • Install: pip install higgsfield==0.0.3
  • Requirements: Ubuntu nodes with SSH access, non-root user with passwordless sudo. Tested on Azure, LambdaLabs, FluidStack.
  • Setup: Requires preparing the nodes, configuring the environment, and integrating with Git.
  • Links: Quick Start Guide, Tutorial
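Before installing, it can help to verify each node against the requirements above. The sketch below is a hypothetical preflight check, not part of Higgsfield: the helper names are illustrative, and it assumes a local OpenSSH client with key-based access to every node. It rejects the root user, confirms the remote user can run sudo without a password (`sudo -n` fails instead of prompting), and checks that `/etc/os-release` reports Ubuntu.

```python
import subprocess
from typing import List


def ssh_check_command(host: str, user: str) -> List[str]:
    """Build an ssh command that succeeds only if `user` has passwordless sudo."""
    if user == "root":
        raise ValueError("the requirements call for a non-root user")
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # `sudo -n` makes sudo fail instead of prompting for one.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", "sudo", "-n", "true"]


def is_ubuntu(os_release: str) -> bool:
    """Return True if /etc/os-release content has ID=ubuntu (quoted or not)."""
    for line in os_release.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ID":
            return value.strip().strip('"').lower() == "ubuntu"
    return False


def check_node(host: str, user: str, timeout: float = 15.0) -> bool:
    """Run both checks over ssh; may raise subprocess.TimeoutExpired."""
    sudo = subprocess.run(ssh_check_command(host, user),
                          capture_output=True, timeout=timeout)
    release = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", "cat", "/etc/os-release"],
        capture_output=True, text=True, timeout=timeout)
    return (sudo.returncode == 0 and release.returncode == 0
            and is_ubuntu(release.stdout))
```

For example, `all(check_node(h, "trainer") for h in ["10.0.0.5", "10.0.0.6"])` would gate setup on every node passing; the host list and user name here are placeholders.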

Highlighted Details

  • Supports ZeRO-3 and PyTorch Fully Sharded Data Parallel for trillion-parameter models.
  • Automates deployment and execution via GitHub Actions integration.
  • Simplifies environment management, aiming to eliminate dependency version conflicts.
  • Provides a streamlined interface for defining and managing experiments, reducing configuration complexity.
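As a rough illustration of what "full sharding" means for a single parameter tensor, the sketch below simulates a flatten-pad-partition scheme in the style of FSDP in pure Python: each of `world_size` ranks keeps one equal-length slice, and the full tensor is rebuilt (all-gathered) only when a layer needs it. This is a simplified simulation with hypothetical helper names, not Higgsfield, DeepSpeed, or PyTorch code; real implementations operate on GPU tensors using collective communication.

```python
from typing import List, Tuple


def shard_ranges(numel: int, world_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of each rank's shard in the padded flat buffer."""
    if numel < 0 or world_size <= 0:
        raise ValueError("numel must be >= 0 and world_size > 0")
    shard_len = (numel + world_size - 1) // world_size  # ceiling division
    return [(r * shard_len, (r + 1) * shard_len) for r in range(world_size)]


def shard(flat: List[float], world_size: int) -> List[List[float]]:
    """Pad `flat` to a multiple of world_size and give each rank one equal slice."""
    ranges = shard_ranges(len(flat), world_size)
    padded_len = ranges[-1][1]
    padded = flat + [0.0] * (padded_len - len(flat))
    return [padded[start:end] for start, end in ranges]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    """Simulate the all-gather that rebuilds the full parameter before use."""
    full = [x for rank_shard in shards for x in rank_shard]
    return full[:numel]  # drop the padding again


if __name__ == "__main__":
    weights = [float(i) for i in range(10)]  # a toy 10-element "parameter"
    shards = shard(weights, world_size=4)
    for rank, s in enumerate(shards):
        print(f"rank {rank} holds {s}")
    assert all_gather(shards, len(weights)) == weights
```

Padding keeps every shard the same length, which simplifies the collective operations; the cost is a few wasted elements on the last rank.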

Maintenance & Community

  • Active community support via GitHub Issues and Twitter.
  • A project website hosts discussions and news.

Licensing & Compatibility

  • License: Not explicitly stated in the README. Compatibility for commercial use or closed-source linking is therefore unclear.

Limitations & Caveats

The project is at version 0.0.3, indicating it is likely in an early development stage. The license is not specified, which may pose a barrier for commercial adoption or integration into closed-source projects.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 1
  • Star history: 88 stars in the last 90 days
