higgsfield by higgsfield-ai

ML framework for large model training and GPU orchestration

Created 7 years ago

3,522 stars

Top 13.7% on SourcePulse

View on GitHub

15 Experts Love This Project

Aravind Srinivas

Cofounder of Perplexity

Luis Capelo

Cofounder of Lightning AI

Pawel Garbacki

Cofounder of Fireworks AI

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

and 11 more!

Project Summary

Higgsfield is an open-source framework designed for orchestrating GPU workloads and training massive machine learning models, particularly LLMs with trillions of parameters. It targets researchers and engineers dealing with distributed training complexities, offering fault tolerance, scalability, and simplified environment management.

How It Works

Higgsfield acts as a GPU workload manager, allocating compute resources and supporting advanced sharding techniques like DeepSpeed ZeRO-3 and PyTorch's Fully Sharded Data Parallel. This approach enables efficient training of trillion-parameter models by distributing model states, gradients, and optimizer states across multiple GPUs and nodes. It integrates with CI/CD pipelines (GitHub Actions) to automate deployment and execution of training experiments.

Quick Start & Requirements

Install: pip install higgsfield==0.0.3
Requirements: Ubuntu nodes with SSH access, non-root user with passwordless sudo. Tested on Azure, LambdaLabs, FluidStack.
Setup: Requires node setup, environment configuration, and Git integration.
Links: Quick Start Guide, Tutorial

Highlighted Details

Supports ZeRO-3 and PyTorch Fully Sharded Data Parallel for trillion-parameter models.
Automates deployment and execution via GitHub Actions integration.
Simplifies environment management, eliminating dependency version conflicts.
Provides a streamlined interface for defining and managing experiments, reducing configuration complexity.

Maintenance & Community

Active community support via GitHub Issues and Twitter.
Website for discussions and news.

Licensing & Compatibility

License: Not explicitly stated in the README. Compatibility for commercial use or closed-source linking is therefore unclear.

Limitations & Caveats

The project is at version 0.0.3, indicating it is likely in an early development stage. The license is not specified, which may pose a barrier for commercial adoption or integration into closed-source projects.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

19 stars in the last 30 days