pytorch-lightning by Lightning-AI

Deep learning framework for pretraining, finetuning, and deploying AI models

Created 6 years ago
30,371 stars

Top 1.2% on SourcePulse

Project Summary

PyTorch Lightning is a framework designed to streamline the training, finetuning, and deployment of AI models, particularly for large-scale or multi-device setups. It targets AI researchers and engineers by abstracting away boilerplate code, allowing them to focus on model architecture and scientific experimentation while maintaining flexibility and control.

How It Works

PyTorch Lightning organizes PyTorch code by separating the "science" (model definition, loss calculation) from the "engineering" (training loops, hardware acceleration, distributed training). It does this through two core classes: LightningModule encapsulates the model, optimizer configuration, and training/validation steps, while Trainer handles execution, including device placement, mixed precision, and scaling strategy. This split simplifies complex training setups and promotes code readability and reproducibility.
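
As a concrete illustration, here is a minimal sketch adapted from the project's canonical autoencoder example; the train_loader it is fitted on is assumed to be any standard PyTorch DataLoader:

    import torch
    from torch import nn
    import lightning as L

    class LitAutoEncoder(L.LightningModule):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
            self.decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))

        def forward(self, x):
            # Inference/export path; training goes through training_step.
            return self.decoder(self.encoder(x))

        def training_step(self, batch, batch_idx):
            # The "science": forward pass and loss. No .to(device) calls, no loop bookkeeping.
            x, _ = batch
            x = x.view(x.size(0), -1)
            return nn.functional.mse_loss(self(x), x)

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)

    # The "engineering": the Trainer owns the loop, device placement, precision, and scaling.
    trainer = L.Trainer(max_epochs=1)
    trainer.fit(LitAutoEncoder(), train_dataloaders=train_loader)  # train_loader: assumed DataLoader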

Quick Start & Requirements
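
PyTorch Lightning is distributed on PyPI; pip resolves the PyTorch dependency automatically. A minimal install-and-import check (package names per the project's published distributions):

    # pip install lightning    ("pip install pytorch-lightning" also works)
    import lightning as L
    print(L.__version__)  # sanity check; Trainer and LightningModule live under this namespace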

Highlighted Details

  • Scales seamlessly across multiple GPUs, TPUs, and nodes with no changes to model code (see the Trainer sketch after this list).
  • Offers mixed-precision training, experiment tracking (TensorBoard, W&B, and others), checkpointing, and early stopping.
  • Includes Lightning Fabric for expert control over training loops and scaling strategies, suited to complex models such as LLMs and diffusion models (sketched below).
  • Provides utilities for exporting models to TorchScript and ONNX for production deployment (sketched below).
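
To make the scaling and export points concrete, here is a hedged sketch; the flag values assume a 4-GPU host, the file paths and sample shape are illustrative, and LitAutoEncoder is the class from the "How It Works" sketch above:

    import torch
    import lightning as L

    # Scaling and precision are Trainer arguments, not model code.
    trainer = L.Trainer(accelerator="gpu", devices=4, strategy="ddp", precision="16-mixed")

    # Deployment exports on LightningModule (ONNX export requires the onnx package).
    model = LitAutoEncoder()
    model.to_onnx("model.onnx", input_sample=torch.randn(1, 28 * 28))
    scripted = model.to_torchscript(file_path="model.pt")

Lightning Fabric, by contrast, leaves the loop in user hands and takes over only device and distribution concerns. A self-contained CPU-runnable sketch with a toy model and dataset (swap in accelerator="gpu", devices=2 on a multi-GPU host):

    import torch
    from torch import nn
    from lightning.fabric import Fabric

    model = nn.Linear(32, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    dataset = torch.utils.data.TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

    fabric = Fabric(accelerator="cpu")                 # illustrative single-process setup
    fabric.launch()
    model, optimizer = fabric.setup(model, optimizer)  # wraps both for the chosen strategy
    dataloader = fabric.setup_dataloaders(dataloader)  # adds distributed sampling when needed

    for x, y in dataloader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        fabric.backward(loss)                          # replaces loss.backward()
        optimizer.step()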

Maintenance & Community

Maintained by a core team of 10+ developers, with over 800 community contributors. An active Discord community provides support and discussion.

Licensing & Compatibility

Licensed under Apache 2.0, which is permissive for commercial use and closed-source linking.

Limitations & Caveats

While designed for flexibility, the abstraction layer introduces a small overhead (reportedly around 300 ms per epoch compared to hand-written PyTorch). The breadth of features can also mean a steeper learning curve for users unfamiliar with distributed-training concepts.

Health Check

  • Last commit: 4 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 55
  • Issues (30d): 24
  • Star history: 191 stars in the last 30 days

Explore Similar Projects

Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

Top 0.2% on SourcePulse · 9k stars
PyTorch training helper for distributed execution
Created 5 years ago · Updated 1 week ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

Top 0.0% on SourcePulse · 41k stars
AI system for large-scale parallel training
Created 4 years ago · Updated 3 weeks ago