pytorch-lightning by Lightning-AI

Deep learning framework for pretraining, finetuning, and deploying AI models

Created 6 years ago
30,371 stars

Top 1.2% on SourcePulse

Project Summary

PyTorch Lightning is a framework designed to streamline the training, finetuning, and deployment of AI models, particularly for large-scale or multi-device setups. It targets AI researchers and engineers by abstracting away boilerplate code, allowing them to focus on model architecture and scientific experimentation while maintaining flexibility and control.

How It Works

PyTorch Lightning organizes PyTorch code by separating the "science" (model definition, loss calculation) from the "engineering" (training loops, hardware acceleration, distributed training). It does this through two core classes: LightningModule encapsulates the model, optimizer configuration, and training/validation steps, while Trainer handles execution, including device placement, mixed precision, and scaling strategy. This split simplifies complex training setups and promotes code readability and reproducibility.
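
As a concrete illustration, here is a minimal sketch adapted from the project's canonical autoencoder example; the train_loader it is fitted on is assumed to be any standard PyTorch DataLoader:

    import torch
    from torch import nn
    import lightning as L

    class LitAutoEncoder(L.LightningModule):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
            self.decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))

        def forward(self, x):
            # Inference/export path; training goes through training_step.
            return self.decoder(self.encoder(x))

        def training_step(self, batch, batch_idx):
            # The "science": forward pass and loss. No .to(device) calls, no loop bookkeeping.
            x, _ = batch
            x = x.view(x.size(0), -1)
            return nn.functional.mse_loss(self(x), x)

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)

    # The "engineering": the Trainer owns the loop, device placement, precision, and scaling.
    trainer = L.Trainer(max_epochs=1)
    trainer.fit(LitAutoEncoder(), train_dataloaders=train_loader)  # train_loader: assumed DataLoader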

Quick Start & Requirements
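
PyTorch Lightning is distributed on PyPI; pip resolves the PyTorch dependency automatically. A minimal install-and-import check (package names per the project's published distributions):

    # pip install lightning    ("pip install pytorch-lightning" also works)
    import lightning as L
    print(L.__version__)  # sanity check; Trainer and LightningModule live under this namespace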

Highlighted Details

  • Scales seamlessly across multiple GPUs, TPUs, and nodes with no changes to model code (see the Trainer sketch after this list).
  • Offers mixed-precision training, experiment tracking (TensorBoard, W&B, and others), checkpointing, and early stopping.
  • Includes Lightning Fabric for expert control over training loops and scaling strategies, suited to complex models such as LLMs and diffusion models (sketched below).
  • Provides utilities for exporting models to TorchScript and ONNX for production deployment (sketched below).
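
To make the scaling and export points concrete, here is a hedged sketch; the flag values assume a 4-GPU host, the file paths and sample shape are illustrative, and LitAutoEncoder is the class from the "How It Works" sketch above:

    import torch
    import lightning as L

    # Scaling and precision are Trainer arguments, not model code.
    trainer = L.Trainer(accelerator="gpu", devices=4, strategy="ddp", precision="16-mixed")

    # Deployment exports on LightningModule (ONNX export requires the onnx package).
    model = LitAutoEncoder()
    model.to_onnx("model.onnx", input_sample=torch.randn(1, 28 * 28))
    scripted = model.to_torchscript(file_path="model.pt")

Lightning Fabric, by contrast, leaves the loop in user hands and takes over only device and distribution concerns. A self-contained CPU-runnable sketch with a toy model and dataset (swap in accelerator="gpu", devices=2 on a multi-GPU host):

    import torch
    from torch import nn
    from lightning.fabric import Fabric

    model = nn.Linear(32, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    dataset = torch.utils.data.TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

    fabric = Fabric(accelerator="cpu")                 # illustrative single-process setup
    fabric.launch()
    model, optimizer = fabric.setup(model, optimizer)  # wraps both for the chosen strategy
    dataloader = fabric.setup_dataloaders(dataloader)  # adds distributed sampling when needed

    for x, y in dataloader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        fabric.backward(loss)                          # replaces loss.backward()
        optimizer.step()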

Maintenance & Community

Maintained by a core team of 10+ developers, with over 800 community contributors. An active Discord community provides support and discussion.

Licensing & Compatibility

Licensed under Apache 2.0, which is permissive for commercial use and closed-source linking.

Limitations & Caveats

While designed for flexibility, the abstraction layer introduces a small overhead (reportedly around 300 ms per epoch compared to hand-written PyTorch). The breadth of features can also mean a steeper learning curve for users unfamiliar with distributed-training concepts.

Health Check

  • Last commit: 4 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 55
  • Issues (30d): 24
  • Star history: 191 stars in the last 30 days

Explore Similar Projects

Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

Top 0.2% on SourcePulse · 9k stars
PyTorch training helper for distributed execution
Created 5 years ago · Updated 1 week ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

Top 0.0% on SourcePulse · 41k stars
AI system for large-scale parallel training
Created 4 years ago · Updated 3 weeks ago