NanoDL is a JAX-based library for building and training transformer models from scratch, targeting AI/ML experts who need to develop smaller-scale, efficient models. It takes a pedagogical, modular approach that makes networks easy to customize and fast to develop, with built-in support for distributed training.
How It Works
NanoDL leverages JAX and Flax for efficient computation and distributed training. Its core design emphasizes modularity: each model and its components live in a single file to minimize dependencies and make the code easy to study. This lets users select, combine, and modify layers and blocks, including specialized ones such as RoPE (rotary position embeddings), GQA (grouped-query attention), and MQA (multi-query attention), for flexible model development.
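To make the modular-block idea concrete, here is a minimal, self-contained Flax sketch of the kind of composition NanoDL describes. It is illustrative only and does not use NanoDL's own classes; the module and parameter names below are invented for the example.

```python
# Illustrative sketch only: the modular, single-file block style NanoDL
# describes, written in plain Flax (not NanoDL's own API).
import jax
import jax.numpy as jnp
import flax.linen as nn


class FeedForward(nn.Module):
    hidden_dim: int

    @nn.compact
    def __call__(self, x):
        # Position-wise MLP; swap this sub-block out to customise the model.
        h = nn.Dense(self.hidden_dim * 4)(x)
        return nn.Dense(self.hidden_dim)(nn.gelu(h))


class TransformerBlock(nn.Module):
    hidden_dim: int
    num_heads: int

    @nn.compact
    def __call__(self, x):
        # Pre-norm self-attention followed by a feed-forward sub-block.
        attn = nn.SelfAttention(num_heads=self.num_heads)(nn.LayerNorm()(x))
        x = x + attn
        return x + FeedForward(self.hidden_dim)(nn.LayerNorm()(x))


x = jnp.ones((2, 16, 64))  # (batch, sequence, features)
block = TransformerBlock(hidden_dim=64, num_heads=4)
params = block.init(jax.random.PRNGKey(0), x)
y = block.apply(params, x)  # output shape (2, 16, 64)
```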
Quick Start & Requirements
- Install via pip: `pip install nanodl`
- Prerequisites: Python 3.9+, JAX, Flax, Optax (GPU support recommended for training).
- Distributed training scales from a single GPU/TPU to many; a CPU-only JAX install is enough for building and experimenting with models (see the device check after this list).
- Documentation and examples are linked from badges in the README, with community support available on Discord.
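Before training, it helps to confirm that JAX can see your accelerators; with a CPU-only install the same code still runs, just on CPU. A minimal check:

```python
# Check which devices JAX can use before launching distributed training.
import jax

print(jax.devices())       # e.g. [CudaDevice(id=0), ...] on GPU, or [CpuDevice(id=0)] on CPU
print(jax.device_count())  # number of devices available for data-parallel training
```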
Highlighted Details
- Implements a wide array of transformer blocks and layers not found in Flax/JAX.
- Includes implementations of models such as Gemma, Llama 3, Mistral, GPT-3/4, T5, Whisper, ViT, and CLIP.
- Offers data-parallel trainers for multi-GPU/TPU training and simplified data handling with custom dataloaders.
- Accelerates classical ML models (PCA, K-Means, etc.) on GPUs/TPUs (illustrated below).
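As a rough illustration of the classical-ML point above, the sketch below shows a PCA-style projection written in plain jax.numpy; because it is ordinary JAX code, it runs unchanged on CPU, GPU, or TPU. This is not NanoDL's own PCA implementation, just the underlying idea.

```python
# Illustrative PCA-style projection in plain JAX (not NanoDL's API):
# the same code executes on CPU, GPU, or TPU without modification.
from functools import partial

import jax
import jax.numpy as jnp


@partial(jax.jit, static_argnums=1)
def pca_project(x, n_components):
    # Centre the data, then project onto the top principal directions via SVD.
    x = x - x.mean(axis=0)
    _, _, vt = jnp.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


x = jax.random.normal(jax.random.PRNGKey(0), (1000, 64))
z = pca_project(x, 2)  # shape (1000, 2), computed on whatever device JAX selected
print(z.shape)
```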
Maintenance & Community
- The project encourages contributions and feedback via Discord, issues, and pull requests.
- Experimental features like MAMBA, KAN, BitNet, GAT, and RLHF are available for direct copying from the repository.
- The long-term goal is to build "nano" versions of popular models (<1B parameters) with competitive performance.
Licensing & Compatibility
- The license is not specified here, though the README encourages open contribution and use. Clarifying the license is recommended before commercial applications.
Limitations & Caveats
- The project is explicitly stated to be in development ("still in dev, works great but roughness is expected"), with experimental features not yet packaged. Contributions are highly encouraged to address this.