nanodl by HenryNdubuaku

JAX library for building transformer models, including GPT, Gemma, LLaMA, Mixtral, Whisper, Swin, and ViT

Created 2 years ago
291 stars

Top 90.6% on SourcePulse

View on GitHub
Project Summary

NanoDL is a JAX-based library for building and training transformer models from scratch, aimed at AI/ML practitioners who want to develop smaller-scale, efficient models. It takes a pedagogical, modular approach to the code, making models easy to customize and speeding up development with built-in distributed-training support.

How It Works

NanoDL leverages JAX and Flax for efficient computation and distributed training. Its core design emphasizes modularity: each model and its components live in a single file to minimize dependencies and make the code easy to read and learn from. Users can select, combine, and modify layers and blocks, including specialized ones such as rotary position embeddings (RoPE), grouped-query attention (GQA), and multi-query attention (MQA), for flexible model development.
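For illustration, here is a minimal sketch of rotary position embeddings (RoPE) in plain JAX. It is not NanoDL's implementation, only an example of the kind of self-contained block the library packages for reuse:

    import jax.numpy as jnp

    def rope(x, base=10000.0):
        # x: (seq_len, num_heads, head_dim), head_dim assumed even
        seq_len, _, head_dim = x.shape
        half = head_dim // 2
        freqs = base ** (-jnp.arange(half) / half)            # (half,)
        angles = jnp.arange(seq_len)[:, None] * freqs[None]   # (seq_len, half)
        cos = jnp.cos(angles)[:, None, :]                     # broadcast over heads
        sin = jnp.sin(angles)[:, None, :]
        x1, x2 = x[..., :half], x[..., half:]
        # Rotate each (x1, x2) pair by its position-dependent angle
        return jnp.concatenate([x1 * cos - x2 * sin,
                                x1 * sin + x2 * cos], axis=-1)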

Quick Start & Requirements

  • Install via pip: pip install nanodl
  • Prerequisites: Python 3.9+, JAX, Flax, and Optax (GPU support recommended for training).
  • Training runs on one or more GPUs/TPUs; a CPU-only JAX install is sufficient for creating models.
  • Official documentation and examples are linked from the README badges and the project Discord; a hedged usage sketch follows this list.
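
As a rough orientation, a training setup might look like the sketch below. The class names (GPT4, ArrayDataset, DataLoader, GPTDataParallelTrainer) and constructor arguments are assumptions based on the README's examples and may differ between versions, so treat this as a sketch rather than a verified recipe:

    import jax.numpy as jnp
    # Names below are assumed from the README; check the current release for the exact API.
    from nanodl import ArrayDataset, DataLoader
    from nanodl import GPT4, GPTDataParallelTrainer

    # Dummy next-token-prediction data; replace with real tokenised text.
    data = jnp.ones((100, 11), dtype=jnp.int32)
    inputs, targets = data[:, :-1], data[:, 1:]

    dataset = ArrayDataset(inputs, targets)
    dataloader = DataLoader(dataset, batch_size=8, shuffle=True, drop_last=False)

    # Hyperparameter keys are illustrative, not authoritative.
    model = GPT4(num_layers=2, hidden_dim=256, num_heads=2, feedforward_dim=256,
                 dropout=0.1, vocab_size=1000, embed_dim=256, max_length=10,
                 start_token=0, end_token=999)

    # Data-parallel training across all visible devices, checkpointing to params.pkl.
    trainer = GPTDataParallelTrainer(model, inputs.shape, 'params.pkl')
    trainer.train(train_loader=dataloader, num_epochs=2, val_loader=dataloader)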

Highlighted Details

  • Implements a wide array of transformer blocks and layers not found in Flax/Jax.
  • Includes implementations of models such as Gemma, LLaMA 3, Mistral, GPT-3/4, T5, Whisper, ViT, and CLIP.
  • Offers data-parallel trainers for multi-GPU/TPU training and simplified data handling with custom dataloaders.
  • Accelerates classical ML models (PCA, K-Means, etc.) on GPUs/TPUs (see the sketch after this list).
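
To give a sense of what classical ML on accelerators looks like, here is a minimal PCA sketch in plain jax.numpy; it is not NanoDL's own PCA interface, whose API may differ:

    import jax
    import jax.numpy as jnp

    def pca(x, n_components):
        # Centre the data, then project onto the top right-singular vectors.
        x = x - x.mean(axis=0)
        _, _, vt = jnp.linalg.svd(x, full_matrices=False)
        return x @ vt[:n_components].T   # (n_samples, n_components)

    # Runs on GPU/TPU automatically when JAX is installed with accelerator support.
    x = jax.random.normal(jax.random.PRNGKey(0), (1000, 64))
    projected = pca(x, n_components=8)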

Maintenance & Community

  • The project encourages contributions and feedback via Discord, issues, and pull requests.
  • Experimental features like MAMBA, KAN, BitNet, GAT, and RLHF are available for direct copying from the repository.
  • The long-term goal is to build "nano" versions of popular models (<1B parameters) with competitive performance.

Licensing & Compatibility

  • The library's license is not clearly specified, though the README implies open contribution and use. Clarify licensing before relying on it in commercial applications.

Limitations & Caveats

  • The project is explicitly described as still in development ("still in dev, works great but roughness is expected"), and experimental features are not yet packaged. Contributions are encouraged to address this.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Travis Fischer (Founder of Agentic), and 6 more.

picotron by huggingface

4.8%
2k
Minimalist distributed training framework for educational use
Created 1 year ago
Updated 3 weeks ago
Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 12 more.

EasyLM by young-geng

0.0%
2k
LLM training/finetuning framework in JAX/Flax
Created 2 years ago
Updated 1 year ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.2%
7k
Framework for training large-scale autoregressive language models
Created 4 years ago
Updated 2 days ago