NanoDL is a JAX-based library for building and training transformer models from scratch, targeting AI/ML experts who need to develop smaller-scale, efficient models. It takes a pedagogical, modular approach that makes networks easy to customize and fast to develop, with built-in support for distributed training.
How It Works
NanoDL leverages JAX and Flax for efficient computation and distributed training. Its core design emphasizes modularity: each model and its components live in a single file to minimize dependencies and make the code easy to study. This lets users select, combine, and modify layers and blocks, including specialized ones such as RoPE (rotary position embeddings), GQA (grouped-query attention), and MQA (multi-query attention), for flexible model development.
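To make the modular-block idea concrete, here is a minimal, self-contained Flax sketch of the kind of composition NanoDL describes. It is illustrative only and does not use NanoDL's own classes; the module and parameter names below are invented for the example.

```python
# Illustrative sketch only: the modular, single-file block style NanoDL
# describes, written in plain Flax (not NanoDL's own API).
import jax
import jax.numpy as jnp
import flax.linen as nn


class FeedForward(nn.Module):
    hidden_dim: int

    @nn.compact
    def __call__(self, x):
        # Position-wise MLP; swap this sub-block out to customise the model.
        h = nn.Dense(self.hidden_dim * 4)(x)
        return nn.Dense(self.hidden_dim)(nn.gelu(h))


class TransformerBlock(nn.Module):
    hidden_dim: int
    num_heads: int

    @nn.compact
    def __call__(self, x):
        # Pre-norm self-attention followed by a feed-forward sub-block.
        attn = nn.SelfAttention(num_heads=self.num_heads)(nn.LayerNorm()(x))
        x = x + attn
        return x + FeedForward(self.hidden_dim)(nn.LayerNorm()(x))


x = jnp.ones((2, 16, 64))  # (batch, sequence, features)
block = TransformerBlock(hidden_dim=64, num_heads=4)
params = block.init(jax.random.PRNGKey(0), x)
y = block.apply(params, x)  # output shape (2, 16, 64)
```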
Quick Start & Requirements
- Install via pip: `pip install nanodl`
- Prerequisites: Python 3.9+, JAX, Flax, Optax (GPU support recommended for training).
- Distributed training scales from a single GPU/TPU to many; a CPU-only JAX install is enough for building and experimenting with models (see the device check after this list).
- Documentation and examples are linked from badges in the README, with community support available on Discord.
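Before training, it helps to confirm that JAX can see your accelerators; with a CPU-only install the same code still runs, just on CPU. A minimal check:

```python
# Check which devices JAX can use before launching distributed training.
import jax

print(jax.devices())       # e.g. [CudaDevice(id=0), ...] on GPU, or [CpuDevice(id=0)] on CPU
print(jax.device_count())  # number of devices available for data-parallel training
```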
Highlighted Details
- Implements a wide array of transformer blocks and layers not found in Flax/JAX.
- Includes implementations of models such as Gemma, Llama 3, Mistral, GPT-3/4, T5, Whisper, ViT, and CLIP.
- Offers data-parallel trainers for multi-GPU/TPU training and simplified data handling with custom dataloaders.
- Accelerates classical ML models (PCA, K-Means, etc.) on GPUs/TPUs (illustrated below).
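As a rough illustration of the classical-ML point above, the sketch below shows a PCA-style projection written in plain jax.numpy; because it is ordinary JAX code, it runs unchanged on CPU, GPU, or TPU. This is not NanoDL's own PCA implementation, just the underlying idea.

```python
# Illustrative PCA-style projection in plain JAX (not NanoDL's API):
# the same code executes on CPU, GPU, or TPU without modification.
from functools import partial

import jax
import jax.numpy as jnp


@partial(jax.jit, static_argnums=1)
def pca_project(x, n_components):
    # Centre the data, then project onto the top principal directions via SVD.
    x = x - x.mean(axis=0)
    _, _, vt = jnp.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


x = jax.random.normal(jax.random.PRNGKey(0), (1000, 64))
z = pca_project(x, 2)  # shape (1000, 2), computed on whatever device JAX selected
print(z.shape)
```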
Maintenance & Community
- The project encourages contributions and feedback via Discord, issues, and pull requests.
- Experimental features like MAMBA, KAN, BitNet, GAT, and RLHF are available for direct copying from the repository.
- The long-term goal is to build "nano" versions of popular models (<1B parameters) with competitive performance.
Licensing & Compatibility
- The license is not specified here, though the README encourages open contribution and use. Clarifying the license is recommended before commercial applications.
Limitations & Caveats
- The project is explicitly stated to be in development ("still in dev, works great but roughness is expected"), with experimental features not yet packaged. Contributions are highly encouraged to address this.