Minimalist framework for language model training and learning dynamics research
Top 90.8% on SourcePulse
Pico Train is a minimalistic, research-focused framework for training language models from 1 million to 1 billion parameters. It addresses the need for transparent, reproducible learning dynamics research by providing comprehensive, granular checkpoints that include activations and gradients, alongside a standardized data and architecture approach for cross-scale comparisons. The target audience includes researchers and engineers interested in understanding the internal workings and scaling laws of LLMs.
How It Works
Pico Train utilizes a standardized LLaMA-style architecture (Pico Decoder) with components like RMSNorm, RoPE, and SwiGLU. Its core advantage lies in its "comprehensive checkpointing" strategy, which automatically saves not only model and optimizer states but also activations and gradients at regular intervals. This rich data, combined with a consistent training philosophy (identical data, architecture, and optimizer settings across scales), enables direct comparison of learning dynamics as model size varies, isolating size as the primary variable.
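The snippet below is a minimal sketch of what such comprehensive checkpointing could look like in PyTorch. It is illustrative only: the function name checkpoint_with_dynamics, the checkpoints/ directory, and the checkpoint layout are hypothetical, not Pico Train's actual API.

```python
# Hypothetical sketch of "comprehensive checkpointing": saving activations
# and gradients alongside model and optimizer state. All names here are
# illustrative assumptions, not Pico Train's real implementation.
import os
import torch

CKPT_DIR = "checkpoints"  # assumed output directory

def checkpoint_with_dynamics(model, optimizer, batch, loss_fn, step):
    """Run one forward/backward pass while recording per-layer activations,
    then save them together with model state, optimizer state, and gradients."""
    activations = {}
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Detach so saved tensors don't keep the autograd graph alive.
            if torch.is_tensor(output):
                activations[name] = output.detach().cpu()
        return hook

    # Register a forward hook on every leaf module (e.g., each Linear/Norm).
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:
            handles.append(module.register_forward_hook(make_hook(name)))

    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()  # populates .grad on every parameter

    gradients = {
        name: p.grad.detach().cpu()
        for name, p in model.named_parameters()
        if p.grad is not None
    }

    for h in handles:  # remove hooks so regular training steps stay cheap
        h.remove()

    os.makedirs(CKPT_DIR, exist_ok=True)
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "activations": activations,
            "gradients": gradients,
        },
        f"{CKPT_DIR}/step_{step}.pt",
    )
```

Capturing dynamics only at checkpoint steps, rather than on every batch, keeps the storage cost of activations and gradients bounded while still giving a regular trace of training behavior.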
Quick Start & Requirements
1. Run source setup.sh to create the Poetry environment and install dependencies.
2. Configure required environment variables in a .env file.
3. Launch training with poetry run train --config_path configs/demo.yaml.
Highlighted Details
Ships with a companion pico-analyze library for advanced post-training interpretation.
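As a hedged illustration of the kind of post-training analysis such checkpoints enable (not pico-analyze's actual API), the sketch below reads the hypothetical checkpoint files from the earlier example and tracks per-layer gradient norms across training steps.

```python
# Hand-rolled sketch of a learning-dynamics analysis over saved checkpoints.
# Assumes the hypothetical checkpoint layout from the earlier example;
# this is not pico-analyze's actual API.
import glob
import torch

def gradient_norm_trajectory(ckpt_glob="checkpoints/step_*.pt"):
    """Return {layer_name: [(step, grad_norm), ...]} across all checkpoints."""
    trajectory = {}
    for path in glob.glob(ckpt_glob):
        ckpt = torch.load(path, map_location="cpu")
        for name, grad in ckpt["gradients"].items():
            trajectory.setdefault(name, []).append(
                (ckpt["step"], grad.norm().item())
            )
    return trajectory

# Example: spot layers whose gradients vanish or explode as training proceeds.
for layer, points in gradient_norm_trajectory().items():
    steps, norms = zip(*sorted(points))
    print(layer, [f"{n:.3g}" for n in norms])
```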
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Currently supports only the Pico Decoder architecture, with plans for future expansion. The framework is geared towards research and may require adaptation for production deployment workflows.