pico-train by pico-lm

Minimalist framework for language model training and learning dynamics research

Created 1 year ago · 290 stars · Top 90.8% on SourcePulse

View on GitHub
Project Summary

Pico Train is a minimalist, research-focused framework for training language models from 1 million to 1 billion parameters. It supports transparent, reproducible learning dynamics research by saving comprehensive, granular checkpoints that include activations and gradients, and by standardizing data and architecture so that models can be compared directly across scales. Its target audience is researchers and engineers interested in the internal workings and scaling laws of LLMs.

How It Works

Pico Train uses a standardized LLaMA-style architecture (Pico Decoder) built from components such as RMSNorm, RoPE, and SwiGLU. Its core advantage is a "comprehensive checkpointing" strategy that automatically saves not only model and optimizer states but also activations and gradients at regular intervals. Because every scale is trained with identical data, architecture, and optimizer settings, this rich data supports direct comparison of learning dynamics as model size varies, isolating size as the primary variable; the sketch below illustrates that workflow.
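To make the cross-scale workflow concrete, here is a minimal sketch of comparing step-aligned checkpoints from two model sizes. The directory layout, file names, and model names below are illustrative assumptions, not pico-train's documented checkpoint format.

```python
# Hypothetical sketch: checkpoint paths, file layout, and key names are
# assumptions for illustration; pico-train's actual on-disk format may differ.
import torch

def activation_norms(path: str) -> dict[str, float]:
    """Load a saved activations file and compute the mean L2 norm per layer."""
    activations = torch.load(path, map_location="cpu")
    return {
        layer: tensor.float().norm(dim=-1).mean().item()
        for layer, tensor in activations.items()
    }

# Because every model size is trained on identical data with identical
# settings, checkpoints at the same step can be compared directly.
small = activation_norms("checkpoints/pico-decoder-small/step_1000/activations.pt")
large = activation_norms("checkpoints/pico-decoder-large/step_1000/activations.pt")
for layer in sorted(small.keys() & large.keys()):
    print(f"{layer}: small={small[layer]:.3f}  large={large[layer]:.3f}")
```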

Quick Start & Requirements

  • Install by running source setup.sh (creates a Poetry environment and installs dependencies).
  • Requires Hugging Face and Weights & Biases API tokens, configured via a .env file (see the pre-flight sketch after this list).
  • Start training with poetry run train --config_path configs/demo.yaml.
  • A full tutorial is available at picolm.io.
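As a pre-flight check before launching training, the sketch below verifies that the required API tokens are visible to the process. HF_TOKEN and WANDB_API_KEY are the conventional variable names read by huggingface_hub and wandb; whether pico-train's .env file expects exactly these names is an assumption.

```python
# Pre-flight check: confirm API tokens from .env are set before training.
# HF_TOKEN / WANDB_API_KEY are conventional names used by huggingface_hub
# and wandb; pico-train's exact expectations may differ (an assumption).
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads KEY=value pairs from a local .env file

for var in ("HF_TOKEN", "WANDB_API_KEY"):
    if not os.environ.get(var):
        raise SystemExit(f"{var} is not set; add it to your .env file")

print("Tokens found; run: poetry run train --config_path configs/demo.yaml")
```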

Highlighted Details

  • Optimized for 1M-1B parameter models, a range chosen to keep learning dynamics research tractable.
  • Trains on a pre-tokenized, pre-shuffled Dolma dataset so every run sees identical data.
  • Checkpoints include model state, optimizer state, activations, gradients, and logs.
  • Integrates with the companion pico-analyze library for post-training analysis.

Maintenance & Community

  • Primarily developed by Richard Diehl Martinez.
  • Website: picolm.io.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

Currently supports only the Pico Decoder architecture, with additional architectures planned. The framework is geared toward research and may require adaptation for production deployment workflows.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days

Explore Similar Projects

Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

lingua by facebookresearch

Top 0.1% on SourcePulse · 5k stars
LLM research codebase for training and inference
Created 11 months ago · Updated 2 months ago
Starred by François Chollet (Author of Keras; Cofounder of Ndea, ARC Prize) and Omar Sanseviero (DevRel at Google DeepMind).

keras-hub by keras-team

Top 0.6% on SourcePulse · 932 stars
Pretrained model hub for Keras 3
Created 5 years ago · Updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Lewis Tunstall (Research Engineer at Hugging Face), and 13 more.

torchtitan by pytorch

Top 0.7% on SourcePulse · 4k stars
PyTorch platform for generative AI model training research
Created 1 year ago · Updated 21 hours ago