pythia by EleutherAI

LLM suite for interpretability, learning dynamics, ethics, and transparency research

Created 3 years ago · 2,575 stars · Top 18.6% on sourcepulse

Project Summary

The Pythia suite provides a comprehensive set of autoregressive transformer models, ranging from 14M to 12B parameters, specifically designed for interpretability research. It offers 154 checkpoints per model, enabling detailed analysis of learning dynamics and knowledge evolution during training. The suite is ideal for researchers focused on understanding LLM internals, training stability, and ethical considerations.

How It Works

Pythia models are trained on the Pile dataset (or its deduplicated version) with consistent data ordering and training procedures across all sizes. This uniformity allows for direct comparison and causal analysis of how scale and training dynamics influence model behavior. The availability of numerous intermediate checkpoints is a key differentiator, facilitating fine-grained studies of emergent properties and internal representations.
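
For example, a minimal sketch of a checkpoint-comparison loop: it measures next-token loss on a fixed prompt at several training steps. The revision names and prompt here are illustrative assumptions; consult each model card on the Hugging Face Hub for the exact checkpoint list.

    # Compare next-token loss on a fixed prompt across a few training checkpoints.
    import torch
    from transformers import GPTNeoXForCausalLM, AutoTokenizer

    model_name = "EleutherAI/pythia-70m-deduped"
    prompt = "The Pile is a large, diverse dataset for language modeling."

    for revision in ["step1000", "step64000", "step143000"]:  # illustrative checkpoints
        tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
        model = GPTNeoXForCausalLM.from_pretrained(model_name, revision=revision).eval()
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            # labels=input_ids yields the mean next-token cross-entropy loss
            loss = model(**inputs, labels=inputs["input_ids"]).loss
        print(f"{revision}: loss = {loss.item():.3f}")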

Quick Start & Requirements

  • Install/Run: Models can be loaded via Hugging Face Transformers; the revision argument selects a training checkpoint (a short generation example follows this list):
    from transformers import GPTNeoXForCausalLM, AutoTokenizer
    # load the 70M deduplicated model at training step 3000
    model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m-deduped", revision="step3000")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m-deduped", revision="step3000")

  • Prerequisites: PyTorch and the Hugging Face transformers library. Reproducing training requires the GPT-NeoX library, Docker, and significant disk space for datasets.
  • Resources: Inference memory scales with model size (roughly 2 bytes per parameter in fp16, so about 24 GB of GPU memory for the 12B model); full training reproduction is resource-intensive.
  • Links: Pythia Paper, Hugging Face Hub, LM Evaluation Harness
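
As referenced in the Install/Run bullet, a minimal generation check continuing from that snippet (greedy decoding; the prompt and token budget are arbitrary):

    # tokenizer and model are the checkpoint loaded in the snippet above
    inputs = tokenizer("Hello, I am", return_tensors="pt")
    tokens = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(tokens[0]))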

Highlighted Details

  • 10 model sizes (14M to 12B parameters) trained on the Pile dataset.
  • 154 checkpoints available for each model, enabling fine-grained temporal analysis (see the sketch after this list).
  • Models trained with identical data order across all sizes for direct comparison.
  • Includes "v0" models with minor inconsistencies for ablation studies.
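
The checkpoint revisions follow a predictable naming scheme: per the upstream README, models are saved at step 0, at log-spaced steps up to 512, and then every 1,000 steps through 143,000, with revisions named step{N}. A small sketch enumerating them under that assumption (verify the exact scheme against the model cards):

    # Enumerate checkpoint revision names, assuming step 0 + log-spaced steps
    # up to 512, then every 1,000 steps through 143,000 (verify on the Hub).
    log_spaced = [0] + [2**i for i in range(10)]      # 0, 1, 2, 4, ..., 512
    linear = list(range(1000, 143001, 1000))          # 1000, 2000, ..., 143000
    revisions = [f"step{n}" for n in log_spaced + linear]
    print(len(revisions))  # 154 checkpoints per model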

Maintenance & Community

The project is actively maintained by EleutherAI, a prominent research collective in the LLM space, and the README's list of research papers building on Pythia is updated regularly. Community interaction happens primarily through GitHub issues and discussions.

Licensing & Compatibility

All code and models are released under the Apache License 2.0, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

The README notes that evaluation benchmarks were run with an older version of the LM Evaluation Harness and may not be reproducible with current versions. Some older "v0" models have minor inconsistencies.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 111 stars in the last 90 days
