1.5-Pints by Pints-AI

LLM recipe for pre-training models

created 1 year ago
320 stars

Top 86.0% on sourcepulse

Project Summary

This repository provides the architecture, training scripts, and utilities for the 1.5-Pints and 0.12-Pint language models, designed to be comparable to models like OpenELM and Phi. It targets researchers and developers interested in replicating, experimenting with, and advancing open-source LLM pre-training, offering a recipe for achieving competitive performance in significantly reduced training times.

How It Works

The project emphasizes a "quality data" approach to pre-training, enabling rapid development of capable LLMs. It leverages PyTorch Lightning for distributed training and includes scripts for dataset preparation, model pre-training, fine-tuning, and evaluation. The architecture configurations are managed within lit_gpt/config.py, allowing users to select different model sizes and parameters.
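The registry pattern described above can be sketched in miniature. The class shape below mimics the lit_gpt config style, but the field values and the "0.12-Pint"/"1.5-Pints" hyperparameters shown are illustrative placeholders, not the real entries in lit_gpt/config.py:

```python
from dataclasses import dataclass

# Hypothetical sketch of a named-config registry in the style of
# lit_gpt/config.py. Values below are illustrative, not the real
# 1.5-Pints hyperparameters.
@dataclass
class Config:
    name: str
    block_size: int
    n_layer: int
    n_head: int
    n_embd: int

    @classmethod
    def from_name(cls, name: str) -> "Config":
        # Resolve a model size by its registered name.
        for cfg in configs:
            if cfg.name == name:
                return cfg
        raise ValueError(f"unknown model config: {name}")

# Registry of available model sizes (illustrative entries).
configs = [
    Config(name="0.12-Pint", block_size=2048, n_layer=12, n_head=12, n_embd=768),
    Config(name="1.5-Pints", block_size=2048, n_layer=24, n_head=32, n_embd=2048),
]

cfg = Config.from_name("1.5-Pints")
print(cfg.name, cfg.n_layer)
```

Selecting a different model size is then a one-line change to the name passed into the training script.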

Quick Start & Requirements

  • Platform: Ubuntu 22.04 LTS or Debian 12 (x86-64 only; ARM64 is not supported).
  • Prerequisites: Miniconda3 for environment management, CUDA Toolkit 12.1.1 (installed within the conda environment), git-lfs. Python 3.10 is recommended.
  • Setup: Clone the repo, create and activate a conda environment, install dependencies (pip install -r requirements.txt, pip install flash-attn --no-build-isolation, pip install -r pretrain/requirements.txt), download and prepare datasets, and then train using fabric run.
  • Links: Discord: https://discord.com/invite/RSHk22Z29j, Paper: https://arxiv.org/abs/2408.03506.
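Before starting the setup above, it can help to verify the documented prerequisites are on the PATH. The `check_tool` helper below is a hypothetical convenience, not a script shipped with the repository:

```shell
#!/usr/bin/env sh
# Hypothetical pre-flight check for the documented prerequisites
# (conda from Miniconda3, git-lfs). It only reports presence; it installs nothing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "present"
    else
        echo "absent"
    fi
}

for tool in conda git-lfs git; do
    printf '%s: %s\n' "$tool" "$(check_tool "$tool")"
done
```

If any line reports `absent`, install that tool before creating the conda environment.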

Highlighted Details

  • Pre-training achieved in 9 days, aiming for parity with established models.
  • Detailed scripts for pre-training, fine-tuning, and Direct Preference Optimization (DPO).
  • Utilities for converting trained models to Hugging Face format (PyTorch and Safetensors).
  • Includes a testing suite for code validation.
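Since the recipe includes Direct Preference Optimization, the quantity it optimizes is worth restating. Below is a minimal pure-Python sketch of the standard per-example DPO loss; the repository's actual implementation operates on batched model log-probabilities in PyTorch, and `beta=0.1` is an illustrative default:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is the summed log-probability of the chosen or rejected
    response under the policy or the frozen reference model.
    """
    logits = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    # -log(sigmoid(x)) = log(1 + e^(-x)), computed stably for either sign of x.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# With no preference signal the loss is log(2) ≈ 0.693.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
```

The loss falls below log(2) exactly when the policy prefers the chosen response more strongly than the reference model does.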

Maintenance & Community

The project is developed by Pints.AI. Community support and discussion are available via their Discord server.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

ARM64/aarch64 processors are not supported due to xformers incompatibility. Python 3.12 is noted to break functionality, and Python 3.11 has not been tested. The installation process requires careful management of CUDA versions within conda environments.
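Given these version constraints, a fail-fast interpreter check at startup can save a broken install. The guard below is a hypothetical helper encoding the documented constraints, not code from the repository:

```python
import sys

# Hypothetical startup guard reflecting the documented constraints:
# Python 3.10 recommended, 3.11 untested, 3.12 known to break.
def check_python(version=None):
    major, minor = version if version is not None else sys.version_info[:2]
    if (major, minor) == (3, 10):
        return "supported"
    if (major, minor) == (3, 11):
        return "untested"
    raise RuntimeError(f"Python {major}.{minor} is unsupported; use 3.10")
```

Calling `check_python()` with no argument checks the running interpreter.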

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days

Explore Similar Projects

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai

Training codebase for instruction-following language models

  • Top 0.2%, 3k stars
  • created 2 years ago, updated 14 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

SDK for reproducing DeepSeek-R1

  • Top 0.2%, 25k stars
  • created 6 months ago, updated 3 days ago