1.5-Pints by Pints-AI

LLM recipe for pre-training models

Created 1 year ago
323 stars

Top 84.1% on SourcePulse

Project Summary

This repository provides the architecture, training scripts, and utilities for the 1.5-Pints and 0.12-Pint language models, designed to be comparable to models such as OpenELM and Phi. It targets researchers and developers interested in replicating, experimenting with, and advancing open-source LLM pre-training, offering a recipe for achieving competitive performance with significantly reduced training time.

How It Works

The project emphasizes a "quality data" approach to pre-training, enabling rapid development of capable LLMs. It leverages PyTorch Lightning for distributed training and includes scripts for dataset preparation, model pre-training, fine-tuning, and evaluation. The architecture configurations are managed within lit_gpt/config.py, allowing users to select different model sizes and parameters.
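
For orientation, here is a minimal sketch of selecting an architecture configuration, assuming the lit-gpt-style Config.from_name API from which lit_gpt/config.py descends; the config name string is illustrative, not confirmed against the repo:

    # A sketch assuming the lit-gpt-style Config API; the config name is
    # illustrative; check lit_gpt/config.py for the names actually defined.
    from lit_gpt.config import Config

    config = Config.from_name("1.5-Pints")  # hypothetical name
    print(config.n_layer, config.n_head, config.n_embd, config.block_size)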

Quick Start & Requirements

  • Platform: Ubuntu 22.04 LTS or Debian 12 (x86-64 only; ARM64 is not supported).
  • Prerequisites: Miniconda3 for environment management, CUDA Toolkit 12.1.1 (installed within the conda environment), git-lfs. Python 3.10 is recommended.
  • Setup: Clone the repo, create and activate a conda environment, install dependencies (pip install -r requirements.txt, pip install flash-attn --no-build-isolation, pip install -r pretrain/requirements.txt), download and prepare datasets, and then train using fabric run. A post-install sanity check is sketched after this list.
  • Links: Discord: https://discord.com/invite/RSHk22Z29j, Paper: https://arxiv.org/abs/2408.03506.
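
As referenced in the setup step above, a minimal post-install sanity check; this is a sketch that assumes only the packages installed in the steps above:

    # Minimal post-install sanity check: confirms PyTorch sees the GPUs
    # and that flash-attn built correctly.
    import torch

    assert torch.cuda.is_available(), "CUDA not visible to PyTorch"
    print("CUDA runtime:", torch.version.cuda, "| GPUs:", torch.cuda.device_count())

    try:
        import flash_attn  # built via: pip install flash-attn --no-build-isolation
        print("flash-attn:", flash_attn.__version__)
    except ImportError:
        print("flash-attn missing; rerun: pip install flash-attn --no-build-isolation")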

Highlighted Details

  • Pre-training completed in 9 days while aiming for parity with established models.
  • Detailed scripts for pre-training, fine-tuning, and Direct Preference Optimization (DPO); the standard DPO objective is sketched after this list.
  • Utilities for converting trained models to Hugging Face format (PyTorch and Safetensors); a loading example also follows below.
  • Includes a testing suite for code validation.
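
For orientation, this is the standard DPO objective (Rafailov et al., 2023) that such scripts optimize; it is a sketch, not this project's actual implementation:

    # Standard DPO loss; a sketch, not this repo's implementation.
    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        # Inputs are per-sequence sums of token log-probabilities under the
        # policy being trained and the frozen reference model.
        chosen_margin = policy_chosen_logps - ref_chosen_logps
        rejected_margin = policy_rejected_logps - ref_rejected_logps
        return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()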
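
Once a checkpoint has been converted, it loads with the standard transformers API; the directory path below is a placeholder for wherever the conversion utility wrote its output:

    # Loading a converted checkpoint with the standard transformers API.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_dir = "path/to/converted-1.5-pints"  # placeholder path
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)

    inputs = tokenizer("Once upon a time", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))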

Maintenance & Community

The project is developed by Pints.AI. Community support and discussion are available via their Discord server.

Licensing & Compatibility

The repository does not explicitly state a license in the README, so suitability for commercial use or closed-source linking is unspecified.

Limitations & Caveats

ARM64/aarch64 processors are not supported due to xformers incompatibility. Python 3.12 is noted to break functionality, and Python 3.11 has not been tested. The installation process requires careful management of CUDA versions within conda environments.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
