bigscience by bigscience-workshop

Large-scale LLM training and scaling infrastructure

Created 4 years ago
1,006 stars

Top 37.0% on SourcePulse

Project Summary

This repository is the central hub for the BigScience workshop's engineering and scaling work on large language models. It complements the primary Megatron-DeepSpeed codebase with documentation, experimental results, SLURM scripts, and detailed logs from the workshop's large-scale LLM training runs, aimed at researchers and engineers working on LLM development and scaling.

How It Works

This is a meta-repository: rather than containing the training code itself, it coordinates the workshop's engineering effort by collecting documentation, experimental data, SLURM job scripts, and environment configurations. These materials make the large-scale training runs driven by the core Megatron-DeepSpeed codebase reproducible and analyzable after the fact.

Quick Start & Requirements

This repository does not provide an installation or execution command; it is a collection of documentation, scripts, and logs for large-scale LLM training. Using it requires familiarity with the bigscience-workshop/Megatron-DeepSpeed repository, and running or analyzing the provided scripts and logs typically assumes significant computational resources and a SLURM-based cluster. Links to specific training logs and TensorBoard instances are provided within the README.
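
As a rough orientation, the sketch below shows one way to analyze a downloaded training log offline. It is not taken from this repository; the log-line pattern, file path, and field names are assumptions about Megatron-DeepSpeed-style output.

```python
# Hypothetical sketch: pull (iteration, lm loss) pairs out of a downloaded
# Megatron-DeepSpeed-style training log for offline analysis.
# The log-line format and file path below are assumptions, not values
# taken from this repository.
import re
from pathlib import Path

# Assumed pattern: "... iteration   100/  200000 ... lm loss: 2.345678E+00 ..."
LINE_RE = re.compile(r"iteration\s+(\d+)/\s*\d+.*?lm loss:\s*([0-9.E+\-]+)")

def parse_losses(log_path: str) -> list[tuple[int, float]]:
    """Return (iteration, loss) pairs found in one training log file."""
    pairs = []
    for line in Path(log_path).read_text(errors="ignore").splitlines():
        match = LINE_RE.search(line)
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs

if __name__ == "__main__":
    # Point this at a log file downloaded from the repository's training folders.
    for step, loss in parse_losses("tr11-176B/main_log.txt")[:10]:
        print(f"step {step:>8d}  lm loss {loss:.4f}")
```

If the real log format differs, only LINE_RE needs adjusting; the overall workflow of downloading a log and extracting per-iteration metrics stays the same.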

Highlighted Details

  • Detailed documentation and logs for multiple large-scale LLM training runs, including 13B, 104B, and 176B parameter models.
  • Information on training configurations, including datasets (C4, OSCAR, Pile) and warmup strategies.
  • Scripts for live monitoring of training logs via remote file syncing (see the sketch after this list).
  • References to lessons learned and hub integration for BigScience projects.
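
The live-monitoring scripts in the repository sync training logs from the cluster to a local machine; the sketch below illustrates that general pattern with a periodic rsync-plus-tail loop. The host, paths, file name, and polling interval are hypothetical, not taken from the actual scripts.

```python
# Hypothetical sketch of log monitoring via remote file syncing:
# periodically rsync a remote log directory, then print newly appended lines.
# REMOTE, LOCAL, LOG_FILE, and the polling interval are assumptions,
# not values from the repository's own scripts.
import subprocess
import time
from pathlib import Path

REMOTE = "user@cluster.example.org:/path/to/training/logs/"  # hypothetical
LOCAL = Path("synced-logs")
LOG_FILE = LOCAL / "main_log.txt"                            # hypothetical

def sync_and_tail(poll_seconds: int = 60) -> None:
    """Pull the remote log directory and echo lines added since the last poll."""
    LOCAL.mkdir(exist_ok=True)
    seen = 0
    while True:
        subprocess.run(["rsync", "-az", REMOTE, str(LOCAL)], check=False)
        if LOG_FILE.exists():
            lines = LOG_FILE.read_text(errors="ignore").splitlines()
            for line in lines[seen:]:
                print(line)
            seen = len(lines)
        time.sleep(poll_seconds)

if __name__ == "__main__":
    sync_and_tail()
```

The repository's own scripts may use different tooling; the point is only that monitoring works by mirroring log files locally rather than attaching to the running job.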

Maintenance & Community

The "bigscience-workshop" name implies a large, collaborative effort, but specific details regarding maintainers, community channels (like Discord/Slack), or a public roadmap are not present in this README snippet.

Licensing & Compatibility

The README does not specify a software license. The absence of explicit licensing information may hinder adoption, particularly for commercial use or integration into closed-source projects.

Limitations & Caveats

This repository is not a standalone, runnable software project but rather a collection of supporting materials for complex LLM training infrastructure. Users require access to the Megatron-DeepSpeed codebase and substantial computational resources. The absence of explicit licensing information is a notable caveat.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

1 star in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wei-Lin Chiang (Cofounder of LMArena), and 13 more.

awesome-tensor-compilers by merrymercy

0.4%
3k
Curated list of tensor compiler projects and papers
Created 5 years ago
Updated 1 year ago
Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 14 more.

BIG-bench by google

0.2%
3k
Collaborative benchmark for probing and extrapolating LLM capabilities
Created 4 years ago
Updated 1 year ago
Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 14 more.

simpletransformers by ThilinaRajapakse

0.0%
4k
Rapid NLP task implementation
Created 6 years ago
Updated 3 months ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 16 more.

text-to-text-transfer-transformer by google-research

0.1%
6k
Unified text-to-text transformer for NLP research
Created 6 years ago
Updated 3 weeks ago
Starred by Vaibhav Nivargi (Cofounder of Moveworks), Chuan Li (Chief Scientific Officer at Lambda), and 5 more.

awesome-mlops by visenger

0.1%
13k
Curated MLOps knowledge hub
Created 5 years ago
Updated 1 year ago