bigscience by bigscience-workshop

Large-scale LLM training and scaling infrastructure

Created 4 years ago
1,004 stars

Top 37.1% on SourcePulse

View on GitHub
Project Summary

This repository serves as a central hub for the BigScience workshop's engineering and scaling efforts in large language models. It complements the primary Megatron-DeepSpeed codebase by providing comprehensive documentation, experimental results, SLURM scripts, and detailed logs for various large-scale LLM training runs, benefiting researchers and engineers focused on LLM development and scaling.

How It Works

Rather than hosting training code itself, this repository functions as a meta-repository: it coordinates engineering efforts and records the infrastructure details behind BigScience's large language model training. It stores documentation, experimental data, SLURM scripts, and environment configurations, so that the large-scale runs driven by Megatron-DeepSpeed can be reproduced and analyzed.

Quick Start & Requirements

This repository provides no installation or execution command of its own; it is a collection of documentation, scripts, and logs from large-scale LLM training. Making use of the content requires familiarity with the bigscience-workshop/Megatron-DeepSpeed repository, and running or reproducing the provided SLURM scripts assumes access to substantial computational resources on a SLURM-managed cluster. Links to specific training logs and TensorBoard instances are provided in the README.
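
As a rough orientation, the material can be browsed locally by cloning this repository alongside the training codebase it documents. The sketch below assumes only that git and a POSIX shell are available; the clone URLs are the public GitHub locations, and the `train/` directory name is an assumption to be checked against the actual layout.

```bash
# Fetch the meta-repository (docs, SLURM scripts, chronicles/logs of the runs)
git clone https://github.com/bigscience-workshop/bigscience.git

# Fetch the training codebase that the scripts and logs refer to
git clone https://github.com/bigscience-workshop/Megatron-DeepSpeed.git

# Browse per-run material (directory name is an assumption; see the README
# for the exact layout and for links to logs and TensorBoard instances)
ls bigscience/train
```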

Highlighted Details

  • Detailed documentation and logs for multiple large-scale LLM training runs, including 13B, 104B, and 176B parameter models.
  • Information on training configurations, including datasets (C4, OSCAR, Pile) and warmup strategies.
  • Scripts for live monitoring of training logs via remote file syncing (a generic sketch of this pattern appears after this list).
  • References to lessons learned and hub integration for BigScience projects.
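
For the log-monitoring point above, here is a minimal sketch of the remote-file-syncing pattern: periodically rsync the log directory from the cluster and follow it locally. The hostname, paths, file name, and interval are hypothetical placeholders, not the project's actual script.

```bash
#!/usr/bin/env bash
# Periodically pull training logs from the cluster and follow them locally.
# All names below are placeholders, not paths from this repository.

REMOTE="user@cluster.example.org"              # hypothetical login node
REMOTE_LOGS="/scratch/checkpoints/tr-run/logs" # hypothetical remote log dir
LOCAL_LOGS="./synced-logs"

mkdir -p "$LOCAL_LOGS"

# Background loop: sync new/changed files every minute
while true; do
    rsync -az --partial "$REMOTE:$REMOTE_LOGS/" "$LOCAL_LOGS/"
    sleep 60
done &

# Follow the main log as it grows; -F survives file rotation/recreation
tail -F "$LOCAL_LOGS/main_log.txt"
```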

Maintenance & Community

The "bigscience-workshop" name implies a large, collaborative effort, but specific details regarding maintainers, community channels (like Discord/Slack), or a public roadmap are not present in this README snippet.

Licensing & Compatibility

The provided README content does not specify a software license. This lack of explicit licensing information may pose a barrier to adoption, particularly for commercial use or integration into closed-source projects.

Limitations & Caveats

This repository is not a standalone, runnable software project but rather a collection of supporting materials for complex LLM training infrastructure. Users require access to the Megatron-DeepSpeed codebase and substantial computational resources. The absence of explicit licensing information is a notable caveat.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 14 more.

BIG-bench by google
Collaborative benchmark for probing and extrapolating LLM capabilities
0.1% · 3k stars · Created 4 years ago · Updated 1 year ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 16 more.

text-to-text-transfer-transformer by google-research
Unified text-to-text transformer for NLP research
0.1% · 6k stars · Created 6 years ago · Updated 5 months ago