YaLM-100B by Yandex

GPT-like neural network for text generation/processing

Created 3 years ago
3,757 stars

Top 12.9% on SourcePulse

Project Summary

YaLM-100B is a GPT-like large language model with 100 billion parameters for generating and processing English and Russian text, aimed at developers and researchers working with either language.

How It Works

The model is a GPT-like neural network trained on 1.7 TB of text, including books and online sources in English and Russian. Training used DeepSpeed and drew inspiration from Megatron-LM, and inference relies on tensor parallelism to spread the model across multiple GPUs. The provided code is a modified version of the DeepSpeed Megatron-LM example, adapted for YaLM-100B inference.
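
Conceptually, tensor parallelism splits each large weight matrix across GPUs; every device computes a partial result that is then gathered back together. Below is a minimal NumPy sketch of the idea only; the device count and shapes are arbitrary, and the real YaLM-100B parallel layers live in the Megatron-LM/DeepSpeed code, not in this snippet.

    # Column-wise tensor parallelism, simulated on one machine.
    import numpy as np

    n_gpus = 4                       # hypothetical device count
    x = np.random.randn(1, 1024)     # one token's hidden state
    W = np.random.randn(1024, 4096)  # a feed-forward weight matrix

    # Each "GPU" holds one column shard of W and computes a partial output.
    shards = np.split(W, n_gpus, axis=1)
    partials = [x @ shard for shard in shards]

    # An all-gather along the feature dimension reconstructs the full output.
    y_parallel = np.concatenate(partials, axis=1)
    assert np.allclose(y_parallel, x @ W)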

Quick Start & Requirements

  • Install/Run: Use bash download/download.sh to fetch the weights and vocabulary. The model requires approximately 200 GB of GPU memory and was tested on 4× A100 80 GB and 8× V100 32 GB configurations; a back-of-the-envelope memory check follows this list. Docker images are available for A100 and V100.
  • Links: Medium, Habr, Hugging Face
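
The ≈200 GB figure is consistent with simple arithmetic: 100 billion parameters at 2 bytes each, assuming fp16 weights (an assumption about the checkpoint format, not a statement from the README). Ignoring activation and key/value-cache overhead:

    # Rough memory estimate for 100B parameters in fp16 (assumed format).
    params = 100e9
    bytes_per_param = 2  # fp16
    print(f"{params * bytes_per_param / 1e9:.0f} GB")  # -> 200 GB

Both tested configurations clear that bar: 4 × 80 GB = 320 GB and 8 × 32 GB = 256 GB.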

Highlighted Details

  • GPT-like architecture with 100 billion parameters.
  • Trained on 1.7 TB of English and Russian text data.
  • Utilizes DeepSpeed and Megatron-LM principles for training and inference.
  • Supports interactive, conditional, and unconditional text generation; a toy sampling sketch follows this list.
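
Conditional and unconditional generation differ only in the starting context of the autoregressive sampling loop. The sketch below illustrates that; toy_logits is a hypothetical stand-in for a real forward pass, not the project's API.

    # Toy autoregressive sampling loop (illustrative only).
    import numpy as np

    VOCAB, BOS = 50, 0
    rng = np.random.default_rng(0)

    def toy_logits(context):  # placeholder for a real model forward pass
        return rng.normal(size=VOCAB)

    def sample(prompt, steps=5, temperature=1.0):
        tokens = list(prompt)
        for _ in range(steps):
            logits = toy_logits(tokens) / temperature
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            tokens.append(int(rng.choice(VOCAB, p=probs)))
        return tokens

    print(sample([BOS]))         # unconditional: start from BOS only
    print(sample([BOS, 7, 13]))  # conditional: continue a given prompt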

Maintenance & Community

The project is maintained by Yandex. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The model is published under the Apache 2.0 license, permitting both research and commercial use; the bundled Megatron-LM code remains under its own license.

Limitations & Caveats

The provided code is a modified DeepSpeed example rather than the exact code used to train the model. Inference requires significant GPU resources (≈200 GB of total GPU memory).

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

recurrent-pretraining by seal-rg

0% · 827 stars
Pretraining code for depth-recurrent language model research
Created 7 months ago; updated 1 week ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

10.6% · 2k stars
Speculative decoding research paper for faster LLM inference
Created 1 year ago; updated 1 week ago
Starred by Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

textgrad by zou-group

0.7% · 3k stars
Autograd engine for textual gradients, enabling LLM-driven optimization
Created 1 year ago; updated 1 month ago