YaLM-100B by Yandex

GPT-like neural network for text generation/processing

Created 3 years ago
3,752 stars

Top 12.7% on SourcePulse

Project Summary

YaLM-100B is a large language model with 100 billion parameters, designed for text generation and processing. It is suitable for developers and researchers working with English and Russian text, offering a powerful GPT-like architecture.

How It Works

The model is a GPT-like neural network trained on 1.7 TB of text, including books and online sources in English and Russian. It was trained with DeepSpeed, drawing on Megatron-LM, and uses tensor parallelism to split the model across multiple GPUs for efficient inference. The provided code is a modified version of the DeepSpeed Megatron-LM example, adapted for YaLM-100B inference.
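The tensor-parallelism idea mentioned above can be sketched in a few lines of NumPy, with arrays standing in for per-GPU shards. The shapes and names here are illustrative only, not YaLM's actual code:

```python
import numpy as np

np.random.seed(0)

# Toy dimensions; YaLM-100B's real hidden size is far larger.
hidden, ffn, n_gpus = 8, 16, 4

x = np.random.randn(2, hidden)   # activations, replicated on every "GPU"
W = np.random.randn(hidden, ffn) # a weight matrix to be sharded

# Column parallelism: each device holds a vertical slice of W
# and computes its slice of the output independently.
shards = np.split(W, n_gpus, axis=1)
partial_outputs = [x @ w for w in shards]

# Concatenating the partial results (an all-gather in a real setup)
# reproduces the full matmul exactly.
y = np.concatenate(partial_outputs, axis=1)
assert np.allclose(y, x @ W)
```

Because each shard's matmul is independent, the per-device memory and compute shrink roughly linearly with the number of GPUs, which is what makes a 100B-parameter model servable on a multi-GPU node.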

Quick Start & Requirements

  • Install/Run: Run bash download/download.sh to fetch the weights and vocabulary. Inference requires roughly 200 GB of GPU memory in total; tested configurations are 4× A100 80 GB or 8× V100 32 GB. Docker images are available for both A100 and V100.
  • Links: Medium, Habr, Hugging Face
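The ≈200 GB figure follows directly from the parameter count. A back-of-the-envelope check in Python (weights only, ignoring activations and KV-cache overhead):

```python
params = 100e9       # 100 billion parameters
bytes_per_param = 2  # fp16/bf16 weights

total_gb = params * bytes_per_param / 1e9
print(total_gb)  # -> 200.0

# Both tested configurations provide at least that much GPU memory:
a100_setup = 4 * 80   # 320 GB
v100_setup = 8 * 32   # 256 GB
assert a100_setup >= total_gb and v100_setup >= total_gb
```

The headroom above 200 GB is what absorbs activations and other runtime buffers during generation.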

Highlighted Details

  • GPT-like architecture with 100 billion parameters.
  • Trained on 1.7 TB of English and Russian text data.
  • Utilizes DeepSpeed and Megatron-LM principles for training and inference.
  • Supports interactive, conditional, and unconditional text generation.
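The three generation modes listed above differ only in what context the model starts from. A schematic in pure Python, using a stub in place of the real model (the function names are hypothetical, not YaLM's scripts):

```python
import random

def stub_next_token(context):
    # Stand-in for the real model's next-token prediction.
    random.seed(len(context))
    return random.choice(["the", "cat", "sat", "."])

def generate(prompt_tokens, max_new=5):
    # Autoregressive loop: append one predicted token at a time.
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        tokens.append(stub_next_token(tokens))
    return tokens

# Unconditional generation: start from an empty context.
print(generate([]))

# Conditional generation: continue a user-supplied prompt.
print(generate(["Once", "upon"]))
```

Interactive generation is the conditional case in a loop, with the user supplying a new prompt (or continuation) at each turn.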

Maintenance & Community

The project is maintained by Yandex. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The model is published under the Apache 2.0 license, permitting research and commercial use. Megatron-LM is under its own license.

Limitations & Caveats

The provided code is a modified DeepSpeed example, not the exact training code. Inference requires significant GPU resources (≈200GB total memory).

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
