YaLM-100B by Yandex

GPT-like neural network for text generation/processing

created 3 years ago
3,757 stars

Top 13.2% on sourcepulse

View on GitHub
Project Summary

YaLM-100B is a large language model with 100 billion parameters, designed for text generation and processing. It is suitable for developers and researchers working with English and Russian text, offering a powerful GPT-like architecture.

How It Works

The model is a GPT-like neural network trained on a 1.7 TB corpus of English and Russian text, including books and online sources. Training used DeepSpeed and drew on Megatron-LM; inference relies on tensor parallelism to shard the model across multiple GPUs. The provided code is a modified version of the DeepSpeed Megatron-LM example, adapted for YaLM-100B inference.
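
Because no single GPU can hold all 100 billion parameters, each layer is sharded across the devices at load time. A minimal sketch of such a launch, following Megatron-LM/DeepSpeed conventions; the script name and flags here are assumptions, not copied from this repository (the shipped example scripts contain the exact invocation):

    # Hypothetical launch: --tensor-model-parallel-size 8 shards every layer
    # across all 8 GPUs, which is what lets ~200 GB of fp16 weights fit in
    # 8 x 32 GB (or, with size 4, in 4 x 80 GB) of GPU memory.
    deepspeed --num_gpus 8 generate_samples_gpt.py \
        --tensor-model-parallel-size 8 \
        --load /path/to/yalm100b_checkpoint \
        --fp16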

Quick Start & Requirements

  • Install/Run: Run bash download/download.sh to fetch the weights and vocabulary. Inference needs roughly 200 GB of total GPU memory; the model was tested on 4× A100 80 GB and 8× V100 32 GB. Docker images are available for A100 and V100 (see the sketch after this list).
  • Links: Medium, Habr, Hugging Face
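
A short sketch of the setup flow. The download command is from the README; the container invocation is an assumption, and the image name is a placeholder for the A100/V100 images the repository references:

    # From the README: fetch weights and vocabulary into the working directory.
    # The checkpoint is large, so allow for substantial disk space and time.
    bash download/download.sh

    # Hypothetical container run (image name is a placeholder). All GPUs must
    # be visible so the weights can be sharded across them at load time.
    docker run --gpus all -it --rm \
        -v "$(pwd)":/workspace \
        yalm-100b-image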

Highlighted Details

  • GPT-like architecture with 100 billion parameters.
  • Trained on 1.7 TB of English and Russian text data.
  • Utilizes DeepSpeed and Megatron-LM principles for training and inference.
  • Supports interactive, conditional, and unconditional text generation (see the sketch below).
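
The three generation modes are exposed as launch scripts; the names below are assumed to live in the repository's examples/ directory and should be verified against a checkout:

    # Script names are assumptions -- verify against the repository.
    # Interactive: type prompts at a console and get continuations back.
    bash examples/generate_interactive.sh

    # Conditional: continue prompts read from input, using sampling.
    bash examples/generate_conditional_sampling.sh

    # Unconditional: sample text from scratch, with no prompt.
    bash examples/generate_unconditional.sh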

Maintenance & Community

The project is maintained by Yandex. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The model is published under the Apache 2.0 license, permitting research and commercial use. Code derived from Megatron-LM remains subject to the Megatron-LM license.

Limitations & Caveats

The provided code is a modified DeepSpeed example intended for inference, not the exact training code. Inference requires significant GPU resources (≈200 GB of total GPU memory).

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 90 days
