GPT-like neural network for text generation/processing
Top 13.2% on sourcepulse
YaLM-100B is a large language model with 100 billion parameters, designed for generating and processing text. It uses a GPT-like architecture and is aimed at developers and researchers working with English and Russian text.
How It Works
The model is a GPT-like neural network trained on a massive dataset of 1.7 TB of text, including books and online sources in English and Russian. Training used DeepSpeed and drew on Megatron-LM, and inference relies on tensor parallelism to run the model efficiently across multiple GPUs. The provided code is a modified version of the DeepSpeed Megatron-LM example, adapted for YaLM-100B inference.
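As a rough, hypothetical illustration of the tensor-parallel idea (a sketch, not the repository's actual code), the snippet below splits a linear layer's weight matrix column-wise in the Megatron style and checks that the sharded computation matches the dense one. In the real setup each shard would live on a separate GPU and the partial outputs would be gathered across devices.

```python
# Minimal single-process sketch of Megatron-style (column) tensor parallelism.
# Each "shard" here stands in for one GPU; sizes are toy values, not YaLM-100B's.
import torch

torch.manual_seed(0)
hidden, ffn, n_shards = 512, 2048, 4

x = torch.randn(8, hidden)          # a batch of activations
weight = torch.randn(ffn, hidden)   # full weight of one projection layer

# Reference: the un-parallelized computation.
full_out = x @ weight.t()

# Tensor parallelism: split the output dimension of the weight into n_shards pieces.
# In a real multi-GPU run, each piece is stored and multiplied on its own device,
# and the partial results are concatenated (all-gathered) afterwards.
shards = weight.chunk(n_shards, dim=0)
partial_outs = [x @ w.t() for w in shards]
parallel_out = torch.cat(partial_outs, dim=-1)

assert torch.allclose(full_out, parallel_out, atol=1e-5)
print("column-parallel result matches the dense computation")
```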
Quick Start & Requirements
Run the download script to fetch the weights and vocabulary:

bash download/download.sh

The model requires approximately 200 GB of GPU memory in total and was tested on configurations with 4x A100 80 GB or 8x V100 32 GB GPUs. Docker images are available for both A100 and V100.
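The ≈200 GB figure lines up with a simple back-of-the-envelope estimate (my own calculation, assuming weights stored in fp16 at 2 bytes per parameter; activations and workspace need additional memory on top):

```python
# Back-of-the-envelope memory estimate for holding the weights only.
# Assumption: fp16/bf16 storage at 2 bytes per parameter.
params = 100e9            # 100 billion parameters
bytes_per_param = 2
total_gb = params * bytes_per_param / 1024**3

for gpus, mem_gb, name in [(4, 80, "A100 80GB"), (8, 32, "V100 32GB")]:
    per_gpu = total_gb / gpus
    print(f"{gpus} x {name}: ~{per_gpu:.0f} GB of weights per GPU (card has {mem_gb} GB)")

print(f"total weights: ~{total_gb:.0f} GB")
```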
Maintenance & Community
The project is maintained by Yandex. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The model is published under the Apache 2.0 license, permitting research and commercial use. Megatron-LM is under its own license.
Limitations & Caveats
The provided code is a modified DeepSpeed example, not the exact training code. Inference requires significant GPU resources (≈200GB total memory).