GPT-like neural network for text generation/processing
Top 13.2% on sourcepulse
YaLM-100B is a large language model with 100 billion parameters, designed for generating and processing text. It uses a GPT-like architecture and is aimed at developers and researchers working with English and Russian text.
How It Works
The model is a GPT-like neural network trained on a massive dataset of 1.7 TB of text, including books and online sources in English and Russian. Training used DeepSpeed and drew on Megatron-LM, and inference relies on tensor parallelism to run the model efficiently across multiple GPUs. The provided code is a modified version of the DeepSpeed Megatron-LM example, adapted for YaLM-100B inference.
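As a rough, hypothetical illustration of the tensor-parallel idea (a sketch, not the repository's actual code), the snippet below splits a linear layer's weight matrix column-wise in the Megatron style and checks that the sharded computation matches the dense one. In the real setup each shard would live on a separate GPU and the partial outputs would be gathered across devices.

```python
# Minimal single-process sketch of Megatron-style (column) tensor parallelism.
# Each "shard" here stands in for one GPU; sizes are toy values, not YaLM-100B's.
import torch

torch.manual_seed(0)
hidden, ffn, n_shards = 512, 2048, 4

x = torch.randn(8, hidden)          # a batch of activations
weight = torch.randn(ffn, hidden)   # full weight of one projection layer

# Reference: the un-parallelized computation.
full_out = x @ weight.t()

# Tensor parallelism: split the output dimension of the weight into n_shards pieces.
# In a real multi-GPU run, each piece is stored and multiplied on its own device,
# and the partial results are concatenated (all-gathered) afterwards.
shards = weight.chunk(n_shards, dim=0)
partial_outs = [x @ w.t() for w in shards]
parallel_out = torch.cat(partial_outs, dim=-1)

assert torch.allclose(full_out, parallel_out, atol=1e-5)
print("column-parallel result matches the dense computation")
```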
Quick Start & Requirements
Run the download script to fetch the weights and vocabulary:

bash download/download.sh

The model requires approximately 200 GB of GPU memory in total and was tested on configurations with 4x A100 80 GB or 8x V100 32 GB GPUs. Docker images are available for both A100 and V100.
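The ≈200 GB figure lines up with a simple back-of-the-envelope estimate (my own calculation, assuming weights stored in fp16 at 2 bytes per parameter; activations and workspace need additional memory on top):

```python
# Back-of-the-envelope memory estimate for holding the weights only.
# Assumption: fp16/bf16 storage at 2 bytes per parameter.
params = 100e9            # 100 billion parameters
bytes_per_param = 2
total_gb = params * bytes_per_param / 1024**3

for gpus, mem_gb, name in [(4, 80, "A100 80GB"), (8, 32, "V100 32GB")]:
    per_gpu = total_gb / gpus
    print(f"{gpus} x {name}: ~{per_gpu:.0f} GB of weights per GPU (card has {mem_gb} GB)")

print(f"total weights: ~{total_gb:.0f} GB")
```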
Maintenance & Community
The project is maintained by Yandex. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The model is published under the Apache 2.0 license, permitting research and commercial use. Megatron-LM is under its own license.
Limitations & Caveats
The provided code is a modified DeepSpeed example, not the exact training code. Inference requires significant GPU resources (≈200GB total memory).