Russian GPT models for text generation and related tasks
This repository provides a suite of large language models (LLMs) specifically trained for the Russian language, including ruGPT3XL, ruGPT3Large, ruGPT3Medium, ruGPT3Small, and ruGPT2Large. It targets researchers and developers working with Russian NLP tasks, offering pre-trained models and fine-tuning capabilities to generate Russian text, simplify sentences, and more.
How It Works
The models are autoregressive transformers. ruGPT3XL combines sparse and dense attention blocks for efficient processing of longer sequences (up to 2,048 tokens), while the other models use standard dense transformer architectures. Training was conducted on large Russian-language corpora (up to 80B tokens) using the DeepSpeed and Megatron frameworks on substantial GPU clusters, achieving competitive perplexity scores.
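Because the non-XL checkpoints are standard GPT-2-style models, autoregressive generation works through the stock transformers API. A minimal sketch, assuming torch and transformers are installed and using the sberbank-ai/rugpt3large_based_on_gpt2 checkpoint referenced later in this summary:

```python
# Minimal sketch: sampling a continuation from a GPT-2-compatible ruGPT3 checkpoint.
# Assumes `torch` and `transformers` are installed; the model name is from this summary.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "sberbank-ai/rugpt3large_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

prompt = "Александр Сергеевич Пушкин родился в "  # "Alexander Pushkin was born in "
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Autoregressive decoding: the model extends the prompt one token at a time.
output = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```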
Quick Start & Requirements
Dependencies for ruGPT3XL include apex, triton, deepspeed, transformers, huggingface_hub, and timm. Its setup involves cloning the repo and copying utility files into the installed transformers and apex libraries. The other models require only transformers (version 4.24.0 recommended).
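For ruGPT3XL itself, the repository exposes a wrapper around the DeepSpeed sparse-attention model. A hedged sketch of that path; the module path, class name, and generate arguments follow the project README and may have changed, so verify against the current repo:

```python
# Hedged sketch of the ruGPT3XL path. Assumes the repo is cloned as `ru-gpts/`
# and that apex/triton/deepspeed are installed with the file-copy setup completed.
import sys
sys.path.append("ru-gpts/")

from src.xl_wrapper import RuGPT3XL  # wrapper class as named in the repo README

# seq_len can be raised toward the 2048-token limit at higher memory cost.
gpt = RuGPT3XL.from_pretrained("sberbank-ai/rugpt3xl", seq_len=512)
results = gpt.generate(
    "Кто был президентом США в 2020? ",  # "Who was the US president in 2020?"
    max_length=50,
    no_repeat_ngram_size=3,
    repetition_penalty=2.0,
)
print(results[0])
```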
Highlighted Details
Models are published on the Hugging Face Hub (e.g., sberbank-ai/rugpt3large_based_on_gpt2).
Maintenance & Community
The project is associated with Sberbank AI. Links to Hugging Face model cards and pretraining scripts are provided. No direct community links (Discord/Slack) are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The setup for ruGPT3XL is complex, requiring specific library versions and manual file copying, and is therefore prone to environment issues. The README details hardware requirements only for training, not for inference.