ru-gpts by ai-forever

Russian GPT models for text generation and related tasks

created 4 years ago
2,098 stars

Top 21.9% on sourcepulse

Project Summary

This repository provides a suite of large language models (LLMs) specifically trained for the Russian language, including ruGPT3XL, ruGPT3Large, ruGPT3Medium, ruGPT3Small, and ruGPT2Large. It targets researchers and developers working with Russian NLP tasks, offering pre-trained models and fine-tuning capabilities to generate Russian text, simplify sentences, and more.

How It Works

The models are autoregressive transformers. ruGPT3XL features alternating sparse and dense attention blocks for efficient processing of longer sequences (up to 2048 tokens); the other models use standard transformer architectures. Training was conducted on extensive Russian-language datasets (up to 80B tokens) using the DeepSpeed and Megatron frameworks on substantial GPU clusters, achieving competitive perplexity scores.

Quick Start & Requirements

  • ruGPT3XL: Requires installation of apex, triton, deepspeed, transformers, huggingface_hub, and timm. Specific setup involves cloning the repo and copying utility files into the transformers and apex libraries.
  • Other Models: Primarily requires transformers (version 4.24.0 recommended).
  • Hardware: ruGPT3XL setup and usage examples suggest GPU acceleration (CUDA) and potentially multi-GPU setups for training/inference.
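For the non-XL models, generation follows the standard transformers pattern. A minimal sketch is below; the model id is one the README publishes on the Hugging Face hub, but the sampling parameters are illustrative, not the project's recommended settings:

```python
def generate_russian(prompt, model_name="sberbank-ai/rugpt3small_based_on_gpt2",
                     max_length=50):
    """Generate a Russian continuation for `prompt`.

    Imports are kept inside the function so merely defining it does not
    require transformers to be installed; the first call downloads the
    model weights from the Hugging Face hub.
    """
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_length=max_length,
                            do_sample=True, top_k=50, top_p=0.95)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example (downloads model weights on first run):
# print(generate_russian("Александр Сергеевич Пушкин родился в "))
```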

Highlighted Details

  • ruGPT3XL trained on 80B tokens with sparse attention, achieving a perplexity of 12.05.
  • Models are available via Hugging Face model hub (e.g., sberbank-ai/rugpt3large_based_on_gpt2).
  • Includes examples for fine-tuning and text generation.
  • Several open-source projects leverage these models for tasks like text simplification and copywriting.
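Perplexity, the metric quoted above, is simply the exponential of the average per-token negative log-likelihood on held-out text. A quick sketch of the computation (the loss values are made up for the example):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token losses from evaluating a held-out Russian text:
losses = [2.1, 2.7, 2.4, 2.8]
print(round(perplexity(losses), 2))  # → 12.18
```

Lower is better: a perplexity of 12.05 means the model is, on average, about as uncertain as a uniform choice among ~12 tokens at each step.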

Maintenance & Community

The project is maintained by ai-forever (formerly Sberbank AI). Links to Hugging Face model cards and pretraining scripts are provided. No direct community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The setup for ruGPT3XL is complex, requiring specific library versions and manual copying of utility files into installed packages, which makes it prone to environment-specific breakage. The README details hardware requirements for training but not for inference.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0

Star History

  • 13 stars in the last 90 days

Explore Similar Projects

Starred by Jeremy Howard (cofounder of fast.ai) and Stas Bekman (author of Machine Learning Engineering Open Book; research engineer at Snowflake).

  • SwissArmyTransformer by THUDM: Transformer library for flexible model development (1k stars, top 0.3%; created 3 years ago, updated 7 months ago)