ru-gpts by ai-forever

Russian GPT models for text generation and related tasks

created 4 years ago
2,098 stars

Top 21.9% on sourcepulse

Project Summary

This repository provides a suite of large language models (LLMs) specifically trained for the Russian language, including ruGPT3XL, ruGPT3Large, ruGPT3Medium, ruGPT3Small, and ruGPT2Large. It targets researchers and developers working with Russian NLP tasks, offering pre-trained models and fine-tuning capabilities to generate Russian text, simplify sentences, and more.

How It Works

The models are autoregressive transformers. ruGPT3XL features alternating sparse and dense attention blocks for efficient processing of longer sequences (up to 2048 tokens); the other models use standard transformer architectures. Training was conducted on extensive Russian-language datasets (up to 80B tokens) using the DeepSpeed and Megatron frameworks on substantial GPU clusters, achieving competitive perplexity scores.

Quick Start & Requirements

  • ruGPT3XL: Requires installation of apex, triton, deepspeed, transformers, huggingface_hub, and timm. Specific setup involves cloning the repo and copying utility files into the transformers and apex libraries.
  • Other Models: Primarily requires transformers (version 4.24.0 recommended).
  • Hardware: ruGPT3XL setup and usage examples suggest GPU acceleration (CUDA) and potentially multi-GPU setups for training/inference.
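For the non-XL models, generation follows the standard transformers pattern. A minimal sketch is below; the model id is one the README publishes on the Hugging Face hub, but the sampling parameters are illustrative, not the project's recommended settings:

```python
def generate_russian(prompt, model_name="sberbank-ai/rugpt3small_based_on_gpt2",
                     max_length=50):
    """Generate a Russian continuation for `prompt`.

    Imports are kept inside the function so merely defining it does not
    require transformers to be installed; the first call downloads the
    model weights from the Hugging Face hub.
    """
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_length=max_length,
                            do_sample=True, top_k=50, top_p=0.95)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example (downloads model weights on first run):
# print(generate_russian("Александр Сергеевич Пушкин родился в "))
```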

Highlighted Details

  • ruGPT3XL trained on 80B tokens with sparse attention, achieving a perplexity of 12.05.
  • Models are available via Hugging Face model hub (e.g., sberbank-ai/rugpt3large_based_on_gpt2).
  • Includes examples for fine-tuning and text generation.
  • Several open-source projects leverage these models for tasks like text simplification and copywriting.
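Perplexity, the metric quoted above, is simply the exponential of the average per-token negative log-likelihood on held-out text. A quick sketch of the computation (the loss values are made up for the example):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token losses from evaluating a held-out Russian text:
losses = [2.1, 2.7, 2.4, 2.8]
print(round(perplexity(losses), 2))  # → 12.18
```

Lower is better: a perplexity of 12.05 means the model is, on average, about as uncertain as a uniform choice among ~12 tokens at each step.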

Maintenance & Community

The project is maintained by ai-forever (formerly Sberbank AI). Links to Hugging Face model cards and pretraining scripts are provided. No direct community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The setup for ruGPT3XL is complex, requiring specific library versions and manual copying of utility files into installed packages, which makes it prone to environment-specific breakage. The README details hardware requirements for training but not for inference.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0

Star History

  • 13 stars in the last 90 days

Explore Similar Projects

Starred by Jeremy Howard (cofounder of fast.ai) and Stas Bekman (author of Machine Learning Engineering Open Book; research engineer at Snowflake).

  • SwissArmyTransformer by THUDM: Transformer library for flexible model development (1k stars, top 0.3%; created 3 years ago, updated 7 months ago)