GPT-2 finetuning notebook for Russian language models
This repository provides tools and pre-trained models for Russian GPT-2 language generation, targeting researchers and developers interested in fine-tuning or deploying large language models for Russian text. It offers a comprehensive guide for training, evaluation, and deployment, including performance benchmarks and detailed instructions for dataset preparation and model configuration.
How It Works
The project leverages the GPT-2 architecture and implements progressive layer unfreezing for efficient transfer learning. It utilizes a custom YTTM tokenizer, noted for its speed and smaller file sizes compared to SentencePiece. Training is optimized with mixed-precision (fp16) and supports both GPU and Google TPU acceleration.
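The summary does not spell out how the unfreezing schedule is implemented; the sketch below shows the general idea only, assuming a Hugging Face GPT2LMHeadModel rather than this repository's own training code (the unfreeze_last_n helper and the stage schedule are illustrative, not taken from the project).

# Illustrative progressive-unfreezing sketch using the Hugging Face GPT-2
# implementation; the repository's actual training loop may differ.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")  # placeholder checkpoint

def unfreeze_last_n(model: GPT2LMHeadModel, n: int) -> None:
    """Freeze all weights, then make the last `n` transformer blocks and the
    LM head trainable (the head shares weights with the input embeddings)."""
    for param in model.parameters():
        param.requires_grad = False
    for block in model.transformer.h[-n:]:
        for param in block.parameters():
            param.requires_grad = True
    for param in model.lm_head.parameters():
        param.requires_grad = True

# Widen the trainable slice stage by stage, fine-tuning between stages.
for n_unfrozen in (1, 3, 6, len(model.transformer.h)):
    unfreeze_last_n(model, n_unfrozen)
    # ... run one fine-tuning pass over the data here ...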
Quick Start & Requirements
conda env create -f environment.yml
System-level dependencies are listed in apt.txt.
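Once the environment is in place, the YTTM tokenizer mentioned above comes from the youtokentome package; a minimal usage sketch follows, with placeholder file names and vocabulary size (not values taken from this repository).

# Minimal youtokentome (YTTM) sketch; paths and vocab_size are placeholders.
import youtokentome as yttm

# Train a BPE model on a plain-text Russian corpus.
yttm.BPE.train(data="corpus_ru.txt", model="ru_bpe.model", vocab_size=50257)

# Load the trained model and round-trip some text.
bpe = yttm.BPE(model="ru_bpe.model")
ids = bpe.encode(["Пример русского текста."], output_type=yttm.OutputType.ID)
print(bpe.decode(ids))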
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Mixed-precision training relies on NVIDIA Apex, which has a known issue (see apex/issues/227) that might affect fp16 training on certain configurations.
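For context, the usual Apex AMP wrapping that this caveat concerns looks roughly like the following; whether the project configures fp16 exactly this way is an assumption, and the model, optimizer, and opt_level shown are placeholders.

# Typical NVIDIA Apex mixed-precision setup (illustrative only; the project's
# scripts may configure fp16 differently or let you turn it off).
import torch
from apex import amp
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2").cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# opt_level="O1" runs selected ops in fp16; switch to "O0" (or skip this call)
# to fall back to fp32 if the Apex issue above affects your configuration.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")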