ru_transformers by mgrankin

GPT-2 finetuning notebook for Russian language models

created 5 years ago
771 stars

Top 46.2% on sourcepulse

View on GitHub
Project Summary

This repository provides training scripts and pre-trained GPT-2 models for Russian text generation, targeting researchers and developers interested in fine-tuning or deploying language models for Russian. It offers a comprehensive guide for training, evaluation, and deployment, including performance benchmarks and detailed instructions for dataset preparation and model configuration.

How It Works

The project leverages the GPT-2 architecture and implements progressive layer unfreezing for efficient transfer learning. It uses the YTTM (YouTokenToMe) BPE tokenizer, noted for its speed and smaller file sizes compared to SentencePiece. Training is optimized with mixed-precision (fp16) and supports both GPU and Google TPU acceleration.
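
The unfreezing idea can be pictured with a short sketch. The following is an illustration written against the Hugging Face transformers GPT-2 classes, not the repository's own training script; the level-to-layer mapping, the function name, and the "gpt2" checkpoint are assumptions.

```python
# Illustration only: gradual (progressive) unfreezing on a GPT-2 model.
# The level semantics are an assumption loosely modelled on the
# 0 -> 1 -> 2 -> 7 -> -1 schedule mentioned below (-1 = train everything).
from transformers import GPT2LMHeadModel

def set_unfreeze_level(model: GPT2LMHeadModel, level: int) -> None:
    """Freeze everything, then unfreeze the LM head plus the last `level`
    transformer blocks; level == -1 unfreezes the whole model."""
    for param in model.parameters():
        param.requires_grad = (level == -1)
    if level == -1:
        return
    # Always train the output head (weight-tied to the input embeddings).
    for param in model.lm_head.parameters():
        param.requires_grad = True
    # Unfreeze the top `level` transformer blocks.
    if level > 0:
        for block in model.transformer.h[-level:]:
            for param in block.parameters():
                param.requires_grad = True

model = GPT2LMHeadModel.from_pretrained("gpt2")  # placeholder checkpoint
for level in (0, 1, 2, 7, -1):
    set_unfreeze_level(model, level)
    # ... run one fine-tuning stage at this level before widening further ...
```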

Quick Start & Requirements

Highlighted Details

  • Perplexity benchmarks provided for various model sizes (124M, 355M) and training configurations on different Russian datasets.
  • Supports a gradual unfreezing strategy (levels 0, 1, 2, 7, -1, with -1 unfreezing all layers) for progressive training.
  • Includes scripts for model evaluation, text processing, and token conversion.
  • Offers a REST API deployment example using uvicorn (see the sketch after this list).
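
As a rough picture of the uvicorn deployment mentioned above, here is a hedged sketch of what such a service could look like. FastAPI, the /generate route, the request schema, and the "gpt2" checkpoint and tokenizer are illustrative assumptions, not the repository's actual endpoint (which serves its own Russian checkpoints with the YTTM tokenizer).

```python
# Hypothetical text-generation endpoint served with uvicorn (FastAPI assumed).
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import GPT2LMHeadModel, GPT2Tokenizer

app = FastAPI()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")       # placeholder; swap in a Russian checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

class Prompt(BaseModel):
    text: str
    max_length: int = 100

@app.post("/generate")
def generate(prompt: Prompt):
    input_ids = tokenizer.encode(prompt.text, return_tensors="pt")
    output_ids = model.generate(
        input_ids,
        max_length=prompt.max_length,
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )
    return {"text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}

# Launch:  uvicorn app:app --host 0.0.0.0 --port 8000
```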

Maintenance & Community

  • The project appears to be maintained by mgrankin.
  • Links to Telegram bots (@PorfBot, @NeuroPoetBot) for direct model interaction are provided.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The README mentions potential issues with Apex and DataParallel (apex/issues/227), which might affect mixed-precision training on certain configurations.
  • Instructions for SentencePiece installation are provided but noted as skippable if using YTTM.
  • The project relies on AWS S3 for model distribution.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 90 days

Explore Similar Projects

Starred by George Hotz (author of tinygrad; founder of the tiny corp, comma.ai) and Ross Taylor (cofounder of General Reasoning; creator of Papers with Code).

GPT2 by ConnorJL

  GPT-2 training implementation, supporting TPUs and GPUs
  1k stars
  created 6 years ago, updated 2 years ago