faster-rnnlm by yandex

Faster RNN Language Modeling Toolkit

Created 10 years ago
563 stars
Top 57.1% on SourcePulse

Project Summary

This toolkit provides a highly optimized implementation of Recurrent Neural Network Language Models (RNNLMs) designed for training on massive datasets with very large vocabularies. It targets researchers and practitioners in Automatic Speech Recognition (ASR) and Machine Translation (MT) who need to achieve state-of-the-art performance and scalability. The primary benefit is significantly faster training and inference speeds compared to other implementations, enabling practical application on real-world, large-scale problems.

How It Works

faster-rnnlm provides efficient implementations of several RNN architectures, including GRU variants, and supports advanced training techniques such as Noise Contrastive Estimation (NCE) and Hierarchical Softmax (HS). NCE is highlighted in particular because its training cost is independent of vocabulary size, offering comparable or better results than HS, which can become computationally infeasible for very large vocabularies. The toolkit also incorporates optimizations such as ReLU activations, diagonal initialization, RMSProp, and gradient clipping to improve training dynamics and final performance.
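
For concreteness, here is a hedged sketch of how these choices surface on the command line, using only the flags shown in the Quick Start example below; the assumption that -nce 0 falls back to Hierarchical Softmax is not stated here and should be verified against the binary's help output.

    # NCE with 20 noise samples: per-word training cost stays flat as the vocabulary grows
    ./rnnlm -rnnlm nce_model -train train.txt -valid validation.txt \
        -hidden 256 -hidden-type gru -nce 20 -alpha 0.01

    # Assumption: with no noise samples requested (-nce 0) the toolkit falls back to
    # Hierarchical Softmax, whose cost grows roughly with log(vocabulary size)
    ./rnnlm -rnnlm hs_model -train train.txt -valid validation.txt \
        -hidden 256 -hidden-type gru -nce 0 -alpha 0.01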

Quick Start & Requirements

  • Installation: Run ./build.sh to download Eigen and compile the toolkit.
  • Prerequisites: None explicitly mentioned beyond standard build tools.
  • Usage: Example training command (an end-to-end sketch follows this list): ./rnnlm -rnnlm model_name -train train.txt -valid validation.txt -hidden 128 -hidden-type gru -nce 20 -alpha 0.01.
  • Data Format: Training and validation files should contain one sentence per line.
  • Vocabulary: By default, all distinct words are used. For limited vocabularies, words outside the top N should be mapped to an OOV token.
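
A minimal end-to-end sketch combining the steps above; model and file names are placeholders, and the flags are taken verbatim from the usage example.

    # 1. Download Eigen and compile the toolkit
    ./build.sh

    # 2. Prepare train.txt and validation.txt with one sentence per line,
    #    mapping words outside a capped vocabulary to an OOV token

    # 3. Train a GRU model with NCE (20 noise samples) and learning rate 0.01
    ./rnnlm -rnnlm model_name -train train.txt -valid validation.txt \
        -hidden 128 -hidden-type gru -nce 20 -alpha 0.01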

Highlighted Details

  • Achieves over 250k words/second on the One Billion Word Benchmark with 8 threads on a 3.3GHz CPU.
  • NCE training scales well with vocabulary size and can achieve comparable or better perplexity than Hierarchical Softmax.
  • Supports various hidden layer types (sigmoid, tanh, relu, gru variants) and direct connections for integrating Maximum Entropy models.
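
The throughput figure above was measured with multi-threaded training. A hedged sketch of such a run follows; the -threads, -direct, and -direct-order flags and their values are assumptions based on the benchmark description and the Maximum Entropy feature, not flags documented here, so check them against the binary's usage output.

    # Assumed flags (verify with the binary's help output): -threads for parallel training,
    # -direct/-direct-order for hashed Maximum Entropy (direct) connections; values are illustrative
    ./rnnlm -rnnlm me_model -train train.txt -valid validation.txt \
        -hidden 128 -hidden-type relu -nce 20 -alpha 0.01 \
        -threads 8 -direct 1000 -direct-order 3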

Maintenance & Community

  • The project appears to be a research-oriented toolkit, with references to academic papers and researchers like Mikolov. No specific community channels (Discord, Slack) or active development indicators are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Given its origins and the nature of such toolkits, users should verify licensing for commercial or derivative use.

Limitations & Caveats

  • Performance scaling with threads is sub-linear due to factors like cache misses.
  • CUDA support is primarily for accelerating validation/test entropy calculation in NCE mode, not for general training acceleration.
  • Some configurations, like truncated ReLU with NCE, may not train effectively.

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days
