faster-rnnlm by yandex

Faster RNN Language Modeling Toolkit

Created 10 years ago
563 stars
Top 57.1% on SourcePulse

Project Summary

This toolkit provides a highly optimized implementation of Recurrent Neural Network Language Models (RNNLMs) designed for training on massive datasets with very large vocabularies. It targets researchers and practitioners in Automatic Speech Recognition (ASR) and Machine Translation (MT) who need to achieve state-of-the-art performance and scalability. The primary benefit is significantly faster training and inference speeds compared to other implementations, enabling practical application on real-world, large-scale problems.

How It Works

faster-rnnlm provides efficient implementations of several RNN architectures, including GRU variants, and supports advanced training techniques such as Noise Contrastive Estimation (NCE) and Hierarchical Softmax (HS). NCE is highlighted in particular because its training cost is independent of vocabulary size, offering comparable or better results than HS, which can become computationally infeasible for very large vocabularies. The toolkit also incorporates optimizations such as ReLU activations, diagonal initialization, RMSProp, and gradient clipping to improve training dynamics and final performance.
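
For concreteness, here is a hedged sketch of how these choices surface on the command line, using only the flags shown in the Quick Start example below; the assumption that -nce 0 falls back to Hierarchical Softmax is not stated here and should be verified against the binary's help output.

    # NCE with 20 noise samples: per-word training cost stays flat as the vocabulary grows
    ./rnnlm -rnnlm nce_model -train train.txt -valid validation.txt \
        -hidden 256 -hidden-type gru -nce 20 -alpha 0.01

    # Assumption: with no noise samples requested (-nce 0) the toolkit falls back to
    # Hierarchical Softmax, whose cost grows roughly with log(vocabulary size)
    ./rnnlm -rnnlm hs_model -train train.txt -valid validation.txt \
        -hidden 256 -hidden-type gru -nce 0 -alpha 0.01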

Quick Start & Requirements

  • Installation: Run ./build.sh to download Eigen and compile the toolkit.
  • Prerequisites: None explicitly mentioned beyond standard build tools.
  • Usage: Example training command (an end-to-end sketch follows this list): ./rnnlm -rnnlm model_name -train train.txt -valid validation.txt -hidden 128 -hidden-type gru -nce 20 -alpha 0.01.
  • Data Format: Training and validation files should contain one sentence per line.
  • Vocabulary: By default, all distinct words are used. For limited vocabularies, words outside the top N should be mapped to an OOV token.
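
A minimal end-to-end sketch combining the steps above; model and file names are placeholders, and the flags are taken verbatim from the usage example.

    # 1. Download Eigen and compile the toolkit
    ./build.sh

    # 2. Prepare train.txt and validation.txt with one sentence per line,
    #    mapping words outside a capped vocabulary to an OOV token

    # 3. Train a GRU model with NCE (20 noise samples) and learning rate 0.01
    ./rnnlm -rnnlm model_name -train train.txt -valid validation.txt \
        -hidden 128 -hidden-type gru -nce 20 -alpha 0.01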

Highlighted Details

  • Achieves over 250k words/second on the One Billion Word Benchmark with 8 threads on a 3.3GHz CPU.
  • NCE training scales well with vocabulary size and can achieve comparable or better perplexity than Hierarchical Softmax.
  • Supports various hidden layer types (sigmoid, tanh, relu, gru variants) and direct connections for integrating Maximum Entropy models.
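
The throughput figure above was measured with multi-threaded training. A hedged sketch of such a run follows; the -threads, -direct, and -direct-order flags and their values are assumptions based on the benchmark description and the Maximum Entropy feature, not flags documented here, so check them against the binary's usage output.

    # Assumed flags (verify with the binary's help output): -threads for parallel training,
    # -direct/-direct-order for hashed Maximum Entropy (direct) connections; values are illustrative
    ./rnnlm -rnnlm me_model -train train.txt -valid validation.txt \
        -hidden 128 -hidden-type relu -nce 20 -alpha 0.01 \
        -threads 8 -direct 1000 -direct-order 3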

Maintenance & Community

  • The project appears to be a research-oriented toolkit, with references to academic papers and researchers like Mikolov. No specific community channels (Discord, Slack) or active development indicators are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Given its origins and the nature of such toolkits, users should verify licensing for commercial or derivative use.

Limitations & Caveats

  • Performance scaling with threads is sub-linear due to factors like cache misses.
  • CUDA support is primarily for accelerating validation/test entropy calculation in NCE mode, not for general training acceleration.
  • Some configurations, like truncated ReLU with NCE, may not train effectively.

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days
