nmt by tensorflow

Build state-of-the-art Neural Machine Translation systems

Created 8 years ago
6,440 stars

Top 7.9% on SourcePulse

View on GitHub
Project Summary

This repository provides a comprehensive tutorial for building state-of-the-art Neural Machine Translation (NMT) systems with TensorFlow's sequence-to-sequence (seq2seq) models. It targets researchers and engineers who want to understand and implement advanced NMT techniques, offering a practical guide to modern architectures, including attention mechanisms, together with the best practices needed to reach translation quality comparable to Google's NMT system.

How It Works

The core of the project is an encoder-decoder architecture implemented with Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) cells. The encoder processes the source sentence into a context vector, which the decoder then uses to generate the target translation. A key feature is the integration of attention mechanisms (Luong and Bahdanau styles), which allow the decoder to dynamically focus on relevant parts of the source sentence, significantly improving performance on longer sequences and complex linguistic structures. The codebase emphasizes production-ready practices and incorporates tips for optimizing speed and translation quality.
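As a rough illustration of this architecture, the sketch below wires an LSTM encoder, a Luong attention mechanism, and a training-time decoder together with the TF 1.x tf.contrib.seq2seq API the tutorial is built on. It is a minimal example under stated assumptions: the placeholder tensors (encoder_emb_inp, decoder_emb_inp, source_lengths, target_lengths) and the sizes (embed_dim, tgt_vocab_size) are hypothetical stand-ins, not identifiers taken from the repository.

    import tensorflow as tf

    num_units = 128         # hidden size, matching --num_units in the example command below
    embed_dim = 128         # hypothetical embedding size
    tgt_vocab_size = 10000  # hypothetical target vocabulary size

    # Already-embedded inputs and their lengths (hypothetical placeholders).
    encoder_emb_inp = tf.placeholder(tf.float32, [None, None, embed_dim])  # [batch, src_len, dim]
    decoder_emb_inp = tf.placeholder(tf.float32, [None, None, embed_dim])  # [batch, tgt_len, dim]
    source_lengths = tf.placeholder(tf.int32, [None])
    target_lengths = tf.placeholder(tf.int32, [None])
    batch_size = tf.shape(encoder_emb_inp)[0]

    # Encoder: an LSTM reads the source sentence, producing per-step outputs
    # (the attention "memory") and a final state.
    encoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
        encoder_cell, encoder_emb_inp,
        sequence_length=source_lengths, dtype=tf.float32)

    # Attention: the decoder attends over all encoder outputs (Luong style);
    # BahdanauAttention is the additive alternative.
    attention_mechanism = tf.contrib.seq2seq.LuongAttention(
        num_units, encoder_outputs, memory_sequence_length=source_lengths)
    decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
        tf.nn.rnn_cell.BasicLSTMCell(num_units), attention_mechanism,
        attention_layer_size=num_units)

    # Decoder: teacher forcing during training, projecting each step to the
    # target vocabulary to obtain logits for the cross-entropy loss.
    helper = tf.contrib.seq2seq.TrainingHelper(decoder_emb_inp, target_lengths)
    projection_layer = tf.layers.Dense(tgt_vocab_size, use_bias=False)
    decoder = tf.contrib.seq2seq.BasicDecoder(
        decoder_cell, helper,
        initial_state=decoder_cell.zero_state(batch_size, tf.float32).clone(
            cell_state=encoder_state),
        output_layer=projection_layer)
    outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)
    logits = outputs.rnn_output  # [batch, tgt_max_len, tgt_vocab_size]

At inference time the same decoder cell would be driven by a GreedyEmbeddingHelper or a BeamSearchDecoder instead of teacher forcing.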

Quick Start & Requirements

  • Primary install: Clone the repository: git clone https://github.com/tensorflow/nmt/
  • Prerequisites: Requires TensorFlow Nightly. For stable TensorFlow versions, consider older branches like tf-1.4. Data download scripts are provided for IWSLT English-Vietnamese (nmt/scripts/download_iwslt15.sh) and WMT German-English (nmt/scripts/wmt16_en_de.sh).
  • Example Training Command:
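    # Assumes the IWSLT15 En-Vi data (see nmt/scripts/download_iwslt15.sh above) is already in /tmp/nmt_data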
    mkdir /tmp/nmt_model
    python -m nmt.nmt \
        --src=vi --tgt=en \
        --vocab_prefix=/tmp/nmt_data/vocab \
        --train_prefix=/tmp/nmt_data/train \
        --dev_prefix=/tmp/nmt_data/tst2012 \
        --test_prefix=/tmp/nmt_data/tst2013 \
        --out_dir=/tmp/nmt_model \
        --num_train_steps=12000 \
        --num_layers=2 \
        --num_units=128 \
        --dropout=0.2 \
        --metrics=bleu
    
  • Links: Google Research blog post, GitHub repository.

Highlighted Details

  • Attention Mechanisms: Offers implementations of Luong and Bahdanau attention, integrated via AttentionWrapper for easy adoption.
  • Advanced Features: Supports bidirectional RNNs for encoders, beam search for improved inference, and multi-GPU training strategies, including GNMT attention for enhanced parallelism.
  • Data Handling: Details the use of tf.data iterators for efficient input pipelines, including batching, padding, and bucketing of variable-length sequences (see the sketch after this list).
  • Benchmarks: Provides comprehensive performance benchmarks (BLEU scores) and training speed comparisons on IWSLT and WMT datasets, contrasting different model configurations and decoding strategies.
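To make the bucketed input pipeline described above concrete, here is a minimal sketch using the TF 1.x tf.data API. The toy src_dataset/tgt_dataset, the EOS padding ids, and the bucket width of 10 tokens are hypothetical choices for illustration, not the repository's actual iterator code.

    import tensorflow as tf

    # Hypothetical toy datasets of integer token ids (variable-length sentences).
    src_dataset = tf.data.Dataset.from_generator(
        lambda: ([1, 2, 3], [4, 5], [6, 7, 8, 9]), tf.int32, tf.TensorShape([None]))
    tgt_dataset = tf.data.Dataset.from_generator(
        lambda: ([1, 2], [3, 4, 5], [6]), tf.int32, tf.TensorShape([None]))
    src_eos_id, tgt_eos_id = 0, 0  # hypothetical padding ids

    batch_size = 128
    num_buckets = 5

    # Attach each sentence's length so it can drive bucketing and padding.
    dataset = tf.data.Dataset.zip((src_dataset, tgt_dataset))
    dataset = dataset.map(lambda src, tgt: (src, tgt, tf.size(src), tf.size(tgt)))

    def batching_func(ds):
        # Pad source/target ids to the longest example in the batch; lengths are scalars.
        return ds.padded_batch(
            batch_size,
            padded_shapes=(tf.TensorShape([None]), tf.TensorShape([None]),
                           tf.TensorShape([]), tf.TensorShape([])),
            padding_values=(src_eos_id, tgt_eos_id, 0, 0))

    def key_func(src, tgt, src_len, tgt_len):
        # Bucket sentence pairs by source length (bucket width of 10 tokens here).
        bucket_id = src_len // 10
        return tf.to_int64(tf.minimum(num_buckets, bucket_id))

    def reduce_func(unused_key, windowed_data):
        return batching_func(windowed_data)

    batched_dataset = dataset.apply(
        tf.contrib.data.group_by_window(
            key_func=key_func, reduce_func=reduce_func, window_size=batch_size))
    iterator = batched_dataset.make_initializable_iterator()
    src_ids, tgt_ids, src_len, tgt_len = iterator.get_next()

Grouping sentences of similar length into the same batch keeps padding, and therefore wasted computation, to a minimum.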

Maintenance & Community

The project is presented as a tutorial with contributions from Google Research. No specific community channels (like Discord or Slack) or active maintenance roadmaps are detailed in the README.

Licensing & Compatibility

The provided README does not specify a software license. While described as "production-ready," users should verify licensing terms for commercial or closed-source integration.

Limitations & Caveats

This tutorial version explicitly requires TensorFlow Nightly; compatibility with older TensorFlow releases may necessitate using different branches. The focus is on demonstrating NMT concepts and achieving competitive results on specific benchmark datasets, rather than serving as a continuously maintained NMT library.

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 14 more.

BIG-bench by google
Top 0.1% on SourcePulse · 3k stars
Collaborative benchmark for probing and extrapolating LLM capabilities
Created 4 years ago · Updated 1 year ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 16 more.

text-to-text-transfer-transformer by google-research
Top 0.1% on SourcePulse · 6k stars
Unified text-to-text transformer for NLP research
Created 6 years ago · Updated 5 months ago