nmt by tensorflow

Build state-of-the-art Neural Machine Translation systems

Created 8 years ago
6,440 stars

Top 7.9% on SourcePulse

View on GitHub
Project Summary

This repository provides a comprehensive tutorial for building state-of-the-art Neural Machine Translation (NMT) systems with TensorFlow's sequence-to-sequence (seq2seq) models. It targets researchers and engineers who want to understand and implement advanced NMT techniques, offering a practical guide to modern architectures, including attention mechanisms, together with the best practices needed to reach translation quality comparable to Google's NMT system.

How It Works

The core of the project is an encoder-decoder architecture implemented with Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) cells. The encoder processes the source sentence into a context vector, which the decoder then uses to generate the target translation. A key feature is the integration of attention mechanisms (Luong and Bahdanau styles), which allow the decoder to dynamically focus on relevant parts of the source sentence, significantly improving performance on longer sequences and complex linguistic structures. The codebase emphasizes production-ready practices and incorporates tips for optimizing speed and translation quality.
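As a rough illustration of this architecture, the sketch below wires an LSTM encoder, a Luong attention mechanism, and a training-time decoder together with the TF 1.x tf.contrib.seq2seq API the tutorial is built on. It is a minimal example under stated assumptions: the placeholder tensors (encoder_emb_inp, decoder_emb_inp, source_lengths, target_lengths) and the sizes (embed_dim, tgt_vocab_size) are hypothetical stand-ins, not identifiers taken from the repository.

    import tensorflow as tf

    num_units = 128         # hidden size, matching --num_units in the example command below
    embed_dim = 128         # hypothetical embedding size
    tgt_vocab_size = 10000  # hypothetical target vocabulary size

    # Already-embedded inputs and their lengths (hypothetical placeholders).
    encoder_emb_inp = tf.placeholder(tf.float32, [None, None, embed_dim])  # [batch, src_len, dim]
    decoder_emb_inp = tf.placeholder(tf.float32, [None, None, embed_dim])  # [batch, tgt_len, dim]
    source_lengths = tf.placeholder(tf.int32, [None])
    target_lengths = tf.placeholder(tf.int32, [None])
    batch_size = tf.shape(encoder_emb_inp)[0]

    # Encoder: an LSTM reads the source sentence, producing per-step outputs
    # (the attention "memory") and a final state.
    encoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
        encoder_cell, encoder_emb_inp,
        sequence_length=source_lengths, dtype=tf.float32)

    # Attention: the decoder attends over all encoder outputs (Luong style);
    # BahdanauAttention is the additive alternative.
    attention_mechanism = tf.contrib.seq2seq.LuongAttention(
        num_units, encoder_outputs, memory_sequence_length=source_lengths)
    decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
        tf.nn.rnn_cell.BasicLSTMCell(num_units), attention_mechanism,
        attention_layer_size=num_units)

    # Decoder: teacher forcing during training, projecting each step to the
    # target vocabulary to obtain logits for the cross-entropy loss.
    helper = tf.contrib.seq2seq.TrainingHelper(decoder_emb_inp, target_lengths)
    projection_layer = tf.layers.Dense(tgt_vocab_size, use_bias=False)
    decoder = tf.contrib.seq2seq.BasicDecoder(
        decoder_cell, helper,
        initial_state=decoder_cell.zero_state(batch_size, tf.float32).clone(
            cell_state=encoder_state),
        output_layer=projection_layer)
    outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)
    logits = outputs.rnn_output  # [batch, tgt_max_len, tgt_vocab_size]

At inference time the same decoder cell would be driven by a GreedyEmbeddingHelper or a BeamSearchDecoder instead of teacher forcing.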

Quick Start & Requirements

  • Primary install: Clone the repository: git clone https://github.com/tensorflow/nmt/
  • Prerequisites: Requires TensorFlow Nightly. For stable TensorFlow versions, consider older branches like tf-1.4. Data download scripts are provided for IWSLT English-Vietnamese (nmt/scripts/download_iwslt15.sh) and WMT German-English (nmt/scripts/wmt16_en_de.sh).
  • Example Training Command:
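    # Assumes the IWSLT15 En-Vi data (see nmt/scripts/download_iwslt15.sh above) is already in /tmp/nmt_data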
    mkdir /tmp/nmt_model
    python -m nmt.nmt \
        --src=vi --tgt=en \
        --vocab_prefix=/tmp/nmt_data/vocab \
        --train_prefix=/tmp/nmt_data/train \
        --dev_prefix=/tmp/nmt_data/tst2012 \
        --test_prefix=/tmp/nmt_data/tst2013 \
        --out_dir=/tmp/nmt_model \
        --num_train_steps=12000 \
        --num_layers=2 \
        --num_units=128 \
        --dropout=0.2 \
        --metrics=bleu
    
  • Links: Google Research blog post, GitHub repository.

Highlighted Details

  • Attention Mechanisms: Offers implementations of Luong and Bahdanau attention, integrated via AttentionWrapper for easy adoption.
  • Advanced Features: Supports bidirectional RNNs for encoders, beam search for improved inference, and multi-GPU training strategies, including GNMT attention for enhanced parallelism.
  • Data Handling: Details the use of tf.data iterators for efficient input pipelines, including batching, padding, and bucketing of variable-length sequences (see the sketch after this list).
  • Benchmarks: Provides comprehensive performance benchmarks (BLEU scores) and training speed comparisons on IWSLT and WMT datasets, contrasting different model configurations and decoding strategies.
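To make the bucketed input pipeline described above concrete, here is a minimal sketch using the TF 1.x tf.data API. The toy src_dataset/tgt_dataset, the EOS padding ids, and the bucket width of 10 tokens are hypothetical choices for illustration, not the repository's actual iterator code.

    import tensorflow as tf

    # Hypothetical toy datasets of integer token ids (variable-length sentences).
    src_dataset = tf.data.Dataset.from_generator(
        lambda: ([1, 2, 3], [4, 5], [6, 7, 8, 9]), tf.int32, tf.TensorShape([None]))
    tgt_dataset = tf.data.Dataset.from_generator(
        lambda: ([1, 2], [3, 4, 5], [6]), tf.int32, tf.TensorShape([None]))
    src_eos_id, tgt_eos_id = 0, 0  # hypothetical padding ids

    batch_size = 128
    num_buckets = 5

    # Attach each sentence's length so it can drive bucketing and padding.
    dataset = tf.data.Dataset.zip((src_dataset, tgt_dataset))
    dataset = dataset.map(lambda src, tgt: (src, tgt, tf.size(src), tf.size(tgt)))

    def batching_func(ds):
        # Pad source/target ids to the longest example in the batch; lengths are scalars.
        return ds.padded_batch(
            batch_size,
            padded_shapes=(tf.TensorShape([None]), tf.TensorShape([None]),
                           tf.TensorShape([]), tf.TensorShape([])),
            padding_values=(src_eos_id, tgt_eos_id, 0, 0))

    def key_func(src, tgt, src_len, tgt_len):
        # Bucket sentence pairs by source length (bucket width of 10 tokens here).
        bucket_id = src_len // 10
        return tf.to_int64(tf.minimum(num_buckets, bucket_id))

    def reduce_func(unused_key, windowed_data):
        return batching_func(windowed_data)

    batched_dataset = dataset.apply(
        tf.contrib.data.group_by_window(
            key_func=key_func, reduce_func=reduce_func, window_size=batch_size))
    iterator = batched_dataset.make_initializable_iterator()
    src_ids, tgt_ids, src_len, tgt_len = iterator.get_next()

Grouping sentences of similar length into the same batch keeps padding, and therefore wasted computation, to a minimum.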

Maintenance & Community

The project is presented as a tutorial with contributions from Google Research. No specific community channels (like Discord or Slack) or active maintenance roadmaps are detailed in the README.

Licensing & Compatibility

The provided README does not specify a software license. While described as "production-ready," users should verify licensing terms for commercial or closed-source integration.

Limitations & Caveats

This tutorial version explicitly requires TensorFlow Nightly; compatibility with older TensorFlow releases may necessitate using different branches. The focus is on demonstrating NMT concepts and achieving competitive results on specific benchmark datasets, rather than serving as a continuously maintained NMT library.

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 14 more.

BIG-bench by google
Top 0.1% on SourcePulse · 3k stars
Collaborative benchmark for probing and extrapolating LLM capabilities
Created 4 years ago · Updated 1 year ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 16 more.

text-to-text-transfer-transformer by google-research
Top 0.1% on SourcePulse · 6k stars
Unified text-to-text transformer for NLP research
Created 6 years ago · Updated 5 months ago