ALMA by fe1ixxu

LLM-based models for many-to-many machine translation

created 1 year ago
547 stars

Top 59.2% on sourcepulse

Project Summary

ALMA is a suite of LLM-based translation models spanning three generations (ALMA, ALMA-R, X-ALMA) that progressively improve translation quality and language coverage. It targets researchers and practitioners who need state-of-the-art machine translation, with reported quality that matches or exceeds GPT-4 and WMT competition winners.

How It Works

ALMA employs a two-step fine-tuning process: initial fine-tuning on monolingual data, followed by optimization on high-quality parallel data. ALMA-R further refines the models with Contrastive Preference Optimization (CPO) on triplet preference data. X-ALMA extends coverage to 50 languages through a plug-and-play language-specific module architecture and a 5-step training recipe that incorporates Adaptive-Rejection Preference Optimization (ARPO). Together, the modular architecture and preference-based optimization allow broad language coverage without sacrificing per-language quality.
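For readers unfamiliar with CPO, a hedged sketch of the objective as presented in the CPO paper follows; the notation is ours, not the README's, so check the paper for the exact form. Here x is the source sentence, y_w and y_l the preferred and dispreferred translations, β a preference-strength hyperparameter, and σ the logistic sigmoid:

  \mathcal{L}_{\mathrm{CPO}} = \mathcal{L}_{\mathrm{prefer}} + \mathcal{L}_{\mathrm{NLL}}
  \mathcal{L}_{\mathrm{prefer}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\big[\log \sigma\big(\beta \log \pi_\theta(y_w \mid x) - \beta \log \pi_\theta(y_l \mid x)\big)\big]
  \mathcal{L}_{\mathrm{NLL}} = -\,\mathbb{E}_{(x,\,y_w)}\big[\log \pi_\theta(y_w \mid x)\big]

Informally, this is a DPO-style preference loss without the reference model, regularized by a standard negative log-likelihood term on the preferred translation.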

Quick Start & Requirements

  • Installation: primarily via the Hugging Face transformers library; the README provides example usage for X-ALMA with transformers and peft (a minimal sketch follows this list).
  • Prerequisites: Python 3.11+ and PyTorch. Both AMD and NVIDIA GPUs are supported; CUDA 12 is recommended for NVIDIA but not strictly required.
  • Resources: significant GPU memory is needed, especially for the third X-ALMA loading method described in the README. Training requires substantial data and compute.
  • Links: https://github.com/fe1ixxu/ALMA
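The following is a minimal sketch of the transformers-based usage pattern the README describes. The model ID, prompt wording, and generation settings are illustrative assumptions, not copied from the repository, so verify them against the README's X-ALMA example.

  # Hedged sketch: model ID and prompt format are assumptions, not the README's exact example.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "haoranxu/X-ALMA-13B-Group1"  # hypothetical group checkpoint name
  model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
  tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")

  # ALMA-style translation prompt
  prompt = "Translate this from English to German:\nEnglish: The weather is lovely today.\nGerman:"
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  with torch.no_grad():
      out = model.generate(**inputs, max_new_tokens=128, num_beams=5, do_sample=False)
  print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

  # For LoRA-based checkpoints, the adapter is attached with peft instead, roughly:
  # from peft import PeftModel
  # model = PeftModel.from_pretrained(model, "haoranxu/ALMA-13B-Pretrain-LoRA")  # hypothetical adapter ID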

Highlighted Details

  • X-ALMA supports 50 languages and 98 directions.
  • ALMA-R matches or exceeds GPT-4 and WMT winners.
  • Supports various base models including LLaMA, OPT, Falcon, BLOOM, and MPT.
  • Offers LoRA fine-tuning options (a configuration sketch follows this list).
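As a rough illustration of what a LoRA fine-tuning setup looks like with peft, here is a minimal sketch; the base model, rank, and target modules are assumptions, not the repository's actual hyperparameters.

  # Hedged sketch of a LoRA setup with peft; values are illustrative, not ALMA's recipe.
  from transformers import AutoModelForCausalLM
  from peft import LoraConfig, get_peft_model

  base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative base model
  lora_cfg = LoraConfig(
      r=16, lora_alpha=32, lora_dropout=0.05,
      target_modules=["q_proj", "v_proj"],
      task_type="CAUSAL_LM",
  )
  model = get_peft_model(base, lora_cfg)
  model.print_trainable_parameters()  # only the adapter weights are trainable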

Maintenance & Community

Recent activity includes the release of X-ALMA and its acceptance at ICLR 2025. CPO has been merged into Hugging Face's trl library (a brief usage sketch follows). The README does not link to a community channel such as Discord or Slack.
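Since CPO now ships with trl, a hedged sketch of how one might invoke it is shown below. The base model, dataset file, and hyperparameters are assumptions and this is not the ALMA repository's own training script; trl's CPOTrainer expects a preference dataset with prompt, chosen, and rejected fields.

  # Hedged sketch of CPO training via Hugging Face trl; values are illustrative.
  from datasets import load_dataset
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from trl import CPOConfig, CPOTrainer

  model_name = "meta-llama/Llama-2-7b-hf"  # illustrative; not necessarily an ALMA checkpoint
  model = AutoModelForCausalLM.from_pretrained(model_name)
  tokenizer = AutoTokenizer.from_pretrained(model_name)

  # Hypothetical preference file with "prompt", "chosen", "rejected" fields per example.
  dataset = load_dataset("json", data_files="triplet_preferences.json", split="train")

  args = CPOConfig(output_dir="cpo-out", beta=0.1, per_device_train_batch_size=2)
  # Older trl releases pass the tokenizer via tokenizer=... instead of processing_class=...
  trainer = CPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
  trainer.train()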

Licensing & Compatibility

The README does not explicitly state a license, and commercial-use terms are not detailed. The model weights are distributed through the Hugging Face Hub, so check the repository and the individual model cards for license information.

Limitations & Caveats

The README does not detail specific limitations or known bugs. Training X-ALMA is described as complex due to the need for numerous intermediate checkpoints. The third loading method for X-ALMA requires substantial GPU memory.

Health Check
  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 30 stars in the last 90 days
