ALMA by fe1ixxu

LLM translator for many-to-many language translation

Created 2 years ago
553 stars

Top 57.9% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

ALMA is a suite of LLM-based translation models spanning three generations (ALMA, ALMA-R, X-ALMA) that progressively improve translation quality and language coverage. It targets researchers and practitioners seeking state-of-the-art machine translation, with performance that matches or exceeds strong baselines, including GPT-4 and WMT competition winners.

How It Works

ALMA employs a two-step fine-tuning process: initial fine-tuning on monolingual data followed by optimization on high-quality parallel data. ALMA-R further refines models using Contrastive Preference Optimization (CPO) with triplet preference data. X-ALMA extends this to 50 languages via a plug-and-play language-specific module architecture and a 5-step training recipe incorporating Adaptive-Rejection Preference Optimization. This modular approach and advanced optimization techniques enable broad language support and high performance.
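
Because CPO has been merged into Hugging Face's trl library (see Maintenance & Community below), the preference-optimization stage can be approximated with off-the-shelf tooling. The following is a minimal sketch, not the exact ALMA-R recipe; the checkpoint name, dataset rows, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of a CPO training stage using Hugging Face trl (into which CPO
# has been merged). The checkpoint name, dataset contents, and hyperparameters
# below are illustrative assumptions, not the exact ALMA-R recipe.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_name = "haoranxu/ALMA-13B"  # assumed ALMA checkpoint to refine
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data: a source prompt, a preferred translation, and a dispreferred one.
train_dataset = Dataset.from_dict({
    "prompt": ["Translate this from German to English:\nGerman: Guten Morgen.\nEnglish:"],
    "chosen": [" Good morning."],
    "rejected": [" Good day to you."],
})

args = CPOConfig(output_dir="alma-r-cpo", per_device_train_batch_size=1, beta=0.1)
trainer = CPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older trl releases take tokenizer= instead
)
trainer.train()
```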

Quick Start & Requirements

  • Installation: Primarily through the Hugging Face transformers library. The README provides an X-ALMA usage example using transformers and peft (see the inference sketch after this list).
  • Prerequisites: Python 3.11+ and PyTorch. AMD and Nvidia GPUs are supported; CUDA 12 is recommended for Nvidia but not strictly required.
  • Resources: Inference requires significant GPU memory, especially for the third X-ALMA loading method described in the README. Training requires substantial data and compute.
  • Links: GitHub repository at https://github.com/fe1ixxu/ALMA; model checkpoints are hosted on Hugging Face.
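
As a concrete starting point, here is a minimal inference sketch assuming an X-ALMA checkpoint published on Hugging Face; the model ID, group assignment, and prompt template are assumptions and should be verified against the README.

```python
# Minimal inference sketch for an X-ALMA group model via transformers. The model
# ID, group assignment, and prompt template are assumptions based on the
# project's Hugging Face releases; check the README for exact identifiers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "haoranxu/X-ALMA-13B-Group2"  # assumed group containing the target language
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")

# ALMA-style translation prompt (assumed format).
prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100, num_beams=5, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```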

Highlighted Details

  • X-ALMA supports 50 languages and 98 directions.
  • ALMA-R matches or exceeds GPT-4 and WMT winners.
  • Supports various base models including LLaMA, OPT, Falcon, BLOOM, and MPT.
  • Offers LoRA fine-tuning options (see the LoRA loading sketch below).
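
For the LoRA option, the repository publishes base checkpoints with separate LoRA adapters that can be attached via peft. Below is a minimal sketch; the model and adapter IDs are assumptions based on the project's Hugging Face releases.

```python
# Minimal sketch of the LoRA option: attach a released LoRA adapter to its base
# checkpoint with peft. The model and adapter IDs are assumptions based on the
# project's Hugging Face releases.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-7B-Pretrain", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "haoranxu/ALMA-7B-Pretrain-LoRA")
tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-7B-Pretrain", padding_side="left")
# The combined model is then used exactly like the merged checkpoints above.
```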

Maintenance & Community

The project saw activity around X-ALMA's release and its acceptance at ICLR 2025, and CPO has been merged into Hugging Face's trl library. No community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. The models are hosted on Hugging Face and load with standard transformers/peft tooling, but commercial-use terms are not detailed.

Limitations & Caveats

The README does not document specific limitations or known bugs. Training X-ALMA is complex, requiring numerous intermediate checkpoints, and the third X-ALMA loading method needs substantial GPU memory.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 3 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.

dbrx by databricks

0% · 3k stars
Large language model for research/commercial use
Created 1 year ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Junyang Lin (Core Maintainer at Alibaba Qwen), and 3 more.

Alpaca-CoT by PhoebusSi

0.1% · 3k stars
IFT platform for instruction collection, parameter-efficient methods, and LLMs
Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Simon Willison (Coauthor of Django), and 10 more.

Yi by 01-ai

0% · 8k stars
Open-source bilingual LLMs trained from scratch
Created 1 year ago · Updated 9 months ago