ALMA by fe1ixxu

LLM translator for many-to-many language translation

Created 2 years ago
553 stars

Top 57.9% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

ALMA is a suite of LLM-based translation models spanning three generations (ALMA, ALMA-R, X-ALMA) that progressively improve translation quality and language coverage. It targets researchers and practitioners seeking state-of-the-art machine translation, with performance that matches or exceeds strong baselines, including GPT-4 and WMT competition winners.

How It Works

ALMA employs a two-step fine-tuning process: initial fine-tuning on monolingual data followed by optimization on high-quality parallel data. ALMA-R further refines models using Contrastive Preference Optimization (CPO) with triplet preference data. X-ALMA extends this to 50 languages via a plug-and-play language-specific module architecture and a 5-step training recipe incorporating Adaptive-Rejection Preference Optimization. This modular approach and advanced optimization techniques enable broad language support and high performance.
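
Because CPO has been merged into Hugging Face's trl library (see Maintenance & Community below), the preference-optimization stage can be approximated with off-the-shelf tooling. The following is a minimal sketch, not the exact ALMA-R recipe; the checkpoint name, dataset rows, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of a CPO training stage using Hugging Face trl (into which CPO
# has been merged). The checkpoint name, dataset contents, and hyperparameters
# below are illustrative assumptions, not the exact ALMA-R recipe.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_name = "haoranxu/ALMA-13B"  # assumed ALMA checkpoint to refine
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data: a source prompt, a preferred translation, and a dispreferred one.
train_dataset = Dataset.from_dict({
    "prompt": ["Translate this from German to English:\nGerman: Guten Morgen.\nEnglish:"],
    "chosen": [" Good morning."],
    "rejected": [" Good day to you."],
})

args = CPOConfig(output_dir="alma-r-cpo", per_device_train_batch_size=1, beta=0.1)
trainer = CPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older trl releases take tokenizer= instead
)
trainer.train()
```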

Quick Start & Requirements

  • Installation: Primarily through the Hugging Face transformers library. The README provides an X-ALMA usage example using transformers and peft (see the inference sketch after this list).
  • Prerequisites: Python 3.11+ and PyTorch. AMD and Nvidia GPUs are supported; CUDA 12 is recommended for Nvidia but not strictly required.
  • Resources: Inference requires significant GPU memory, especially for the third X-ALMA loading method described in the README. Training requires substantial data and compute.
  • Links: GitHub repository at https://github.com/fe1ixxu/ALMA; model checkpoints are hosted on Hugging Face.
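
As a concrete starting point, here is a minimal inference sketch assuming an X-ALMA checkpoint published on Hugging Face; the model ID, group assignment, and prompt template are assumptions and should be verified against the README.

```python
# Minimal inference sketch for an X-ALMA group model via transformers. The model
# ID, group assignment, and prompt template are assumptions based on the
# project's Hugging Face releases; check the README for exact identifiers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "haoranxu/X-ALMA-13B-Group2"  # assumed group containing the target language
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")

# ALMA-style translation prompt (assumed format).
prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100, num_beams=5, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```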

Highlighted Details

  • X-ALMA supports 50 languages and 98 directions.
  • ALMA-R matches or exceeds GPT-4 and WMT winners.
  • Supports various base models including LLaMA, OPT, Falcon, BLOOM, and MPT.
  • Offers LoRA fine-tuning options (see the LoRA loading sketch below).
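
For the LoRA option, the repository publishes base checkpoints with separate LoRA adapters that can be attached via peft. Below is a minimal sketch; the model and adapter IDs are assumptions based on the project's Hugging Face releases.

```python
# Minimal sketch of the LoRA option: attach a released LoRA adapter to its base
# checkpoint with peft. The model and adapter IDs are assumptions based on the
# project's Hugging Face releases.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-7B-Pretrain", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "haoranxu/ALMA-7B-Pretrain-LoRA")
tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-7B-Pretrain", padding_side="left")
# The combined model is then used exactly like the merged checkpoints above.
```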

Maintenance & Community

The project saw activity around X-ALMA's release and its acceptance at ICLR 2025, and CPO has been merged into Hugging Face's trl library. No community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. The models are hosted on Hugging Face and load with standard transformers/peft tooling, but commercial-use terms are not detailed.

Limitations & Caveats

The README does not document specific limitations or known bugs. Training X-ALMA is complex, requiring numerous intermediate checkpoints, and the third X-ALMA loading method needs substantial GPU memory.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 3 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.

dbrx by databricks

0% · 3k stars
Large language model for research/commercial use
Created 1 year ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Junyang Lin (Core Maintainer at Alibaba Qwen), and 3 more.

Alpaca-CoT by PhoebusSi

0.1% · 3k stars
IFT platform for instruction collection, parameter-efficient methods, and LLMs
Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Simon Willison (Coauthor of Django), and 10 more.

Yi by 01-ai

0% · 8k stars
Open-source bilingual LLMs trained from scratch
Created 1 year ago · Updated 9 months ago