Multimodal MoE model for video, document understanding, and dialog
Aria is an open-source multimodal native Mixture-of-Experts (MoE) model designed for advanced language and vision tasks, particularly excelling in video and document understanding. It targets researchers and developers seeking state-of-the-art performance with a long context window and efficient inference.
How It Works
Aria employs a Mixture-of-Experts (MoE) architecture that activates 3.9B parameters per token, keeping inference fast and fine-tuning cost-effective. It supports a 64K-token multimodal context window, enabling comprehensive understanding of long video and document inputs. The model is integrated with Hugging Face Transformers for ease of use and is also compatible with vLLM for higher-throughput inference.
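As an illustration of the Transformers integration, here is a minimal inference sketch. The Hub id rhymes-ai/Aria, the trust_remote_code flag, and the chat-template call are assumptions; consult the official model card for the exact supported snippet.

```python
# Hedged sketch of inference with Hugging Face Transformers.
# Hub id, trust_remote_code usage, and chat-template call are assumptions.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "rhymes-ai/Aria"  # assumed Hub id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# One image plus a text question, formatted through the processor's chat template.
image = Image.open("report_page.png")  # placeholder input
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Summarize this page."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```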
Quick Start & Requirements
Install from source with pip install -e . (or pip install -e .[dev] for development). Additional dependencies: pip install grouped_gemm and pip install flash-attn --no-build-isolation.
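Since vLLM compatibility is noted above, the following is a hedged sketch of batched text generation through vLLM. The model id, dtype, and trust_remote_code flag are assumptions; multimodal inputs would follow vLLM's own multi-modal input format.

```python
# Hedged sketch of serving Aria with vLLM; model id and flags are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="rhymes-ai/Aria",   # assumed Hub id
    dtype="bfloat16",
    trust_remote_code=True,   # Aria ships custom modeling code
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Explain Mixture-of-Experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```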
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Fine-tuning requires a specific version of the transformers library (v4.45.0) and a particular model revision ("4844f0b5ff678e768236889df5accbe4967ec845") due to weight mapping changes. Memory requirements for fine-tuning vary significantly with dataset type.
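To make the pinning concrete, here is a hedged sketch of loading the specific revision before fine-tuning. The Hub id and trust_remote_code flag are assumptions; the library version and revision hash are the ones quoted above.

```python
# Hedged sketch: pin the transformers version and model revision noted above.
#   pip install transformers==4.45.0
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "rhymes-ai/Aria",  # assumed Hub id
    revision="4844f0b5ff678e768236889df5accbe4967ec845",  # revision cited in the caveat
    trust_remote_code=True,
)
```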