Instella by AMD-AGI

High-performance open language models

Created 1 year ago

318 stars

Top 85.4% on SourcePulse

1 Expert Loves This Project

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

Summary

Instella is a family of fully open language models developed by the AMD GenAI team. The models aim to outperform existing fully open models of similar size and to compete with leading open-weight alternatives. AMD releases the model weights, training code, and data to foster open-source AI development. The models are particularly suited to researchers and engineers working on AMD hardware.

How It Works

Instella models are trained on AMD Instinct™ MI300X GPUs in multiple stages: pre-training on large datasets (OLMoE-mix-0924, dolmino-mix-1124, etc.), supervised fine-tuning (SFT), and Direct Preference Optimization (DPO). The architecture builds upon the OLMo framework and incorporates optimizations such as Flash-Attention for efficient training on AMD hardware. The resulting models perform competitively on established benchmarks.
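To make the DPO stage more concrete, the following is a minimal sketch of the standard per-example DPO objective in plain Python. It is not taken from the Instella training code: the function names, the argument names, and the default beta value of 0.1 are assumptions chosen for illustration, and a real training run would compute this over batches of token log-probabilities in PyTorch.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed illustrative value, not Instella's setting
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy favors the chosen response more strongly.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Policy shifted toward the chosen response: the loss is lower.
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

When the policy and the reference assign the same log-probabilities, the loss is log(2); it falls below that value as the policy moves probability mass toward the preferred response.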

Quick Start & Requirements

Installation starts with setting up PyTorch with ROCm support (a rocm/pytorch Docker image is recommended for AMD GPUs). From the cloned repository, install Flash-Attention for MI300X GPUs, then install the remaining dependencies with pip install -e .[all]. Key prerequisites are Python and AMD GPUs (specifically MI300X for training). The README links to Hugging Face model cards (amd/Instella-3B-Instruct, etc.) and AMD GPU optimization blogs.
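Before installing the remaining dependencies, it can help to confirm that the active PyTorch build actually targets ROCm. The check below is a small sketch and is not part of the Instella repository; the function name describe_torch_backend is hypothetical. It relies on torch.version.hip, which is set on ROCm builds of PyTorch and is None otherwise, and on the fact that ROCm builds expose AMD GPUs through the torch.cuda API.

```python
import importlib.util


def describe_torch_backend() -> str:
    """Report whether PyTorch is installed and whether it is a ROCm build."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    # torch.version.hip is a version string on ROCm builds, None otherwise.
    hip_version = getattr(torch.version, "hip", None)
    if hip_version is None:
        return f"PyTorch {torch.__version__} built without ROCm (HIP) support"

    # On ROCm builds, AMD GPUs are reported through the torch.cuda namespace.
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__} with ROCm {hip_version}, GPU: {name}"
    return f"PyTorch {torch.__version__} with ROCm {hip_version}, no GPU visible"


if __name__ == "__main__":
    print(describe_torch_backend())
```

Running the script inside the container prints a one-line description of the backend, which makes a missing ROCm build or an invisible GPU easy to spot before a longer installation step fails.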

Highlighted Details

Billed as "Fully Open Language Models with Stellar Performance."
Outperforms similar-sized fully open models and rivals Llama-3.2-3B and Qwen2.5-3B.
Training code and datasets are fully released.
Specialized versions for vision-language (Instella-VL) and long context (Instella-Long) are available.
Optimized for AMD Instinct™ MI300X GPUs using Flash-Attention.

Maintenance & Community

Developed by the AMD GenAI team. The codebase is built upon the OLMo project. No specific community channels (Discord, Slack) or roadmap links are detailed in the provided README.

Licensing & Compatibility

Instella models and the associated GSM8K synthetic dataset are released under a ResearchRAIL license. This license restricts usage strictly to academic and research purposes, rendering the models incompatible with commercial applications or closed-source integration.

Limitations & Caveats

The ResearchRAIL license imposes significant restrictions, limiting adoption to non-commercial, research-oriented use cases. The project's development and optimization focus heavily on AMD Instinct™ MI300X GPUs and the ROCm ecosystem, suggesting potential compatibility and performance challenges on other hardware architectures like NVIDIA or Intel GPUs.

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

1 star in the last 30 days