AMD-AGI: High-performance open language models
Summary
Instella provides a family of fully open, high-performance language models developed by the AMD GenAI team. These models aim to outperform existing open models of similar size and compete with leading open-weight alternatives, offering model weights, training code, and data to foster open-source AI development. They are particularly suited for researchers and engineers leveraging AMD hardware.
How It Works
Instella models are trained on AMD Instinct™ MI300X GPUs in several stages: pre-training on large datasets (OLMoE-mix-0924, dolmino-mix-1124, etc.), supervised fine-tuning (SFT), and Direct Preference Optimization (DPO). The architecture builds on the OLMo framework and adds optimizations such as Flash-Attention for efficient training on AMD hardware. The resulting models perform competitively on established benchmarks.
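To make the DPO stage more concrete, here is a minimal plain-Python sketch of the standard DPO objective for (chosen, rejected) response pairs. It assumes the summed token log-probabilities of each response have already been computed under both the trainable policy and a frozen reference model. The function names and the `beta=0.1` default are illustrative assumptions, not APIs or settings taken from the Instella training code.

```python
import math
from typing import Iterable, Tuple


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; not taken from Instella's configs
) -> float:
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed token log-probability of a full response
    under the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Loss = -log sigmoid(margin): small when the policy prefers the chosen
    # response more strongly (relative to the reference) than the rejected one.
    return -log_sigmoid(margin)


def mean_dpo_loss(
    pairs: Iterable[Tuple[float, float, float, float]], beta: float = 0.1
) -> float:
    """Average the per-pair DPO loss over a batch of preference pairs."""
    losses = [dpo_loss(*p, beta=beta) for p in pairs]
    if not losses:
        raise ValueError("empty batch")
    return sum(losses) / len(losses)


# Usage: an untrained policy identical to the reference gives log(2) ≈ 0.6931.
print(mean_dpo_loss([(-10.0, -12.0, -10.0, -12.0)]))
```

When the policy matches the reference model the margin is zero and the loss is log 2. Raising the chosen response's log-probability relative to the reference pushes the loss below that value, which is the gradient signal DPO uses in place of a separate reward model.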
Quick Start & Requirements
Installation involves setting up PyTorch with ROCm support (a rocm/pytorch Docker image is recommended for AMD GPUs). From the cloned repository, install Flash-Attention for MI300X GPUs and then other dependencies via pip install -e .[all]. Key prerequisites include AMD GPUs (specifically MI300X for training) and Python. Links to Hugging Face model cards (amd/Instella-3B-Instruct, etc.) and AMD GPU optimization blogs are provided.
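Before running the installation steps, it can help to confirm that the installed PyTorch is actually a ROCm build. The sketch below is a hypothetical helper (`describe_torch_backend` is not part of the Instella repository). It relies on two documented PyTorch behaviors: ROCm builds set `torch.version.hip` to a version string, and ROCm GPUs are exposed through the `torch.cuda` namespace.

```python
from typing import Any, Dict


def describe_torch_backend() -> Dict[str, Any]:
    """Hypothetical helper: report which PyTorch build is installed and
    whether a GPU is visible. Returns a dict with fixed keys."""
    info: Dict[str, Any] = {
        "torch_installed": False,
        "build": None,        # "rocm", "cuda", or "cpu" once torch is found
        "gpu_visible": False,
        "device_name": None,
    }
    try:
        import torch
    except ImportError:
        return info
    info["torch_installed"] = True
    # ROCm builds populate torch.version.hip; CUDA builds populate
    # torch.version.cuda; CPU-only builds leave both as None.
    if getattr(torch.version, "hip", None):
        info["build"] = "rocm"
    elif getattr(torch.version, "cuda", None):
        info["build"] = "cuda"
    else:
        info["build"] = "cpu"
    # On ROCm, AMD GPUs are reported through the torch.cuda API.
    if torch.cuda.is_available():
        info["gpu_visible"] = True
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(describe_torch_backend())
```

On a correctly configured MI300X host one would expect `build` to be `"rocm"` and `gpu_visible` to be `True`. A `"cuda"` or `"cpu"` build suggests the rocm/pytorch image (or a ROCm wheel) was not used, and `gpu_visible` being `False` inside a ROCm container usually points to missing device passthrough.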
Highlighted Details
Maintenance & Community
Developed by the AMD GenAI team. The codebase is built upon the OLMo project. No specific community channels (Discord, Slack) or roadmap links are detailed in the provided README.
Licensing & Compatibility
Instella models and the associated GSM8K synthetic dataset are released under a ResearchRAIL license. This license restricts usage strictly to academic and research purposes, rendering the models incompatible with commercial applications or closed-source integration.
Limitations & Caveats
The ResearchRAIL license imposes significant restrictions, limiting adoption to non-commercial, research-oriented use cases. The project's development and optimization focus heavily on AMD Instinct™ MI300X GPUs and the ROCm ecosystem, suggesting potential compatibility and performance challenges on other hardware architectures like NVIDIA or Intel GPUs.
graphcore
NervanaSystems
openvinotoolkit