hydra-moe by SkunkworksAI

MoE framework for scaling language models, aiming for GPT-4 level performance

created 1 year ago
416 stars

Top 71.5% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

Hydra-MoE introduces a novel Mixture of Experts (MoE) architecture designed to enhance open-source large language models, aiming to achieve performance comparable to state-of-the-art models like GPT-4. It targets researchers and developers seeking to scale LLM capabilities efficiently on consumer hardware by leveraging swappable QLoRA experts.

How It Works

Hydra-MoE transforms a base language model into a Mixture-of-Experts system built from swappable QLoRA expert adapters. The Hydra-α architecture uses k-means clustering for domain discovery, fine-tunes one expert per cluster, and dynamically swaps experts at inference via similarity- or classifier-based routing. Hydra-β extends this with improved gating, merging techniques (such as TIES merging), and end-to-end training of the gating/routing functions. The approach increases model capability at roughly constant inference FLOPs, trading memory for performance.
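
A minimal sketch of the similarity-based routing idea described above, assuming an off-the-shelf sentence encoder and scikit-learn k-means; the expert names, sample corpus, and route helper are illustrative assumptions, not Hydra-MoE's actual code.

```python
# Sketch of Hydra-alpha-style domain discovery plus similarity routing.
# Illustrative only: embedding model, cluster count, and expert names are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

# 1) Domain discovery: cluster instruction data into k domains (math, code, writing, ...).
corpus = [
    "Prove that sqrt(2) is irrational.",
    "Write a Python function that reverses a linked list.",
    "Summarize the causes of the French Revolution.",
]
corpus_emb = embedder.encode(corpus, normalize_embeddings=True)
k = 3
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(corpus_emb)

# 2) One QLoRA expert is fine-tuned per cluster (training not shown).
#    Hypothetical cluster-id -> expert-adapter mapping; real labels are arbitrary.
experts = {0: "math-expert", 1: "code-expert", 2: "writing-expert"}

def route(prompt: str) -> str:
    """Pick the expert whose cluster centroid is most similar to the prompt."""
    emb = embedder.encode([prompt], normalize_embeddings=True)
    sims = emb @ kmeans.cluster_centers_.T   # dot-product similarity against centroids
    return experts[int(np.argmax(sims))]

print(route("Integrate x^2 * exp(x) dx"))   # -> likely the math expert
```

Hydra-β, as summarized above, replaces this hand-rolled similarity step with a trained gating/routing function and merges experts (e.g. via TIES merging), but the routing contract is the same: prompt in, expert choice out.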

Quick Start & Requirements

  • Install/Run: sh setup_moe.sh for setup, python main.py --inference for inference. Docker commands are also provided.
  • Prerequisites: Requires a Hugging Face API token. Hardware requirements for large-scale training are substantial; the team is seeking 64x H100s/A100s.
  • Links: 🤗 HF Repo, 🐦 Twitter, ⚡ GitHub, 👋 Discord.

Highlighted Details

  • Aims to achieve GPT-4 level performance by scaling Llama-2 or other base models.
  • Utilizes QLoRA experts for efficient adaptation and swapping (see the adapter-swap sketch after this list).
  • Developed multiple MoE architectures (Hydra-α, Hydra-β) with promising early results.
  • Focuses on domains including Math, Science, Reasoning, Coding, and Writing.
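
The "swappable QLoRA experts" highlighted above amount to keeping one quantized base model resident and switching LoRA adapters per request. Below is a minimal sketch using transformers and peft, assuming a Llama-2 base and hypothetical adapter repositories ("skunkworks/math-expert", "skunkworks/code-expert"); it illustrates the mechanism, not the project's actual loading code.

```python
# Sketch of hot-swapping QLoRA expert adapters over one 4-bit base model.
# Adapter repository names below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-2-13b-hf"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb, device_map="auto"
)

# Attach the first expert, then register the rest; only adapter weights differ,
# so each extra expert is cheap relative to the base model.
model = PeftModel.from_pretrained(base, "skunkworks/math-expert", adapter_name="math")
model.load_adapter("skunkworks/code-expert", adapter_name="code")

def generate_with_expert(prompt: str, expert: str) -> str:
    model.set_adapter(expert)  # swap the active QLoRA expert
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(generate_with_expert("Solve 3x + 5 = 17.", expert="math"))
```

In this scheme each additional expert costs only its adapter weights in memory while per-token compute stays roughly that of the base model, which is the memory-for-performance trade-off noted in How It Works.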

Maintenance & Community

The project is driven by the Skunkworks OSS community, comprising hundreds of contributors. They actively collaborate with academic and open-source groups. The project welcomes contributions via their Discord server.

Licensing & Compatibility

The README does not explicitly state a license. However, the project's emphasis on open-sourcing everything, including datasets and trained experts, suggests a permissive approach. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is in a Proof-of-Concept (PoC) stage, with ongoing training and evaluation. Significant compute resources are required for scaling experiments, and the team is actively seeking sponsors. Early results are still undergoing validation before publication.

Health Check
Last commit: 1 year ago
Responsiveness: 1 week
Pull Requests (30d): 0
Issues (30d): 0
Star History: 3 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

SWE-Gym by SWE-Gym

Top 1.0% on sourcepulse
513 stars
Environment for training software engineering agents
created 9 months ago
updated 4 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% on sourcepulse
25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago