MoE framework for scaling language models, aiming for GPT-4 level performance
Hydra-MoE introduces a novel Mixture of Experts (MoE) architecture designed to enhance open-source large language models, aiming to achieve performance comparable to state-of-the-art models like GPT-4. It targets researchers and developers seeking to scale LLM capabilities efficiently on consumer hardware by leveraging swappable QLoRA experts.
How It Works
Hydra-MoE turns a base language model into an MoE framework built from swappable QLoRA expert adapters. The Hydra-α architecture uses k-means clustering for domain discovery, fine-tunes an expert on each discovered cluster, and dynamically swaps experts in at inference time via similarity- or classifier-based routing. Hydra-β extends this with improved gating, expert-merging techniques such as TIES merging, and end-to-end training of the gating/routing functions. The approach increases model capability while keeping inference FLOPs constant, trading memory for performance.
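The routing step can be pictured with a small sketch. The snippet below is illustrative only and not code from the repository: it assumes per-domain centroids obtained from k-means over an embedding space, routes each query to the nearest centroid by cosine similarity, and uses a hypothetical load_qlora_adapter function to stand in for the actual adapter swap (which in practice would go through an adapter library such as PEFT).

```python
# Minimal sketch of similarity-based expert routing (not the project's actual code).
import numpy as np

# Hypothetical centroids produced by k-means over embedded training data,
# one per domain expert (e.g. code, math, reasoning).
EXPERT_CENTROIDS = {
    "code":      np.random.rand(384),
    "math":      np.random.rand(384),
    "reasoning": np.random.rand(384),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query_embedding: np.ndarray) -> str:
    """Pick the expert whose k-means centroid is closest to the query."""
    return max(
        EXPERT_CENTROIDS,
        key=lambda name: cosine_similarity(query_embedding, EXPERT_CENTROIDS[name]),
    )

def load_qlora_adapter(expert_name: str) -> None:
    """Placeholder: in practice this would attach the chosen expert's QLoRA
    adapter to the frozen base model (e.g. via an adapter-swapping API)."""
    print(f"Swapping in QLoRA expert: {expert_name}")

if __name__ == "__main__":
    # Embed the incoming prompt with the same encoder used for clustering
    # (stubbed here with a random vector for illustration).
    query_embedding = np.random.rand(384)
    expert = route(query_embedding)
    load_qlora_adapter(expert)
```

A classifier-based router would replace the nearest-centroid lookup with a small model trained to predict the expert label directly from the query embedding.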
Quick Start & Requirements
Run sh setup_moe.sh for setup, then python main.py --inference for inference. Docker commands are also provided.
Highlighted Details
Maintenance & Community
The project is driven by the Skunkworks OSS community, comprising hundreds of contributors. They actively collaborate with academic and open-source groups. The project welcomes contributions via their Discord server.
Licensing & Compatibility
The README does not explicitly state a license. However, the project's emphasis on open-sourcing everything, including datasets and trained experts, suggests a permissive approach. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is in a Proof-of-Concept (PoC) stage, with ongoing training and evaluation. Significant compute resources are required for scaling experiments, and the team is actively seeking sponsors. Early results are still undergoing validation before publication.