Eurus is a suite of open-source Large Language Models (LLMs) and a reward model optimized for complex reasoning tasks. It targets researchers and developers seeking high-performance models for coding, math, and logical problem-solving, offering significant improvements over existing open-source alternatives and even surpassing GPT-3.5 Turbo in certain reasoning benchmarks.
How It Works
Eurus models are fine-tuned using the UltraInteract dataset, a novel alignment dataset designed for complex reasoning. UltraInteract structures data as preference trees, capturing step-by-step reasoning chains, multi-turn interactions with critiques, and pairwise preference data. This approach allows for both Supervised Fine-Tuning (SFT) on correct reasoning paths and Preference Learning (PL) on comparative data, leading to enhanced reasoning capabilities and instruction following.
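The split between SFT and preference data can be pictured as flattening each preference-tree node into two kinds of training examples. A minimal sketch (the field names are hypothetical illustrations, not UltraInteract's actual schema):

```python
# Sketch: flatten one preference-tree node into an SFT example and a
# preference pair. Field names are hypothetical, not UltraInteract's schema.

def flatten_node(instruction, correct_action, incorrect_action):
    """Return one SFT example (correct path only) and one preference pair."""
    sft_example = {"prompt": instruction, "completion": correct_action}
    preference_pair = {
        "prompt": instruction,
        "chosen": correct_action,      # preferred response for preference learning
        "rejected": incorrect_action,  # incorrect sibling from the same node
    }
    return sft_example, preference_pair

sft, pair = flatten_node(
    "Compute 2 + 2 step by step.",
    "2 + 2 = 4, so the answer is 4.",
    "2 + 2 = 5, so the answer is 5.",
)
print(sft["completion"])  # only the correct reasoning path is used for SFT
print(pair["rejected"])   # the incorrect sibling becomes the rejected response
```

This mirrors the `prompt`/`chosen`/`rejected` layout that common preference-learning toolkits expect, which is one reason pairwise tree data plugs into both SFT and PL pipelines.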
Quick Start & Requirements
- Installation: Models are available via Hugging Face Transformers.
- Dependencies: Requires PyTorch and the Hugging Face Transformers library. The larger variants (e.g., Eurus-70B, Eurux-8x22B) have substantial VRAM requirements.
- Resources: Access to substantial GPU resources is recommended for running larger models.
- Links: Eurus Collection, UltraInteract Dataset
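When prompting the chat variants, Eurus models follow their base models' chat templates. A minimal sketch, assuming the Mistral-style `[INST]` wrapper; verify the exact template on the model card of the variant you use:

```python
# Hedged sketch: wrap a user instruction in the [INST] chat template used by
# Mistral-derived variants. Check the model card for your specific Eurus model.

def format_prompt(instruction: str) -> str:
    return f"[INST] {instruction} [/INST]"

prompt = format_prompt("Solve x^2 - 5x + 6 = 0.")
print(prompt)
```

In practice, prefer the tokenizer's built-in chat template (e.g., `tokenizer.apply_chat_template`) when one is shipped with the checkpoint, since it encodes the exact special tokens the model was trained with.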
Highlighted Details
- Eurus-70B achieves 33.3% pass@1 on LeetCode and 32.6% on TheoremQA, outperforming existing open-source models by margins of over 13.3% on these benchmarks.
- Eurux-8x22B variants demonstrate strong reasoning, chat, and instruction-following capabilities.
- Eurus-RM-7B shows strong preference modeling performance, outperforming GPT-4 on certain reasoning tasks.
- The UltraInteract dataset includes 86k instructions, 286k correct answers, and 219k preference pairs.
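Reward models such as Eurus-RM-7B are commonly trained with a Bradley-Terry-style pairwise objective over chosen/rejected pairs (the Eurus paper augments this with additional terms; the following is a generic sketch, not the project's exact loss):

```python
import math

def bradley_terry_loss(chosen_reward: float, rejected_reward: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): low when chosen outscores rejected."""
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

good = bradley_terry_loss(2.0, -1.0)  # chosen clearly preferred -> low loss
bad = bradley_terry_loss(-1.0, 2.0)   # ranking inverted -> high loss
print(good < bad)  # True
```

Minimizing this loss pushes the reward model to assign higher scores to preferred responses, which is what makes pairwise data like UltraInteract's preference pairs directly usable for reward modeling.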
Maintenance & Community
- The project is associated with OpenBMB and has contributions from multiple researchers.
- Updates are provided via news releases on the project page.
Licensing & Compatibility
- The models are released under a permissive license (reported as Apache 2.0), but verify the license for specific model weights, which may inherit terms from their base models.
- Compatible with standard LLM inference frameworks.
Limitations & Caveats
- While performance is strong, the specific benchmark results should be independently verified.
- Running the larger 70B and 8x22B models requires significant computational resources.