Eurus by OpenBMB

LLM suite for reasoning, instruction-following, and chat

Created 1 year ago · 320 stars · Top 86.0% on sourcepulse

Project Summary

Eurus is a suite of open-source Large Language Models (LLMs) and a reward model optimized for complex reasoning tasks. It targets researchers and developers who need high-performance models for coding, math, and logical problem solving, and it reports significant improvements over existing open-source alternatives, even surpassing GPT-3.5 Turbo on certain reasoning benchmarks.

How It Works

Eurus models are fine-tuned using the UltraInteract dataset, a novel alignment dataset designed for complex reasoning. UltraInteract structures data as preference trees, capturing step-by-step reasoning chains, multi-turn interactions with critiques, and pairwise preference data. This approach allows for both Supervised Fine-Tuning (SFT) on correct reasoning paths and Preference Learning (PL) on comparative data, leading to enhanced reasoning capabilities and instruction following.
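As a rough illustration of that data layout, the sketch below shows what a single preference-tree node might look like and how SFT and PL consume it differently. The field names are hypothetical, not the dataset's actual schema; consult the UltraInteract dataset card for the real format.

```python
# Hypothetical sketch of one UltraInteract-style preference record.
# Field names are illustrative; see the dataset card for the real schema.
record = {
    "instruction": "Prove that the sum of two even integers is even.",
    "trajectory": [  # multi-turn interaction so far, including critiques
        {"role": "assistant", "content": "Let a = 2m and b = 2n ..."},
        {"role": "critic", "content": "State why m + n is an integer."},
    ],
    "chosen": "Then a + b = 2(m + n), and m + n is an integer, so a + b is even.",
    "rejected": "Then a + b = 2m + 2n, which is probably even.",
}

# SFT trains only on correct reasoning paths (instruction -> chosen);
# preference learning additionally contrasts `chosen` against `rejected`
# at each node of the tree.
sft_example = (record["instruction"], record["chosen"])
pl_example = (record["instruction"], record["chosen"], record["rejected"])
```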

Quick Start & Requirements

  • Installation: Models are available via Hugging Face Transformers (see the loading sketch after this list).
  • Dependencies: Requires PyTorch and Hugging Face libraries. Specific model variants may have significant VRAM requirements (e.g., 70B models).
  • Resources: Access to substantial GPU resources is recommended for running larger models.
  • Links: Eurus Collection, UltraInteract Dataset
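A minimal inference sketch with Transformers follows. The model ID, dtype, and prompt template are assumptions, not confirmed specifics; verify the exact repository name and canonical prompt format on the Eurus Collection page before use.

```python
# Minimal inference sketch (assumes an openbmb/Eurus-7b-kto checkpoint;
# verify the exact model ID on the Hugging Face hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/Eurus-7b-kto"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce VRAM; a 7B model still wants a ~16 GB GPU
    device_map="auto",
)

# The 7B Eurus variants are Mistral-based; a [INST] ... [/INST] style prompt
# is assumed here -- check the model card for the canonical template.
prompt = "[INST] Solve: if 3x + 5 = 20, what is x? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```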

Highlighted Details

  • Eurus-70B achieves 33.3% pass@1 on LeetCode and 32.6% on TheoremQA, outperforming existing open-source models by margins of more than 13.3%.
  • Eurux-8x22B variants demonstrate strong reasoning, chat, and instruction-following capabilities.
  • Eurus-RM-7B shows strong preference-modeling performance, outperforming GPT-4 on certain reasoning tasks (see the scoring sketch after this list).
  • The UltraInteract dataset includes 86k instructions, 286k correct answers, and 219k preference pairs.
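For the reward model, a hedged scoring sketch is shown below. It assumes Eurus-RM-7B exposes a scalar reward through a custom forward pass loaded with trust_remote_code; the hub ID, prompt template, and scalar-output call are all assumptions to check against the model card.

```python
# Hypothetical reward-scoring sketch for Eurus-RM-7b; the scalar-output
# forward pass is an assumption -- check the model card for the real usage.
import torch
from transformers import AutoModel, AutoTokenizer

rm_id = "openbmb/Eurus-RM-7b"  # assumed hub ID; verify before use
tokenizer = AutoTokenizer.from_pretrained(rm_id)
model = AutoModel.from_pretrained(rm_id, trust_remote_code=True, torch_dtype=torch.bfloat16)

def score(prompt: str, response: str) -> float:
    """Return the scalar reward the model assigns to a (prompt, response) pair."""
    text = f"[INST] {prompt} [/INST] {response}"  # assumed template
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).item()  # assumed scalar output

# Preference modeling in practice: rank two candidate answers.
better = score("What is 2 + 2?", "2 + 2 = 4.")
worse = score("What is 2 + 2?", "It might be 5.")
print(better > worse)
```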

Maintenance & Community

  • The project is associated with OpenBMB and has contributions from multiple researchers.
  • Updates are provided via news releases on the project page.

Licensing & Compatibility

  • The models are released under a permissive license (likely Apache 2.0; verify the license attached to each specific model's weights).
  • Compatible with standard LLM inference frameworks.

Limitations & Caveats

  • While performance is strong, the specific benchmark results should be independently verified.
  • Running the larger 70B and 8x22B models requires significant computational resources.
Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley).

  • DeepSeek-Coder-V2 by deepseek-ai: open-source code language model comparable to GPT-4 Turbo (6k stars, top 0.4%; created 1 year ago, updated 10 months ago).