abel by GAIR-NLP

SOTA LLM for math problem solving

Created 2 years ago
333 stars

Top 82.3% on SourcePulse

Project Summary

Abel is an open-source Large Language Model (LLM) focused on achieving state-of-the-art performance in mathematical reasoning without relying on external tools, reward models, or RLHF. It targets researchers and developers working on AI for STEM education and complex problem-solving, offering significant improvements over existing models on benchmarks like GSM8K and MATH.

How It Works

Abel is trained with a Supervised Fine-Tuning (SFT) methodology the authors call "Parental Oversight." The approach centers on a data-processing philosophy: fine-tuning data is curated the way educational material is prepared for children, prioritizing quality, relevance, and explicit step-by-step reasoning so the model learns to understand rather than merely memorize. The authors present this SFT-centric approach as a significantly underestimated route to high performance on complex reasoning tasks.
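To make the idea concrete, here is a minimal, hypothetical sketch of what a step-by-step SFT training sample might look like under this philosophy. The field names (`prompt`, `target`) and the `format_sft_example` helper are illustrative assumptions, not Abel's actual data schema; the key point from the write-up is that each example pairs a problem with curated step-by-step reasoning rather than a bare answer.

```python
def format_sft_example(question: str, steps: list[str], answer: str) -> dict:
    """Hypothetical formatter: join curated reasoning steps into a single
    supervision target that ends with the final answer.

    Field names are assumptions for illustration, not Abel's real schema."""
    rationale = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    return {
        "prompt": f"Question: {question}\nAnswer:",
        "target": f"{rationale}\nFinal answer: {answer}",
    }

# A GSM8K-style word problem with explicit intermediate steps.
example = format_sft_example(
    "Natalia sold clips to 48 friends in April, and half as many in May. "
    "How many clips did she sell altogether?",
    [
        "April sales are 48 clips.",
        "May sales are 48 / 2 = 24 clips.",
        "Total sales are 48 + 24 = 72 clips.",
    ],
    "72",
)
print(example["target"])
```

Training on targets like this one, rather than on bare final answers, is what the "Parental Oversight" framing emphasizes: the supervision signal carries the reasoning, not just the result.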

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n abel python=3.10), activate it (conda activate abel), and install dependencies (pip install -r requirements.txt).
  • Evaluation: Run bash evaluation/eval.sh.
  • Prerequisites: Python 3.10, conda.
  • Resources: Evaluation may use vLLM, so results can vary slightly between runs.
  • Links: Model and Leaderboard, Evaluation
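The quick-start steps above can be collected into a single script (assumes conda is installed and the commands are run from the repository root; the commands themselves come from the project docs):

```shell
# Create and activate the project environment (Python 3.10, per the docs).
conda create -n abel python=3.10 -y
conda activate abel

# Install the project's Python dependencies.
pip install -r requirements.txt

# Run the benchmark evaluation script. It may use vLLM under the hood,
# so results can vary slightly between runs.
bash evaluation/eval.sh
```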

Highlighted Details

  • Abel-7B-002 achieves 80.44 on GSM8K and 29.46 on MATH, outperforming other 7B models.
  • The 70B model reaches 83.62 on GSM8K and 28.26 on MATH, surpassing many proprietary models without tools.
  • Demonstrates strong robustness against out-of-distribution samples on the GSM8k_robust dataset.
  • Achieves SOTA on the TAL-SCQ5K-EN dataset, outperforming MathGPT and GPT-4.

Maintenance & Community

  • Developed by the GAIR Lab at Shanghai Jiao Tong University and the Shanghai AI Lab.
  • Actively refining models with planned updates.
  • Issues list maintained for limitations and potential solutions.

Licensing & Compatibility

  • Abel-7B-002 is licensed under Apache License 2.0.
  • Abel-7B-001 and Abel-13B-001 are licensed under Llama 2.
  • Apache 2.0 permits commercial use and closed-source linking; the Llama 2 Community License carries additional restrictions (including an acceptable-use policy and a usage threshold for very large services).

Limitations & Caveats

The model generalizes only within specific mathematical domains; it is not suited to diverse problem types or to integration into multi-domain chatbots. Multilingual support is absent, and advanced techniques such as reward models and RLHF remain unexplored.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

  • RL recipe for reasoning ability in models
  • 4k stars (top 0.1%)
  • Created 7 months ago; updated 1 month ago