abel by GAIR-NLP

SOTA LLM for math problem solving

Created 2 years ago
333 stars

Top 82.3% on SourcePulse

Project Summary

Abel is an open-source Large Language Model (LLM) focused on achieving state-of-the-art performance in mathematical reasoning without relying on external tools, reward models, or RLHF. It targets researchers and developers working on AI for STEM education and complex problem-solving, offering significant improvements over existing models on benchmarks like GSM8K and MATH.

How It Works

Abel is trained with a Supervised Fine-Tuning (SFT) methodology the authors call "Parental Oversight." The approach centers on a data-processing philosophy: fine-tuning data is curated the way educational material is prepared for children, prioritizing quality, relevance, and explicit step-by-step reasoning so the model learns to understand rather than merely memorize. The authors present this SFT-centric approach as a significantly underestimated route to high performance on complex reasoning tasks.
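To make the idea concrete, here is a minimal, hypothetical sketch of what a step-by-step SFT training sample might look like under this philosophy. The field names (`prompt`, `target`) and the `format_sft_example` helper are illustrative assumptions, not Abel's actual data schema; the key point from the write-up is that each example pairs a problem with curated step-by-step reasoning rather than a bare answer.

```python
def format_sft_example(question: str, steps: list[str], answer: str) -> dict:
    """Hypothetical formatter: join curated reasoning steps into a single
    supervision target that ends with the final answer.

    Field names are assumptions for illustration, not Abel's real schema."""
    rationale = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    return {
        "prompt": f"Question: {question}\nAnswer:",
        "target": f"{rationale}\nFinal answer: {answer}",
    }

# A GSM8K-style word problem with explicit intermediate steps.
example = format_sft_example(
    "Natalia sold clips to 48 friends in April, and half as many in May. "
    "How many clips did she sell altogether?",
    [
        "April sales are 48 clips.",
        "May sales are 48 / 2 = 24 clips.",
        "Total sales are 48 + 24 = 72 clips.",
    ],
    "72",
)
print(example["target"])
```

Training on targets like this one, rather than on bare final answers, is what the "Parental Oversight" framing emphasizes: the supervision signal carries the reasoning, not just the result.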

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n abel python=3.10), activate it (conda activate abel), and install dependencies (pip install -r requirements.txt).
  • Evaluation: Run bash evaluation/eval.sh.
  • Prerequisites: Python 3.10, conda.
  • Resources: Evaluation may use vLLM, so results can vary slightly between runs.
  • Links: Model and Leaderboard, Evaluation
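The quick-start steps above can be collected into a single script (assumes conda is installed and the commands are run from the repository root; the commands themselves come from the project docs):

```shell
# Create and activate the project environment (Python 3.10, per the docs).
conda create -n abel python=3.10 -y
conda activate abel

# Install the project's Python dependencies.
pip install -r requirements.txt

# Run the benchmark evaluation script. It may use vLLM under the hood,
# so results can vary slightly between runs.
bash evaluation/eval.sh
```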

Highlighted Details

  • Abel-7B-002 achieves 80.44 on GSM8K and 29.46 on MATH, outperforming other 7B models.
  • The 70B model reaches 83.62 on GSM8K and 28.26 on MATH, surpassing many proprietary models without tools.
  • Demonstrates strong robustness against out-of-distribution samples on the GSM8k_robust dataset.
  • Achieves SOTA on the TAL-SCQ5K-EN dataset, outperforming MathGPT and GPT-4.

Maintenance & Community

  • Developed by the GAIR Lab at Shanghai Jiao Tong University and the Shanghai AI Lab.
  • Actively refining models with planned updates.
  • Issues list maintained for limitations and potential solutions.

Licensing & Compatibility

  • Abel-7B-002 is licensed under Apache License 2.0.
  • Abel-7B-001 and Abel-13B-001 are licensed under Llama 2.
  • Apache 2.0 permits commercial use and closed-source linking; the Llama 2 Community License carries additional restrictions (including an acceptable-use policy and a usage threshold for very large services).

Limitations & Caveats

The model generalizes only within specific mathematical domains; it is not suited to diverse problem types or to integration into multi-domain chatbots. Multilingual support is absent, and advanced techniques such as reward models and RLHF remain unexplored.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

  • RL recipe for reasoning ability in models
  • 4k stars (top 0.1%)
  • Created 7 months ago; updated 1 month ago