LLaMA-Pro by TencentARC

LLM research paper on progressive LLaMA via block expansion

created 1 year ago
507 stars

Top 62.3% on sourcepulse

Project Summary

LLaMA-Pro introduces a novel "block expansion" technique for progressive model enhancement, targeting researchers and developers seeking to improve large language model performance, particularly in code and math-related tasks. This method allows for efficient scaling and specialization of existing LLaMA architectures.

How It Works

LLaMA-Pro employs a progressive training strategy that expands model blocks, enabling efficient adaptation and performance gains without full retraining. This approach is advantageous for enhancing specific capabilities, such as mathematical reasoning and code generation, by integrating specialized knowledge into the model's architecture.
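The core idea can be illustrated in a minimal sketch. Per the paper, LLaMA2-7B's 32 transformer blocks are expanded to 40 by interleaving one new block after each group of four; the new blocks are zero-initialized so the expanded model initially computes the same function as the original, and only the new blocks are trained. The `Block` dataclass and `expand` function below are illustrative names, not the project's actual API:

```python
# Minimal sketch of block expansion, assuming the interleave-every-4 layout
# described in the LLaMA-Pro paper. Names here are illustrative only.
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    trainable: bool  # original blocks are frozen; only new blocks train
    identity: bool   # zero-initialized output projection => acts as identity

def expand(blocks, group):
    """Insert one identity-initialized, trainable block after every
    `group` original blocks, freezing the originals."""
    expanded = []
    for i, blk in enumerate(blocks, start=1):
        expanded.append(Block(blk.name, trainable=False, identity=blk.identity))
        if i % group == 0:
            # The new block starts as an identity mapping, so the expanded
            # model's outputs match the original before further training.
            expanded.append(Block(blk.name + "_new", trainable=True, identity=True))
    return expanded

base = [Block(f"b{i}", trainable=True, identity=False) for i in range(32)]
pro = expand(base, group=4)  # 32 blocks -> 40, with 8 new trainable blocks
```

Because the added blocks begin as identities, the expanded model preserves the base model's behavior at initialization, and domain-specific training (e.g., on code and math corpora) touches only the new parameters, limiting catastrophic forgetting.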

Quick Start & Requirements

  • Install: Code is available on GitHub. Model checkpoints are hosted on Hugging Face.
  • Prerequisites: Python and PyTorch. Specific hardware requirements (e.g., GPU model, VRAM) are not documented, but GPU resources are implied for running models of this size.
  • Resources: Links to a demo and model news are provided. The paper is available via ACL 2024.
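A typical setup would look like the following. The repository URL matches the project's GitHub organization and name; the presence of a `requirements.txt` is an assumption, so check the repository for the actual install instructions:

```shell
# Hedged install sketch; verify paths and files against the actual repo.
git clone https://github.com/TencentARC/LLaMA-Pro.git
cd LLaMA-Pro
pip install -r requirements.txt  # assumed filename; see repo README
```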

Highlighted Details

  • Achieves state-of-the-art performance on GSM8k (78.4 Pass@1) and MATH (30.3 Pass@1) benchmarks with MetaMath-Mistral-Pro.
  • Introduces Mistral-Pro-8B-v0.1, matching Gemma's performance and enhancing Mistral's code and math capabilities.
  • Training code is derived from the open-instruct repository.

Maintenance & Community

  • The project is associated with TencentARC and has been accepted to ACL 2024.
  • Checkpoints are hosted by Hugging Face and Wisemodel.

Licensing & Compatibility

  • The README does not explicitly state a license. Given the association with LLaMA and the use of open-instruct, it's likely to follow similar permissive or research-oriented licenses, but this requires verification.

Limitations & Caveats

  • Specific hardware requirements for running the models are not detailed in the README. The project is presented as research code, implying potential for instability or incomplete features.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Woosuk Kwon (author of vLLM), and 11 more.

WizardLM by nlpxucan

  • Top 0.1%, 9k stars
  • LLMs built using Evol-Instruct for complex instruction following
  • Created 2 years ago, updated 1 month ago