LLaMA-Pro introduces a novel "block expansion" technique for progressive model enhancement, targeting researchers and developers seeking to improve large language model performance, particularly in code and math-related tasks. This method allows for efficient scaling and specialization of existing LLaMA architectures.
How It Works
LLaMA-Pro employs a progressive training strategy called block expansion: copies of existing transformer blocks are interleaved into the pretrained model and initialized so they act as identity mappings, and only these new blocks are trained on domain data (e.g., code and math corpora) while the original blocks stay frozen. This adds specialized capability without full retraining and helps preserve the base model's general abilities.
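Below is a minimal sketch of the block-expansion idea using PyTorch and a Hugging Face LlamaForCausalLM. The function name expand_blocks, the blocks_per_group value, and the choice of which projections to zero reflect the paper's description, but they are illustrative; the repository's actual implementation may differ in details.

```python
# Minimal sketch of block expansion, assuming a Hugging Face LlamaForCausalLM.
# Copied blocks have their output projections zeroed so each new block starts as
# an identity mapping through the residual path; only new blocks stay trainable.
import copy

import torch
from transformers import LlamaForCausalLM


def expand_blocks(model: LlamaForCausalLM, blocks_per_group: int = 4) -> LlamaForCausalLM:
    old_layers = model.model.layers
    new_layers = torch.nn.ModuleList()

    # Freeze every original parameter; only the expanded blocks will be trained.
    for p in model.parameters():
        p.requires_grad = False

    for i, layer in enumerate(old_layers):
        new_layers.append(layer)
        # After each group of blocks, interleave one copied block.
        if (i + 1) % blocks_per_group == 0:
            new_block = copy.deepcopy(layer)
            # Zero the output projections so the block initially contributes
            # nothing beyond its residual connection (identity at init).
            torch.nn.init.zeros_(new_block.self_attn.o_proj.weight)
            torch.nn.init.zeros_(new_block.mlp.down_proj.weight)
            for p in new_block.parameters():
                p.requires_grad = True
            new_layers.append(new_block)

    # Note: layer indices used for KV caching are not re-numbered in this sketch.
    model.model.layers = new_layers
    model.config.num_hidden_layers = len(new_layers)
    return model


expanded = expand_blocks(LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf"))
```

Because the zero-initialized projections make each new block a no-op at the start of training, the expanded model reproduces the base model's outputs exactly before any fine-tuning begins.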
Quick Start & Requirements
- Install: Code is available on GitHub. Model checkpoints are hosted on Hugging Face (a loading sketch follows this list).
- Prerequisites: Python and PyTorch. Hardware requirements (GPU, VRAM) are not detailed, but a CUDA-capable GPU with enough memory for 8B-parameter models is implied.
- Resources: Links to a demo and model news are provided. The paper is available via ACL 2024.
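A minimal loading sketch with Hugging Face transformers is shown below; the checkpoint ID "TencentARC/LLaMA-Pro-8B" is an assumption based on the hosting organization, so verify the exact repo name on the Hugging Face page before use.

```python
# Load an assumed LLaMA-Pro checkpoint and generate a completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TencentARC/LLaMA-Pro-8B"  # assumed repo ID -- check the HF org page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)

prompt = "Write a Python function that checks whether a number is prime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In float16, an 8B-parameter model needs roughly 16 GB of GPU memory for the weights alone, before activations and KV cache.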
Highlighted Details
- Achieves state-of-the-art performance on the GSM8K (78.4 Pass@1) and MATH (30.3 Pass@1) benchmarks with MetaMath-Mistral-Pro (a scoring sketch follows this list).
- Introduces Mistral-Pro-8B-v0.1, matching Gemma's performance and enhancing Mistral's code and math capabilities.
- Training code is derived from the open-instruct repository.
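To make the Pass@1 numbers concrete, the sketch below scores a single GSM8K-style item with one greedy sample per question. The Alpaca-style prompt template (commonly used by MetaMath models), the answer-extraction heuristic, and the "TencentARC/MetaMath-Mistral-Pro" checkpoint ID are assumptions here; check the repository's evaluation scripts for the exact setup.

```python
# Greedy Pass@1 scoring sketch for GSM8K-style items. Prompt template and model ID
# are assumptions; the repo's own eval scripts define the authoritative setup.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{question}\n\n### Response: Let's think step by step."
)

model_id = "TencentARC/MetaMath-Mistral-Pro"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)


def pass_at_1(question: str, gold_answer: str) -> bool:
    """One greedy sample per question; correct iff the last number matches the gold answer."""
    inputs = tokenizer(TEMPLATE.format(question=question), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return bool(numbers) and numbers[-1] == gold_answer
```

Pass@1 is then simply the fraction of test questions for which this check returns True.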
Maintenance & Community
- The project is associated with TencentARC and has been accepted to ACL 2024.
- Checkpoints are hosted on Hugging Face and Wisemodel.
Licensing & Compatibility
- The README does not explicitly state a license. Given the project's association with LLaMA and its use of open-instruct, a similar permissive or research-oriented license is likely, but this requires verification.
Limitations & Caveats
- Specific hardware requirements for running the models are not detailed in the README. The project is presented as research code, implying potential for instability or incomplete features.