The Aquila2 project provides open-source large language and chat models, including 7B, 34B, and 70B parameter variants, with a focus on strong performance across various benchmarks and long-context understanding. It is targeted at researchers and developers looking to leverage advanced LLMs for diverse applications, offering fine-tuning capabilities and efficient inference options.
How It Works
Aquila2 models are based on the Transformer architecture. Long-context variants such as AquilaChat2-34B-16K extend context understanding through positional encoding interpolation and supervised fine-tuning on extensive conversation datasets. Pretraining uses FlagScale, a framework built on Megatron-LM for efficient large-scale training.
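The project does not spell out the interpolation mechanics here; as an illustrative sketch only, linear positional interpolation for rotary position embeddings (RoPE) rescales position indices so a longer context maps into the position range seen during training. The function name, dimensions, and scale below are hypothetical:

```python
import torch

def rope_angles(dim: int, positions: torch.Tensor,
                base: float = 10000.0, scale: float = 1.0) -> torch.Tensor:
    # Standard RoPE frequencies: one inverse frequency per pair of dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Positional interpolation: scale < 1 compresses positions, so a model
    # trained on 4K positions can attend over 16K tokens (scale = 4096/16384).
    return torch.outer(positions.float() * scale, inv_freq)

# Illustrative: squeeze 16K token positions into a 4K-trained position range.
angles = rope_angles(dim=128, positions=torch.arange(16384), scale=4096 / 16384)
```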
Quick Start & Requirements
- Installation: `pip install -r requirements.txt`
- Prerequisites: Python 3.10+, PyTorch 1.12+ (2.0+ recommended), Transformers 4.32+, CUDA 11.4+ (recommended for GPU/flash-attention).
- Optional: flash-attention for faster inference and lower memory use; a Docker image is available.
- Resources: Inference examples are provided for single- and multi-GPU setups (see the sketch after this list). Fine-tuning scripts for the 7B and 34B models (full-parameter, LoRA, Q-LoRA) are also available.
- Links: Hugging Face, ModelScope, FlagOpen
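A minimal inference sketch using the Hugging Face Transformers API; the model id `BAAI/AquilaChat2-7B`, the prompt, and the generation settings are assumptions for illustration, not the project's own example code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BAAI/AquilaChat2-7B"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision to reduce GPU memory
    device_map="auto",           # spread layers across GPUs (needs `accelerate`)
    trust_remote_code=True,      # Aquila2 ships custom model code
)
model.eval()

inputs = tokenizer("What is quantum computing?", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```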
Highlighted Details
- Aquila2-34B v1.2 shows significant improvements on reasoning and comprehension datasets, approaching GPT-3.5 levels.
- Long-context models (e.g., AquilaChat2-34B-16K) demonstrate leading performance among open-source options, comparable to GPT-3.5-16K.
- Supports 4-bit quantization (bitsandbytes, GPTQ) and AWQ for a reduced memory footprint with minimal performance loss (see the sketch after this list).
- Fine-tuning scripts for full-parameter, LoRA, and Q-LoRA are provided for 7B and 34B models.
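As a hedged illustration of the 4-bit path, an NF4 load through Transformers' `BitsAndBytesConfig` might look like the following; the model id and config values are assumptions, and the repo's own quantization scripts may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for matmuls on dequantized weights
    bnb_4bit_quant_type="nf4",             # NormalFloat4, common for LLM weights
)

model_id = "BAAI/AquilaChat2-7B"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```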
Maintenance & Community
- Active development with recent releases of 70B models and performance updates for 34B models.
- Community contributions are encouraged via GitHub Issues and Pull Requests. WeChat groups are available for contact.
Licensing & Compatibility
- Project License: Apache 2.0.
- Model Licenses: BAAI Aquila Model License Agreement for 7B/34B models, and a specific BAAI Aquila 70B Model License Agreement for 70B models. These may have restrictions on commercial use or redistribution.
Limitations & Caveats
- A GSM8K data-leakage issue in pre-training was identified and addressed, and the affected results were removed.
- The 70B models are experimental.
- FlagScale, the pretraining framework, is in its early stages.