Aquila2 by FlagAI-Open

Open-source base and chat LLMs

created 1 year ago
444 stars

Top 68.7% on sourcepulse

View on GitHub
Project Summary

The Aquila2 project provides open-source large language and chat models, including 7B, 34B, and 70B parameter variants, with a focus on strong performance across various benchmarks and long-context understanding. It is targeted at researchers and developers looking to leverage advanced LLMs for diverse applications, offering fine-tuning capabilities and efficient inference options.

How It Works

Aquila2 models are based on a Transformer architecture. Long-context variants such as AquilaChat2-34B-16K extend context understanding through positional-encoding interpolation followed by supervised fine-tuning on extensive conversation datasets. For efficient large-scale training, the project leverages FlagScale, a pretraining framework built on Megatron-LM.
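
The summary above does not spell out the interpolation scheme. As a toy sketch (assuming rotary position embeddings; the lengths and dimensions below are illustrative, not Aquila2's actual values), a common position-interpolation recipe rescales position indices before computing the rotary angles:

    import torch

    def rope_angles(positions, dim=64, base=10000.0, scale=1.0):
        # Rotary-embedding angles; scale < 1 compresses positions
        # (position interpolation) so a longer context maps back into
        # the position range seen during pretraining.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        return torch.outer(positions.to(torch.float32) * scale, inv_freq)

    train_len, target_len = 4096, 16384  # illustrative, not Aquila2's real values
    angles = rope_angles(torch.arange(target_len), scale=train_len / target_len)
    # angles now spans the same range as positions 0..train_len at scale=1.0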

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: Python 3.10+, PyTorch 1.12+ (2.0+ recommended), Transformers 4.32+, CUDA 11.4+ (recommended for GPU/flash-attention).
  • Optional: Flash-attention for speed and memory reduction. Docker image available.
  • Resources: Inference examples are provided for single- and multi-GPU setups (a minimal sketch follows this list). Fine-tuning scripts for the 7B and 34B models (full-parameter, LoRA, Q-LoRA) are available.
  • Links: Hugging Face, ModelScope, FlagOpen
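
A minimal inference sketch using the generic transformers API (the model ID and generation settings here are assumptions; consult the Hugging Face model cards linked above for the exact IDs and any model-specific chat helpers):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "BAAI/AquilaChat2-7B"  # assumed hub ID; verify on Hugging Face
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,   # fp16 for GPU; use bf16/fp32 as appropriate
        device_map="auto",
        trust_remote_code=True,
    )
    model.eval()

    prompt = "Explain the difference between a base model and a chat model."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))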

Highlighted Details

  • Aquila2-34B v1.2 shows significant improvements on reasoning and comprehension datasets, approaching GPT-3.5 levels.
  • Long-context models (e.g., AquilaChat2-34B-16K) demonstrate leading performance among open-source options, comparable to GPT-3.5-16K.
  • Supports 4-bit quantization (BitsAndBytes, GPTQ) and AWQ for a reduced memory footprint with minimal performance loss; see the sketch after this list.
  • Fine-tuning scripts for full-parameter, LoRA, and Q-LoRA are provided for 7B and 34B models.
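
For reference, a hedged sketch of 4-bit loading with BitsAndBytes through transformers (the model ID and quantization settings are illustrative assumptions, not the project's published configuration):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    model_id = "BAAI/AquilaChat2-34B"  # assumed hub ID; verify on Hugging Face
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # NF4 is a common 4-bit choice
        bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
    )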

Maintenance & Community

  • Active development with recent releases of 70B models and performance updates for 34B models.
  • Community contributions are encouraged via GitHub Issues and Pull Requests; WeChat groups are available for direct contact.

Licensing & Compatibility

  • Project License: Apache 2.0.
  • Model Licenses: BAAI Aquila Model License Agreement for 7B/34B models, and a specific BAAI Aquila 70B Model License Agreement for 70B models. These may have restrictions on commercial use or redistribution.

Limitations & Caveats

  • A GSM8K data-leakage issue in pre-training was identified and addressed, and the affected benchmark results were removed.
  • The 70B models are experimental.
  • FlagScale, the pretraining framework, is in its early stages.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 3 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

0.3%
9k stars
Tiny pretraining project for a 1.1B Llama model
created 1 year ago
updated 1 year ago
Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 10 more.

qlora by artidoro

0.2%
11k stars
Finetuning tool for quantized LLMs
created 2 years ago
updated 1 year ago
Starred by George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), Calvin French-Owen (Cofounder of Segment), and 12 more.

StableLM by Stability-AI

0.0%
16k stars
Language models by Stability AI
created 2 years ago
updated 1 year ago