ParScale by QwenLM

Research paper introducing parallel scaling for language models

Created 8 months ago
466 stars

Top 65.2% on SourcePulse

Project Summary

This repository introduces "Parallel Scaling" (ParScale), a novel paradigm for scaling Large Language Models (LLMs) that complements parameter scaling and inference-time scaling. It targets researchers and practitioners seeking to improve LLM performance and efficiency, offering capability gains that grow logarithmically with the number of parallel streams while incurring significantly less memory and latency overhead than parameter scaling.

How It Works

ParScale applies $P$ diverse, learnable transformations to the input, processes them in parallel through the LLM, and dynamically aggregates the outputs. Theoretically and empirically, this yields a logarithmic scaling law: the gain from $P$ parallel streams is comparable to growing the parameter count by $O(\log P)$, suggesting ParScale is an efficient substitute for parameter growth, particularly on reasoning-intensive tasks.
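Below is a minimal PyTorch sketch of this mechanism, assuming a simplified setup in which each stream prepends its own learnable prefix to the input embeddings, all streams share one backbone, and a small gating head produces the dynamic aggregation weights. Class and parameter names are illustrative, not the repository's API, and a real implementation would batch the streams together rather than loop over them.

```python
# Illustrative sketch of the ParScale idea (not the repository's implementation).
# Assumptions: each of the P streams gets its own learnable prefix, all streams share
# the same backbone, and a gating head produces dynamic per-stream aggregation weights.
import torch
import torch.nn as nn


class ParallelScalingWrapper(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_dim: int, num_streams: int = 4, prefix_len: int = 8):
        super().__init__()
        self.backbone = backbone                      # shared trunk (hypothetical interface: embeds -> hidden states)
        self.num_streams = num_streams
        # One learnable prefix per stream: the "diverse, learnable transformations".
        self.prefixes = nn.Parameter(torch.randn(num_streams, prefix_len, hidden_dim) * 0.02)
        # Gating head that scores each stream's output for dynamic aggregation.
        self.gate = nn.Linear(hidden_dim, 1)

    def forward(self, inputs_embeds: torch.Tensor) -> torch.Tensor:
        # inputs_embeds: (batch, seq, hidden)
        batch, seq, _ = inputs_embeds.shape
        outputs = []
        for p in range(self.num_streams):             # looped here for clarity; batch the streams in practice
            prefix = self.prefixes[p].unsqueeze(0).expand(batch, -1, -1)
            stream_in = torch.cat([prefix, inputs_embeds], dim=1)
            stream_out = self.backbone(stream_in)     # (batch, prefix + seq, hidden)
            outputs.append(stream_out[:, -seq:, :])   # keep only the positions of the original input
        stacked = torch.stack(outputs, dim=1)         # (batch, P, seq, hidden)
        # Dynamic aggregation: softmax over per-stream gate scores.
        scores = self.gate(stacked.mean(dim=2))       # (batch, P, 1)
        weights = torch.softmax(scores, dim=1).unsqueeze(2)
        return (weights * stacked).sum(dim=1)         # (batch, seq, hidden)


if __name__ == "__main__":
    # Toy backbone standing in for the shared LLM trunk.
    hidden = 64
    toy_backbone = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
    model = ParallelScalingWrapper(toy_backbone, hidden_dim=hidden, num_streams=4)
    x = torch.randn(2, 10, hidden)                    # dummy input embeddings
    print(model(x).shape)                             # torch.Size([2, 10, 64])
```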

Quick Start & Requirements

  • Install: clone the llm-analysis repository and run pip install . inside it (needed for the cost analysis).
  • Prerequisites: CUDA and Python. Pretrained models are available on Hugging Face (a loading sketch follows this list).
  • Resources: Requires GPU for inference.
  • Links: Hugging Face Models, Paper
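The snippet below is a minimal loading sketch, assuming the Hugging Face checkpoints follow the standard transformers interface; the model id is a placeholder rather than a verified checkpoint name, so substitute an actual ParScale model from the Hub.

```python
# Minimal loading sketch (assumed standard transformers usage; model id is a placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ParScale/<checkpoint-name>"  # placeholder: pick an actual ParScale checkpoint on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # the README notes this is required to load the custom modeling code
    torch_dtype="auto",
    device_map="auto",        # place weights on the available GPU
)

prompt = "Explain parallel scaling for language models in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```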

Highlighted Details

  • Achieves $O(\log P)$ scaling, comparable to parameter scaling.
  • Universal applicability across model architectures, tasks, and data.
  • Demonstrates superior inference efficiency: for equivalent performance gains, up to 22x less memory increase and 6x less latency increase than parameter scaling (batch size = 1).
  • Enables cost-efficient training via a two-stage strategy and dynamic adaptation at inference time with frozen parameters.

Maintenance & Community

The project is associated with authors from institutions like Tsinghua University. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not list limitations or caveats such as unsupported platforms, known bugs, or alpha status. Loading the Hugging Face models requires trust_remote_code=True, which executes the repository's custom modeling code and therefore carries the usual security considerations.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Jiayi Pan (author of SWE-Gym; MTS at xAI), and 20 more.

  • alpa by alpa-projects — Auto-parallelization framework for large-scale neural network training and serving. 0.0% · 3k stars · created 4 years ago · updated 2 years ago.