Long-context reasoning model
QwenLong-L1 is a 32B parameter Large Reasoning Model (LRM) designed for robust long-context generalization. It addresses the challenge of extending LRM capabilities beyond short contexts by employing a novel reinforcement learning (RL) framework. The model is targeted at researchers and developers working with long documents requiring complex reasoning, offering performance competitive with state-of-the-art models like Claude-3.7-Sonnet-Thinking.
How It Works
The framework enhances short-context LRMs through progressive context scaling during RL training. It comprises three core components: a warm-up supervised fine-tuning (SFT) phase for policy initialization, a curriculum-guided RL phase for stable adaptation from short to long contexts, and a difficulty-aware retrospective sampling mechanism to manage training complexity. Hybrid reward functions combining rule-based and model-based rewards are used with RL algorithms like GRPO and DAPO to balance precision and recall, guiding LRMs towards effective reasoning patterns for long-context grounding.
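The repo's exact reward code isn't reproduced here, but a minimal sketch of the hybrid scheme, taking the maximum of a strict rule-based check and a model-based equivalence judgment (the judge callable below is an assumed verifier interface, not the repo's API), could look like:

```python
import re

def rule_based_reward(response: str, gold: str) -> float:
    """Strict check: extract the final \\boxed{...} answer and compare exactly."""
    match = re.search(r"\\boxed\{(.+?)\}", response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == gold.strip() else 0.0

def hybrid_reward(response: str, gold: str, judge) -> float:
    """Combine rule-based precision with model-based recall.

    `judge` is an assumed callable wrapping a small LLM verifier that
    returns 1.0 when the response is semantically equivalent to `gold`,
    else 0.0. Taking the max keeps exact matches cheap while letting the
    verifier credit paraphrased but correct answers.
    """
    rule_score = rule_based_reward(response, gold)
    if rule_score == 1.0:  # exact match short-circuits the LLM judge
        return rule_score
    return max(rule_score, judge(response, gold))
```

The max combination is what balances precision (the rule check) against recall (the verifier) during RL training, as described above.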
Quick Start & Requirements
- Dependencies: install from requirements.txt, including verl and vllm (v0.7.3); flash-attn is also recommended.
- Inference/serving: vllm for serving, and potentially ray for distributed training.
- Model loading: the transformers library (see the sketch below).
- Long-context (YaRN) scaling: configured via config.json or command-line arguments in inference frameworks like vLLM and llama.cpp.
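A minimal loading sketch with transformers, assuming the checkpoint is published under the Hub ID Tongyi-Zhiwen/QwenLong-L1-32B (the ID and prompt shape here are assumptions, not repo-verified):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub ID; substitute the actual checkpoint path if it differs.
model_id = "Tongyi-Zhiwen/QwenLong-L1-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick up the checkpoint's native dtype
    device_map="auto",    # shard the 32B weights across visible GPUs
)

# Long-document QA prompt; reasoning models emit their chain of thought
# before the final answer, so leave generous room for new tokens.
messages = [{"role": "user", "content": "<long document>\n\nQuestion: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=4096)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```

For serving at 32B scale, vllm exposes the same checkpoint behind an OpenAI-compatible API, which is the inference path the requirements suggest.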
Highlighted Details

Maintenance & Community
Last commit was roughly 2 months ago; the repository is currently flagged as inactive.
Licensing & Compatibility
Limitations & Caveats
YaRN static scaling may degrade performance on shorter inputs, so enable it only when contexts exceed the native 32,768-token window, and tune the scaling factor to the expected input length.
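As a concrete illustration of this caveat, Qwen2.5-style checkpoints typically enable static YaRN through a rope_scaling entry in config.json; the sketch below patches a local copy, with the path, key names, and factor of 4.0 (32,768 → roughly 131k tokens) being assumptions rather than repo-verified values:

```python
import json

# Hypothetical local checkpoint path.
CONFIG_PATH = "QwenLong-L1-32B/config.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

# Enable static YaRN only when prompts will exceed the native
# 32,768-token window; leave rope_scaling unset for shorter inputs.
config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,  # assumed example: 32,768 * 4 ≈ 131k-token window
    "original_max_position_embeddings": 32768,
}

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```

Inference frameworks such as vLLM accept equivalent overrides via command-line arguments, per the Quick Start note above.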