Long-context reasoning model
QwenLong-L1 is a 32B parameter Large Reasoning Model (LRM) designed for robust long-context generalization. It addresses the challenge of extending LRM capabilities beyond short contexts by employing a novel reinforcement learning (RL) framework. The model is targeted at researchers and developers working with long documents requiring complex reasoning, offering performance competitive with state-of-the-art models like Claude-3.7-Sonnet-Thinking.
How It Works
The framework enhances short-context LRMs through progressive context scaling during RL training. It comprises three core components: a warm-up supervised fine-tuning (SFT) phase for policy initialization, a curriculum-guided RL phase for stable adaptation from short to long contexts, and a difficulty-aware retrospective sampling mechanism to manage training complexity. Hybrid reward functions combining rule-based and model-based rewards are used with RL algorithms like GRPO and DAPO to balance precision and recall, guiding LRMs towards effective reasoning patterns for long-context grounding.
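The repo's exact reward code isn't reproduced here, but a minimal sketch of the hybrid scheme, taking the maximum of a strict rule-based check and a model-based equivalence judgment (the judge callable below is an assumed verifier interface, not the repo's API), could look like:

```python
import re

def rule_based_reward(response: str, gold: str) -> float:
    """Strict check: extract the final \\boxed{...} answer and compare exactly."""
    match = re.search(r"\\boxed\{(.+?)\}", response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == gold.strip() else 0.0

def hybrid_reward(response: str, gold: str, judge) -> float:
    """Combine rule-based precision with model-based recall.

    `judge` is an assumed callable wrapping a small LLM verifier that
    returns 1.0 when the response is semantically equivalent to `gold`,
    else 0.0. Taking the max keeps exact matches cheap while letting the
    verifier credit paraphrased but correct answers.
    """
    rule_score = rule_based_reward(response, gold)
    if rule_score == 1.0:  # exact match short-circuits the LLM judge
        return rule_score
    return max(rule_score, judge(response, gold))
```

The max combination is what balances precision (the rule check) against recall (the verifier) during RL training, as described above.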
Quick Start & Requirements
- Dependencies: install from requirements.txt, including verl and vllm (v0.7.3); flash-attn is also recommended.
- Inference/serving: vllm for serving, and potentially ray for distributed training.
- Model loading: the transformers library (see the sketch below).
- Long-context (YaRN) scaling: configured via config.json or command-line arguments in inference frameworks like vLLM and llama.cpp.
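A minimal loading sketch with transformers, assuming the checkpoint is published under the Hub ID Tongyi-Zhiwen/QwenLong-L1-32B (the ID and prompt shape here are assumptions, not repo-verified):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub ID; substitute the actual checkpoint path if it differs.
model_id = "Tongyi-Zhiwen/QwenLong-L1-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick up the checkpoint's native dtype
    device_map="auto",    # shard the 32B weights across visible GPUs
)

# Long-document QA prompt; reasoning models emit their chain of thought
# before the final answer, so leave generous room for new tokens.
messages = [{"role": "user", "content": "<long document>\n\nQuestion: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=4096)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```

For serving at 32B scale, vllm exposes the same checkpoint behind an OpenAI-compatible API, which is the inference path the requirements suggest.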
Highlighted Details

Maintenance & Community
Last commit was roughly 2 months ago; the repository is currently flagged as inactive.
Licensing & Compatibility
Limitations & Caveats
YaRN static scaling may degrade performance on shorter inputs, so enable it only when contexts exceed the native 32,768-token window, and tune the scaling factor to the expected input length.
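As a concrete illustration of this caveat, Qwen2.5-style checkpoints typically enable static YaRN through a rope_scaling entry in config.json; the sketch below patches a local copy, with the path, key names, and factor of 4.0 (32,768 → roughly 131k tokens) being assumptions rather than repo-verified values:

```python
import json

# Hypothetical local checkpoint path.
CONFIG_PATH = "QwenLong-L1-32B/config.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

# Enable static YaRN only when prompts will exceed the native
# 32,768-token window; leave rope_scaling unset for shorter inputs.
config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,  # assumed example: 32,768 * 4 ≈ 131k-token window
    "original_max_position_embeddings": 32768,
}

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```

Inference frameworks such as vLLM accept equivalent overrides via command-line arguments, per the Quick Start note above.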