QwenLong-L1 by Tongyi-Zhiwen

Long-context reasoning model

created 2 months ago
287 stars

Top 92.3% on sourcepulse

View on GitHub
Project Summary

QwenLong-L1 is a 32B parameter Large Reasoning Model (LRM) designed for robust long-context generalization. It addresses the challenge of extending LRM capabilities beyond short contexts by employing a novel reinforcement learning (RL) framework. The model is targeted at researchers and developers working with long documents requiring complex reasoning, offering performance competitive with state-of-the-art models like Claude-3.7-Sonnet-Thinking.

How It Works

The framework enhances short-context LRMs through progressive context scaling during RL training. It comprises three core components: a warm-up supervised fine-tuning (SFT) phase for policy initialization, a curriculum-guided RL phase for stable adaptation from short to long contexts, and a difficulty-aware retrospective sampling mechanism to manage training complexity. Hybrid reward functions combining rule-based and model-based rewards are used with RL algorithms like GRPO and DAPO to balance precision and recall, guiding LRMs towards effective reasoning patterns for long-context grounding.
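As a rough illustration of the hybrid reward idea, the sketch below combines a strict rule-based check with a model-based judge and takes the maximum of the two signals. The function names, the judge interface, and the max() combination are illustrative assumptions, not the repository's exact implementation.

```python
# Illustrative sketch of a hybrid reward for long-context DocQA RL training.
# Helper names, the judge interface, and the max() combination are assumptions
# for exposition; see the repository for the actual reward implementation.
import re
from typing import Callable


def rule_based_reward(prediction: str, gold_answer: str) -> float:
    """High-precision rule-based check: exact match after light normalization."""
    norm = lambda s: re.sub(r"\s+", " ", s.strip().lower())
    return 1.0 if norm(prediction) == norm(gold_answer) else 0.0


def model_based_reward(
    question: str, prediction: str, gold_answer: str, judge: Callable[[str], str]
) -> float:
    """Higher-recall model-based check: a judge LLM decides whether the
    prediction is semantically equivalent to the gold answer."""
    prompt = (
        f"Question: {question}\nGold answer: {gold_answer}\n"
        f"Predicted answer: {prediction}\nReply 'yes' if the prediction is correct."
    )
    return 1.0 if judge(prompt).strip().lower().startswith("yes") else 0.0


def hybrid_reward(
    question: str, prediction: str, gold_answer: str, judge: Callable[[str], str]
) -> float:
    """Combine both signals; taking the max trades off precision and recall."""
    return max(
        rule_based_reward(prediction, gold_answer),
        model_based_reward(question, prediction, gold_answer, judge),
    )
```

In the RL loop, a scalar like this would serve as the per-sample reward consumed by the policy-gradient update (e.g., GRPO or DAPO).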

Quick Start & Requirements

  • Installation: Requires Python 3.10; install dependencies from requirements.txt, plus verl and vllm (v0.7.3). flash-attn is also recommended.
  • Prerequisites: CUDA, vllm for serving, and potentially ray for distributed training.
  • Usage: Can be loaded and run using the Hugging Face transformers library (see the loading sketch after this list).
  • Long Context: Supports up to 131,072 tokens via YaRN RoPE scaling, configurable through config.json or command-line arguments in inference frameworks like vLLM and llama.cpp.
  • Resources: Requires significant GPU resources for inference and training.
  • Docs: HuggingFace, ModelScope
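For the usage bullet above, here is a minimal loading sketch with transformers. The checkpoint id Tongyi-Zhiwen/QwenLong-L1-32B and the prompt layout are assumptions based on the project's Hugging Face page; check the README for the exact names.

```python
# Minimal sketch of loading and querying the model with Hugging Face transformers.
# The checkpoint id below is an assumption; verify it against the project README.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tongyi-Zhiwen/QwenLong-L1-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # shard across available GPUs; a 32B model needs several
)

# Qwen-style chat template: put the long document and the question in the user turn.
messages = [{"role": "user", "content": "<long document>\n\nQuestion: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```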

Highlighted Details

  • First LRM trained with RL for long-context reasoning.
  • Outperforms flagship LRMs on seven long-context DocQA benchmarks.
  • Achieves performance on par with Claude-3.7-Sonnet-Thinking.
  • Released an RL training dataset, DocQA-RL-1.6K, for reasoning tasks.

Maintenance & Community

  • Developed by Alibaba Tongyi Lab.
  • Community channels include WeChat and DingTalk (QR codes provided in README).
  • Citation details are available for academic reference.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

Static YaRN scaling may degrade performance on shorter texts, so it is recommended to enable it only when long contexts are required and to tune the scaling factor to the context length actually needed; in particular, enabling YaRN for contexts that do not exceed 32,768 tokens is not recommended.
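Following this caveat, a hedged sketch of turning YaRN on only for long inputs via transformers is shown below. The checkpoint id is assumed as above, and the factor of 4.0 corresponds to scaling a 32,768-token base window to 131,072 tokens; adjust it to the length you actually need.

```python
# Sketch of enabling YaRN RoPE scaling only when the prompt actually needs it.
# The rope_scaling fields follow the documented Qwen-style config entry; the
# checkpoint id and the factor of 4.0 (32,768 -> 131,072 tokens) are assumptions.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Tongyi-Zhiwen/QwenLong-L1-32B"  # assumed checkpoint id
need_long_context = True                    # set True only for prompts > 32,768 tokens

config = AutoConfig.from_pretrained(model_id)
if need_long_context:
    config.rope_scaling = {
        "rope_type": "yarn",                # older configs use the key "type"
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    }

model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```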

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 287 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

yarn by jquesnelle

1.0%
2k
Context window extension method for LLMs (research paper, models)
created 2 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

LongLoRA by dvlab-research

0.1%
3k
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago
updated 11 months ago