QwQ by QwenLM

Reasoning model for complex problem-solving, based on Qwen2.5

created 4 months ago
511 stars

Top 62.0% on sourcepulse

View on GitHub
Project Summary

QwQ is a reasoning-specialized large language model series from Alibaba Cloud's Qwen team, designed for complex problem-solving tasks. It aims to outperform traditional instruction-tuned models by leveraging advanced reasoning and critical thinking, making it suitable for researchers and developers tackling challenging NLP applications.

How It Works

QwQ is built on the Qwen2.5 architecture and optimized specifically for reasoning. The model emits its chain of thought before the final answer, beginning its output with "<think>\n" to separate the reasoning steps from the response. The README recommends specific sampling parameters (Temperature=0.6, TopP=0.95, TopK=40) and advises against greedy decoding, which can lead to endless repetition. For long contexts, it supports YaRN scaling, configurable via rope_scaling in config.json.
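As a sketch, the rope_scaling entry mentioned above might look like the following in config.json. The factor and original_max_position_embeddings values follow the QwQ-32B model card, but treat them as assumptions to verify against the official README:

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

Note that frameworks with static YaRN (such as vLLM) apply this scaling to all inputs regardless of length, which is why the README flags a possible performance impact on shorter texts.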

Quick Start & Requirements

  • Hugging Face Transformers: Install with pip install transformers. Requires transformers>=4.37.0.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/QwQ-32B"
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Build a chat prompt and generate with the recommended sampling settings
    # (standard Transformers chat API; adjust max_new_tokens to your needs).
    messages = [{"role": "user", "content": "How many r's are in the word \"strawberry\"?"}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, do_sample=True, temperature=0.6, top_p=0.95, top_k=40, max_new_tokens=2048)
    print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

  • Ollama: ollama run hf.co/Qwen/QwQ-32B-GGUF:Q4_K_M
  • Llama.cpp: Requires GGUF model files.
    ./llama-cli --model QwQ-32B-GGUF/qwq-32b-q4_k_m.gguf --threads 32 --ctx-size 32768 --temp 0.6 --top-p 0.95 --prompt "<|im_start|>user\nHow many r's are in the word \"strawberry\"<|im_end|>\n<|im_start|>assistant\n<think>\n"
    
  • API: Alibaba Cloud Model Studio API.
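Whichever backend is used, the output contains the model's reasoning trace before the final answer, closed by a literal "</think>" marker. A minimal sketch of separating the two, assuming that marker appears verbatim in the decoded text:

```python
def split_reasoning(output: str) -> tuple[str, str]:
    """Split model output into (reasoning, final_answer).

    Assumes the reasoning block ends with a literal "</think>" marker,
    as QwQ emits; if the marker is absent, treat everything as answer.
    """
    marker = "</think>"
    if marker in output:
        reasoning, _, answer = output.partition(marker)
        return reasoning.strip(), answer.strip()
    return "", output.strip()

reasoning, answer = split_reasoning("Let me count the r's...</think>\nThere are 3 r's.")
print(answer)  # the final answer, with the reasoning trace removed
```

Parsing on the marker rather than on token IDs keeps the helper backend-agnostic (Transformers, Ollama, and llama.cpp all return plain text).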

Highlighted Details

  • QwQ-32B competes with top-tier reasoning models like DeepSeek-R1 and o1-mini.
  • Supports YaRN for long context handling (e.g., 8192+ tokens) with specific configuration.
  • Provides detailed usage guidelines for optimal performance, including prompt standardization for math and multiple-choice questions.
  • Offers GGUF versions for local inference via Ollama and Llama.cpp.
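The prompt-standardization guideline above can be sketched as a small helper. The instruction strings below mirror the README's recommendations for math problems (boxed final answers) and multiple-choice questions (a JSON "answer" field), but verify the exact wording against the official usage guidelines:

```python
def standardize_prompt(question: str, kind: str = "math") -> str:
    """Append the recommended instruction for math or multiple-choice prompts.

    Instruction wording follows the QwQ usage guidelines; treat it as an
    assumption to check against the official README.
    """
    if kind == "math":
        suffix = "Please reason step by step, and put your final answer within \\boxed{}."
    elif kind == "multiple-choice":
        suffix = ('Please show your choice in the answer field with only '
                  'the choice letter, e.g., "answer": "C".')
    else:
        raise ValueError(f"unknown prompt kind: {kind}")
    return f"{question}\n\n{suffix}"

print(standardize_prompt("What is 2 + 2?"))
```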

Maintenance & Community

  • Developed by the Qwen team at Alibaba Cloud.
  • Community links: Hugging Face, ModelScope, Blog, Demo, WeChat, Discord.
  • API service available via Alibaba Cloud Model Studio.

Licensing & Compatibility

  • License details are not explicitly stated in the README, but usage is governed by Usage Guidelines.
  • Compatibility for commercial use is not specified.

Limitations & Caveats

  • Users encountering performance issues or endless repetitions should consult the Usage Guidelines.
  • vLLM's static YaRN implementation may impact performance on shorter texts.
  • The README mentions a potential KeyError: 'qwen2' with transformers<4.37.0.
Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 32 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Julien Chaumond (Cofounder of Hugging Face), and 1 more.

question_generation by patil-suraj — Question generation study using transformers (1k stars, created 5 years ago, updated 1 year ago)