Qwen by QwenLM

Chat & pretrained LLM by Alibaba Cloud

created 2 years ago
18,895 stars

Top 2.4% on sourcepulse

Project Summary

Qwen provides a suite of large language models (LLMs) developed by Alibaba Cloud: the base models Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B, together with their chat-tuned variants (Qwen-Chat). The models are designed for a wide range of natural language processing tasks, from content creation and summarization to tool usage and agentic behavior, and target researchers and developers.

How It Works

The Qwen models are pretrained on extensive multilingual datasets (up to 3 trillion tokens), focusing on Chinese and English across various domains. They employ techniques to support long context windows (up to 32K tokens) and offer various quantization methods (Int4, Int8, KV cache quantization) for improved efficiency. The chat models are further aligned with human preferences using SFT and RLHF, enabling conversational capabilities and tool integration.

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: Python 3.8+, PyTorch 1.12+, Transformers 4.32+. CUDA 11.4+ recommended for GPU usage. Optional: flash-attention for performance.
  • Usage: Examples are provided for Hugging Face Transformers and ModelScope integration; a minimal Transformers sketch follows this list. Docker images are available for simplified deployment.
  • Resources: Links to Hugging Face, ModelScope, and a technical report are provided.
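
As a rough illustration of the Hugging Face Transformers path above (a minimal sketch, assuming the Qwen/Qwen-7B-Chat checkpoint and a single-GPU or CPU setup; adjust dtype and device placement to your hardware):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Qwen ships custom modeling code, so trust_remote_code is required.
    model_id = "Qwen/Qwen-7B-Chat"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",       # spread layers across available devices
        trust_remote_code=True,
    ).eval()

    # Chat-tuned checkpoints expose a chat() helper that tracks conversation history.
    response, history = model.chat(tokenizer, "Give me a short introduction to large language models.", history=None)
    print(response)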

Highlighted Details

  • Offers models ranging from 1.8B to 72B parameters, with competitive benchmark performance against models like LLaMA2 and GPT-3.5.
  • Supports advanced features like system prompts for customization, tool usage, and function calling.
  • Provides detailed finetuning guides (full-parameter, LoRA, Q-LoRA) and deployment options (vLLM, FastChat, local API).
  • Includes quantization techniques (GPTQ, KV cache quantization) and performance benchmarks for speed and memory; a sketch of loading an Int4 checkpoint follows this list.
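
For the quantized checkpoints, loading the Int4 GPTQ variant follows the same pattern (a minimal sketch, assuming the Qwen/Qwen-7B-Chat-Int4 checkpoint and that auto-gptq and optimum are installed in versions compatible with your transformers release):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The Int4 GPTQ checkpoint reuses the standard interface; quantization
    # metadata stored in the checkpoint tells the loader how to dequantize.
    model_id = "Qwen/Qwen-7B-Chat-Int4"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        trust_remote_code=True,
    ).eval()

    response, _ = model.chat(tokenizer, "Summarize the Qwen model family in one sentence.", history=None)
    print(response)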

Maintenance & Community

  • The QwenLM/Qwen repository is no longer actively maintained; newer Qwen releases are developed in separate repositories with a different codebase.
  • Community channels include Discord and WeChat. Contact email: qianwen_opensource@alibabacloud.com.

Licensing & Compatibility

  • Source code is under Apache 2.0 License.
  • Model weights for the 7B, 14B, and 72B models require an application via DashScope for commercial use. Qwen-1.8B is released under a research license agreement; commercial use requires contacting the team.

Limitations & Caveats

  • The primary repository QwenLM/Qwen is not actively maintained; users should refer to QwenLM/Qwen2.
  • Some quantization packages (e.g., auto-gptq) may have version compatibility issues with transformers and optimum.
  • KV cache quantization and flash attention cannot be used simultaneously.
  • Manual copying of certain non-Python files (.cpp, .cu) might be necessary for specific functionalities.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 20

Star History

  • 899 stars in the last 90 days
