Safety guardrails for LLM interactions
Top 98.4% on SourcePulse
Qwen3Guard is a series of multilingual safety moderation models designed to protect against harmful content in LLM applications. Targeting developers and researchers, it offers robust prompt and response analysis, with specialized variants for real-time, token-level monitoring during text generation, providing comprehensive, multi-language safety solutions.
How It Works
Qwen3Guard is built upon Qwen3 and trained on a large safety-labeled dataset. It comprises two main variants: `Qwen3Guard-Gen`, for static classification of prompts and responses, and `Qwen3Guard-Stream`, for real-time, token-level safety assessment during incremental generation. The models support 119 languages and classify content into three severity levels (safe, controversial, unsafe) across nine defined safety categories, enabling adaptable risk management.
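The static `-Gen` workflow above can be sketched as follows. The model name and the exact plain-text verdict format are assumptions for illustration, not confirmed by this document; check the official model card before relying on them. Only the parsing helper runs as-is; the model call is shown commented out because it downloads weights.

```python
import re

# Sketch of offline moderation with Qwen3Guard-Gen via transformers.
# Model name and verdict format below are assumptions; verify against
# the official README before use.
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("Qwen/Qwen3Guard-Gen-0.6B")
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3Guard-Gen-0.6B")
# messages = [{"role": "user", "content": "user prompt to moderate"}]
# prompt = tok.apply_chat_template(messages, tokenize=False,
#                                  add_generation_prompt=True)
# out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=64)
# verdict_text = tok.decode(out[0], skip_special_tokens=True)

def parse_verdict(verdict_text: str) -> dict:
    """Map a plain-text verdict onto the three severity levels
    (safe, controversial, unsafe) and any flagged categories."""
    # Order matters: "Unsafe" must be tried before "Safe" so the
    # alternation does not stop at the "Safe" prefix of "Unsafe".
    level = re.search(r"Safety:\s*(Unsafe|Controversial|Safe)", verdict_text)
    cats = re.search(r"Categories:\s*(.+)", verdict_text)
    return {
        "safety": level.group(1).lower() if level else None,
        "categories": [c.strip() for c in cats.group(1).split(",")] if cats else [],
    }

print(parse_verdict("Safety: Unsafe\nCategories: Violent"))
# → {'safety': 'unsafe', 'categories': ['Violent']}
```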
Quick Start & Requirements
Requires `transformers>=4.51.0`. Deployment options include `sglang>=0.4.6.post1` or `vllm>=0.9.0` for OpenAI-compatible API endpoints. `trust_remote_code=True` is required for `Qwen3Guard-Stream` models.

Highlighted Details
- Dual-mode coverage: static prompt/response classification (`-Gen`) and real-time, token-level streaming moderation (`-Stream`).

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
- The `Qwen3Guard-Stream` model requires using the same tokenizer as Qwen3 for optimal performance; integration with a different tokenizer necessitates re-tokenization.
- `Qwen3Guard-Stream` support in vLLM and SGLang is listed as "coming soon."
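The re-tokenization caveat can be made concrete with a toy example: token IDs produced by a non-Qwen3 tokenizer cannot be fed to `Qwen3Guard-Stream` directly, because token boundaries differ between vocabularies; the stream must be decoded to text and re-encoded with the Qwen3 tokenizer. The vocabularies below are invented for illustration; real code would use the two tokenizers' `decode()`/`encode()` methods.

```python
SRC_VOCAB = {0: "un", 1: "safe", 2: " content"}  # generator's tokenizer (toy)
DST_VOCAB = {"unsafe": 7, " content": 8}         # stand-in for Qwen3's (toy)

def decode(ids, vocab):
    """Join token pieces back into plain text."""
    return "".join(vocab[i] for i in ids)

def encode(text, vocab):
    """Greedy longest-prefix matching, a simplification of real BPE encoding."""
    ids = []
    while text:
        piece = max((p for p in vocab if text.startswith(p)), key=len)
        ids.append(vocab[piece])
        text = text[len(piece):]
    return ids

def retokenize(stream_ids):
    """Decode with the source vocabulary, re-encode with the target one."""
    return encode(decode(stream_ids, SRC_VOCAB), DST_VOCAB)

# Three source tokens collapse into two target tokens with different boundaries.
print(retokenize([0, 1, 2]))  # → [7, 8]
```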