Qwen3-VL-Embedding by QwenLM

State-of-the-art multimodal embedding and reranking for information retrieval

Created 6 months ago

1,321 stars

Top 29.6% on SourcePulse

1 Expert Loves This Project

Jason Huggins

Creator of Selenium

Project Summary

Qwen3-VL-Embedding and Qwen3-VL-Reranker provide state-of-the-art multimodal embedding and reranking, built on Qwen3-VL. They support information retrieval and cross-modal understanding by processing text, images, screenshots, and videos within a unified framework. With a shared representation space and a dedicated reranking stage, the models aim to improve retrieval accuracy across more than 30 languages.

How It Works

The Embedding model uses a dual-tower architecture to map diverse inputs into a high-dimensional semantic vector, suitable for efficient, large-scale retrieval. The Reranking model employs a single-tower architecture with cross-attention for deep inter-modal fusion, scoring the relevance of each query-document pair to refine the initial recall set. Used together, the embedding model supplies fast candidate recall and the reranker improves precision on that shortlist.
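The recall-then-rerank flow described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's API: the vectors are toy values standing in for embedding-model output, and `toy_score_pair` is a hypothetical placeholder for a reranker call.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recall_top_k(query_vec, doc_vecs, k):
    """Stage 1: rank every document by embedding similarity, keep the best k ids."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

def rerank(query, candidates, score_pair):
    """Stage 2: rescore only the recalled candidates with a pairwise scorer."""
    return sorted(candidates, key=lambda doc_id: score_pair(query, doc_id), reverse=True)

# Toy vectors standing in for embedding-model output (hypothetical values).
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.2, 0.0]

# Hypothetical stand-in for the reranker: a fixed table of pair scores.
toy_scores = {"doc_a": 0.40, "doc_b": 0.85, "doc_c": 0.05}

def toy_score_pair(query, doc_id):
    return toy_scores[doc_id]

candidates = recall_top_k(query_vec, doc_vecs, k=2)
final = rerank("example query", candidates, toy_score_pair)
print(candidates, final)
```

The design point is that stage 1 compares precomputed vectors, so it scales to large collections, while stage 2 runs the more expensive pairwise scorer only on the shortlist.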

Quick Start & Requirements

Installation involves cloning the repository and running scripts/setup_environment.sh to set up dependencies. Models are available on Hugging Face and ModelScope. Usage examples cover standard Transformers and vLLM integration. Specific hardware requirements (GPU, VRAM) are not detailed, though the model sizes (2B and 8B parameters) suggest that substantial GPU memory is needed. Flash Attention 2 is recommended for acceleration.
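Since hardware requirements are not documented, a rough lower bound can be estimated from the parameter count alone. The sketch below assumes nominal counts of 2 and 8 billion parameters taken from the size labels (the actual checkpoints may differ) and counts weights only, so activations, caches, and framework overhead come on top.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)

# Nominal parameter counts read off the "2B" and "8B" labels (assumption).
sizes = {"2B": 2e9, "8B": 8e9}
# Bytes per parameter for common storage precisions.
precisions = {"fp32": 4, "bf16/fp16": 2, "int8": 1}

for name, params in sizes.items():
    for precision, nbytes in precisions.items():
        gib = weight_memory_gib(params, nbytes)
        print(f"{name} {precision}: ~{gib:.1f} GiB")
```

Under these assumptions the 8B model needs roughly 14.9 GiB for 16-bit weights before any runtime overhead, which is why lower-precision formats matter for deployment on smaller GPUs.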

Highlighted Details

Multimodal Versatility: Handles text, images, screenshots, and video inputs for tasks like image-text retrieval and VQA.
Unified Representation: Generates semantically rich vectors in a shared space for cross-modal similarity.
High-Precision Reranking: Precisely scores relevance for arbitrary single or mixed-modal query-document pairs.
Global Applicability: Supports over 30 languages, with customizable instructions and flexible vector dimensions via Matryoshka Representation Learning (MRL).
Efficiency: Quantization support is available for deployment.
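To illustrate the shared-space similarity and flexible-dimension points above: embeddings trained with MRL are commonly shortened by keeping the leading components and re-normalizing. The following is a minimal sketch of that general technique with toy vectors; how the Qwen3-VL-Embedding code exposes dimension selection is not stated here, and the helper names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def truncate_embedding(vec, dim):
    """Keep the first `dim` components, then re-normalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    return l2_normalize(vec[:dim])

def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dimensional vectors (hypothetical values, not real model output).
text_vec = l2_normalize([0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01])
image_vec = l2_normalize([0.45, 0.42, 0.28, 0.25, 0.0, 0.1, 0.0, 0.03])

# Cross-modal similarity at full size and at a reduced size.
full_sim = dot(text_vec, image_vec)
short_sim = dot(truncate_embedding(text_vec, 4), truncate_embedding(image_vec, 4))
print(f"full: {full_sim:.3f}  truncated: {short_sim:.3f}")
```

Shorter vectors cut index size and comparison cost, typically at some loss in retrieval quality, so the dimension is a storage versus accuracy trade-off to validate on your own data.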

Maintenance & Community

No specific community channels or detailed maintenance information are provided. The project appears research-driven, with authors listed in the citation.

Licensing & Compatibility

The README does not specify the software license. This lack of clarity is a significant adoption blocker, leaving terms for commercial use or integration with closed-source projects undefined.

Limitations & Caveats

Detailed Transformers usage examples for the Reranker are marked "Coming soon." Hardware requirements are not specified, and benchmark results are limited to the tables provided. The absence of a specified license is a critical caveat.

Health Check

Last Commit

2 weeks ago

Responsiveness

Inactive

Star History

41 stars in the last 30 days