LLM with fine-grained MoE architecture
Hunyuan-A13B is an open-source large language model from Tencent, built on a fine-grained Mixture-of-Experts (MoE) architecture. It offers a balance of high performance and computational efficiency, making it suitable for advanced reasoning and general-purpose applications, particularly in resource-constrained environments.
How It Works
The model has 80 billion total parameters, of which only 13 billion are active per token, leveraging the MoE design for efficiency. It supports hybrid reasoning (fast and slow thinking modes), a 256K context window, and is optimized for agent tasks. Grouped Query Attention (GQA) and multiple quantization formats (FP8, INT4) enable efficient inference.
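The efficiency claim follows from sparse activation: a router scores every expert per token, only the top-k experts actually run, and their outputs are mixed by the renormalized gate weights. Below is a minimal sketch of that routing pattern in PyTorch; the hidden sizes, expert count, and top-k value are illustrative placeholders, not Hunyuan-A13B's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy fine-grained MoE layer: per-token top-k expert routing."""
    def __init__(self, d_model=512, d_ff=1024, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        logits = self.router(x)                        # (tokens, n_experts)
        gate, idx = logits.topk(self.top_k, dim=-1)    # pick top-k experts per token
        gate = F.softmax(gate, dim=-1)                 # renormalize over selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                    # only the chosen experts execute,
            for e, expert in enumerate(self.experts):  # so most parameters stay idle
                sel = idx[:, k] == e                   # tokens routed to expert e at slot k
                if sel.any():
                    out[sel] += gate[sel, k:k+1] * expert(x[sel])
        return out

x = torch.randn(8, 512)
print(MoELayer()(x).shape)  # torch.Size([8, 512])
```

Because only `top_k` of `n_experts` expert MLPs run per token, active compute scales with the selected fraction rather than the full parameter count, which is how an 80B-parameter model can activate only ~13B parameters per token.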
Quick Start & Requirements
Inference can be run with the Hugging Face transformers library, or via pre-built Docker images for TensorRT-LLM, vLLM, and SGLang. Requirements: the transformers library and PyTorch; the Docker images require the NVIDIA Container Toolkit, and the vLLM image requires CUDA 12.8. TensorRT-LLM deployment requires specific configuration files.
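As a starting point, a hedged sketch of local inference with transformers is shown below; the Hugging Face model ID, the trust_remote_code flag, and the generation settings are assumptions and may differ from the repository's official instructions.

```python
# Sketch only: the model ID and loading flags are assumptions, not confirmed
# by this summary. Check the repository README for the canonical snippet.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-A13B-Instruct"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",    # use the checkpoint's native precision
    device_map="auto",     # spread weights across available GPUs
    trust_remote_code=True,
)

prompt = "Explain Mixture-of-Experts in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```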
Highlighted Details
Maintenance & Community
The repository was last updated 3 weeks ago; its maintenance status is listed as inactive.
Licensing & Compatibility
Limitations & Caveats
The README specifies CUDA versions for certain Docker deployments (e.g., CUDA 12.8 for vLLM), implying potential compatibility constraints. Detailed performance benchmarks are provided, but direct comparisons against all relevant competing models may be limited.