Hunyuan-A13B by Tencent-Hunyuan

LLM with fine-grained MoE architecture

Created 2 months ago
748 stars

Top 46.4% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

Hunyuan-A13B is an open-source large language model from Tencent, built on a fine-grained Mixture-of-Experts (MoE) architecture. It offers a balance of high performance and computational efficiency, making it suitable for advanced reasoning and general-purpose applications, particularly in resource-constrained environments.

How It Works

The model has 80 billion total parameters, of which 13 billion are active per forward pass, using fine-grained MoE routing for efficiency. It supports hybrid reasoning (fast and slow thinking), a 256K context window, and is optimized for agent tasks. Grouped Query Attention (GQA) and multiple quantization formats (FP8, INT4) enable efficient inference.
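
To make the fine-grained MoE idea concrete, below is a minimal toy sketch of top-k expert routing in PyTorch. It is illustrative only, not Hunyuan-A13B's actual implementation; the expert count, top-k value, and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Toy top-k MoE layer: only k experts run per token, so active
    parameters stay far below total parameters (the 13B-of-80B idea)."""
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # pick k experts per token
        weights = weights.softmax(dim=-1)           # normalize over chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

Because only k of the n experts run for each token, per-token compute tracks the active parameter count rather than the total, which is how an 80B-parameter model can run with roughly 13B active parameters.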

Quick Start & Requirements

  • Installation: Primarily via the Hugging Face transformers library, or via pre-built Docker images for TensorRT-LLM, vLLM, and SGLang (see the loading sketch after this list).
  • Prerequisites: the transformers library and PyTorch; the vLLM Docker image additionally requires the NVIDIA Container Toolkit and CUDA 12.8. TensorRT-LLM deployment requires dedicated configuration files.
  • Resources: Model weights are large (80B parameters is roughly 160 GB in bf16); the FP8 and INT4 quantized variants substantially reduce memory requirements.
  • Links: Hugging Face, ModelScope, TensorRT-LLM Docker, vLLM Docker, SGLang Docker.
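
As a minimal loading sketch via transformers, assuming the Hugging Face model id tencent/Hunyuan-A13B-Instruct and the standard AutoModelForCausalLM API; verify the exact id and any trust_remote_code requirement against the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id; verify against the Hugging Face page linked above.
MODEL_ID = "tencent/Hunyuan-A13B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain mixture-of-experts briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```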

Highlighted Details

  • Achieves competitive performance on benchmarks such as MMLU and GSM8K, as well as agent-specific tasks.
  • Offers FP8 and INT4 quantized versions for a reduced memory footprint and faster inference.
  • Supports flexible reasoning modes (fast/slow thinking) via prompt engineering or API parameters (see the sketch after this list).
  • Natively handles a 256K context window.
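
A sketch of switching reasoning modes at the prompt level, continuing from the loading example above. The /no_think prefix is an assumption based on the pattern hybrid-reasoning models commonly use; confirm the exact control token and any chat-template parameter in the model card.

```python
# Continuing from the quickstart above (tokenizer and model already loaded).
# Prefixing the user message is one pattern hybrid-reasoning models use;
# the "/no_think" token here is an assumption to verify against the model card.
fast = [{"role": "user", "content": "/no_think What is 17 * 23?"}]
slow = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]

for messages in (fast, slow):
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```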

Maintenance & Community

Activity metrics are summarized in the Health Check section below.

Licensing & Compatibility

  • License details are not stated in the README snippet; do not assume permissive terms. Verify the repository's license file before any commercial use.

Limitations & Caveats

The README pins specific CUDA versions for some Docker deployments (e.g., CUDA 12.8 for vLLM), so older driver and toolkit stacks may be incompatible. Benchmark results are reported, but direct comparisons may not cover all relevant peer models.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 22 stars in the last 30 days

Explore Similar Projects

LitServe by Lightning-AI
AI inference pipeline framework
0.3% · 4k stars · Created 1 year ago · Updated 1 day ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.

LightLLM by ModelTC
Python framework for LLM inference and serving
0.5% · 4k stars · Created 2 years ago · Updated 12 hours ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 9 more.

mistral.rs by EricLBuehler
LLM inference engine for blazing fast performance
0.3% · 6k stars · Created 1 year ago · Updated 22 hours ago
Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Omar Sanseviero (DevRel at Google DeepMind), and 11 more.