LLM for research, focused on Chinese language understanding and generation
HIT-SCIR/huozi provides the Huozi series of large language models, designed for research and practical applications in natural language processing. The latest version, Huozi 3.5, offers enhanced performance in Chinese and English knowledge, mathematical reasoning, code generation, and instruction following, targeting researchers and developers working with LLMs.
How It Works
Huozi 3.5 is a Sparse Mixture-of-Experts (SMoE) model with 46.7B total parameters, of which only about 13B are activated during inference for efficiency. Its development proceeded in stages: extending Mixtral-8x7B with a Chinese vocabulary and incrementally pre-training it, instruction fine-tuning to produce Huozi 3.0, further fine-tuning on a proprietary dataset with BPE-Dropout to strengthen instruction following, model fusion, and a final round of instruction fine-tuning to yield Huozi 3.5. This multi-stage approach aims to balance broad knowledge with strong task-specific capabilities and safety.
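The sparse activation described above can be illustrated with a toy top-2 router in the Mixtral style (8 experts, 2 active per token). All names, sizes, and the tanh "expert" below are illustrative stand-ins, not Huozi's actual weights or code:

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, TOP_K, D = 8, 2, 16  # Mixtral-style: 8 experts, 2 active per token

# Hypothetical stand-ins for trained weights: a router and 8 expert FFNs.
router_w = rng.standard_normal((D, N_EXPERTS))
experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]

def smoe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs.

    Only k of the 8 expert FFNs run per token, which is how a model with
    46.7B total parameters can activate only ~13B during inference.
    """
    logits = x @ router_w                           # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]   # indices of the top-k experts
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        gate_logits = logits[t, top[t]]
        gates = np.exp(gate_logits - gate_logits.max())
        gates /= gates.sum()                        # softmax over the k chosen experts
        for gate, e in zip(gates, top[t]):
            out[t] += gate * np.tanh(token @ experts[e])  # toy "expert FFN"
    return out

tokens = rng.standard_normal((4, D))
y = smoe_layer(tokens)
print(y.shape)  # same shape as the input, but only 2/8 experts ran per token
```

The unchosen experts' weights are never touched for that token, so compute and activated-parameter count scale with k, not with the total expert count.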
Quick Start & Requirements
Requires the transformers or modelscope libraries; example Python code is provided for inference. vLLM is supported for accelerated inference. A GPU with sufficient VRAM is recommended (e.g., roughly 88GB for the full model, less for quantized versions), and CUDA support is beneficial.
Highlighted Details
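A minimal single-turn generation sketch with the transformers library is shown below. The model id "HIT-SCIR/huozi3.5" and the chat-template usage are assumptions based on the repo's naming, not verified against the repository's own example code:

```python
def generate_reply(prompt: str, model_id: str = "HIT-SCIR/huozi3.5") -> str:
    """Sketch of single-turn inference; model_id is an assumed Hugging Face id."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # full-precision weights need roughly 88GB of VRAM
        device_map="auto",           # shard across the available GPUs
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Calling this downloads the full model weights, so quantized variants or a vLLM-served endpoint are preferable on smaller GPUs.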
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats