Ling-V2 by inclusionAI

Efficient MoE LLMs for advanced reasoning and high-speed generation

Created 4 months ago
250 stars

Top 100.0% on SourcePulse

View on GitHub
Project Summary

Ling-V2 is an open-source family of Mixture-of-Experts (MoE) Large Language Models (LLMs) from InclusionAI, designed to deliver state-of-the-art performance with high computational efficiency. Targeting researchers and developers seeking powerful yet resource-conscious LLMs, Ling-V2 offers significant advantages in complex reasoning and instruction following, achieving performance comparable to much larger dense models with a fraction of activated parameters.

How It Works

Ling-V2 employs an MoE architecture with a 1/32 activation ratio, tuned across design choices such as expert granularity, shared-expert ratio, attention mechanism, and routing strategy (sigmoid routing with an aux-loss-free load-balancing design). This sparse activation, combined with techniques such as a multi-token prediction (MTP) loss, QK-Norm, and half RoPE, lets models like Ling-mini-2.0 (16B total parameters, 1.4B activated) deliver performance equivalent to 7–8B dense models. The project also uses FP8 mixed-precision training, with tile/blockwise FP8 scaling, FP8 optimizers, and on-demand weight transposition for aggressive memory savings and efficient training.
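
To make the routing idea concrete, here is a minimal, hypothetical PyTorch sketch of a sigmoid-routed sparse MoE layer. The layer sizes, expert count, and top-k value are illustrative assumptions, not the actual Ling-V2 configuration, and the sketch omits the shared experts, aux-loss-free load balancing, MTP loss, QK-Norm, and half RoPE mentioned above.

```python
import torch
import torch.nn as nn

class SigmoidRoutedMoE(nn.Module):
    """Toy sparse MoE layer with sigmoid (rather than softmax) routing.

    All hyperparameters below are illustrative only; they are not the
    real Ling-V2 settings.
    """

    def __init__(self, d_model=512, n_experts=32, top_k=2, d_ff=1024):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        # Sigmoid routing: each expert gets an independent affinity in (0, 1),
        # instead of competing through a softmax over all experts.
        scores = torch.sigmoid(self.router(x))            # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep the k best experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens whose slot-th pick is expert e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)  # (n_selected, 1)
                    out[mask] += w * expert(x[mask])
        return out

x = torch.randn(8, 512)
print(SigmoidRoutedMoE()(x).shape)  # torch.Size([8, 512])
```

With top_k=2 out of 32 experts, only a small fraction of the expert parameters is touched per token, which is the source of the "small activation, large total capacity" trade-off described above.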

Quick Start & Requirements

Integration is primarily supported via Hugging Face Transformers, and the repository provides a code snippet; for users in mainland China, ModelScope is recommended. Higher-throughput inference is available through vLLM or SGLang, both of which currently require cloning their repositories and applying the provided patch (bailing_moe_v2.patch). Hardware requirements are not specified beyond the GPUs used in performance benchmarks and inference-speed examples (e.g., H20, 80G GPUs), and users should ensure their Python environment supports these libraries. Links to model downloads (Hugging Face, ModelScope) and to vLLM and SGLang are provided in the repository.
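
For orientation, a minimal Transformers sketch is shown below. The repo id, the need for trust_remote_code, and the chat-template usage are assumptions; check the model card for the project's official snippet.

```python
# Minimal Hugging Face Transformers sketch (repo id and settings assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/Ling-mini-2.0"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native dtype
    device_map="auto",       # place weights on available GPUs
    trust_remote_code=True,  # custom MoE modeling code may be required
)

messages = [{"role": "user", "content": "Give a short introduction to MoE language models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```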

Highlighted Details

  • Efficiency: Delivers more than 7x the performance leverage of an equivalent dense model; Ling-mini-2.0, with only 1.4B activated parameters, matches 7–8B dense models.
  • Speed: Generates at over 300 tokens/s (Ling-mini-2.0 on H20), more than 2x faster than comparable dense models.
  • Context: Supports up to 128K context length using YaRN (see the configuration sketch after this list).
  • Training: An open-sourced FP8 efficient-training solution and multiple pre-training checkpoints (trained on up to 20T tokens) are available.
  • Model Variants: Includes Ling-mini-2.0 (1.4B activated) and Ling-flash-2.0 (6.1B activated).
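
The 128K context via YaRN is typically exposed through the rope_scaling entry of the model config. The sketch below follows the standard Transformers pattern; the repo id, scaling factor, original context length, and exact key names are assumptions, so consult the Ling-V2 model card for the officially supported values.

```python
# Hypothetical sketch: extending the context window with YaRN via rope_scaling.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "inclusionAI/Ling-mini-2.0"  # assumed Hugging Face repo id

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                              # e.g. 32K native length * 4 = 128K tokens (assumed)
    "original_max_position_embeddings": 32768,  # assumed native training length
}

model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
```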

Maintenance & Community

The project is provided by InclusionAI. Specific details regarding community channels (e.g., Discord, Slack), active contributors, sponsorships, or a public roadmap are not detailed in the provided README.

Licensing & Compatibility

The code repository is licensed under the permissive MIT License, allowing for broad use, including commercial applications and linking with closed-source software.

Limitations & Caveats

Integration with vLLM and SGLang currently requires users to manually apply patches to the respective libraries, as these changes are not yet merged into their official releases. Support for multi-token prediction (MTP) is noted as available for base models in SGLang but not yet for chat models. Hardware requirements beyond the GPUs used for performance metrics are not explicitly detailed.
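
Once the patch is applied, inference follows vLLM's usual offline API. The sketch below assumes a patched vLLM install and the repo id shown; treat it as illustrative rather than the project's official snippet.

```python
# Hypothetical offline-inference sketch with vLLM, assuming the provided
# bailing_moe_v2.patch has already been applied to your vLLM checkout.
from vllm import LLM, SamplingParams

model_id = "inclusionAI/Ling-mini-2.0"  # assumed Hugging Face repo id

llm = LLM(model=model_id, trust_remote_code=True)
params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)

outputs = llm.generate(["Explain mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```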

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab
  • MoE model for research
  • 476 stars; created 8 months ago, updated 4 months ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab
  • Speculative decoding research paper for faster LLM inference
  • 2k stars; created 2 years ago, updated 3 weeks ago