dots.llm1  by rednote-hilab

MoE model for research

created 2 months ago
438 stars

Top 69.2% on sourcepulse

View on GitHub
Project Summary

The dots.llm1 project provides a large-scale Mixture-of-Experts (MoE) language model for researchers and developers who want high-performance LLMs trained on high-quality data without synthetic augmentation. It offers intermediate checkpoints and efficient inference support, aiming to match state-of-the-art performance with a much smaller active parameter count.

How It Works

dots.llm1 is a Mixture-of-Experts model with 142B total parameters, of which 14B are activated per token. It features an MoE architecture with 128 experts (126 fine-grained, 2 shared), a top-6 routing mechanism, and QK-Norm in its attention layers. The model is trained exclusively on non-synthetic data curated through a meticulous three-stage processing pipeline, aiming for strong performance at a fraction of the usual computational cost.
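For intuition, the routing can be sketched in a few lines: shared experts process every token, while a learned router picks the top-6 experts per token and mixes their outputs with renormalized routing weights. The following is a minimal illustrative sketch only; the layer sizes, module names, and the softmax-then-top-k ordering are assumptions, not the actual dots.llm1 implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoESketch(nn.Module):
    """Toy fine-grained MoE block: shared experts plus top-k routed experts (illustrative sizes)."""

    def __init__(self, d_model=1024, d_ff=512, n_routed=126, n_shared=2, top_k=6):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_routed, bias=False)       # token -> expert scores
        ffn = lambda: nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.routed = nn.ModuleList(ffn() for _ in range(n_routed))  # fine-grained experts
        self.shared = nn.ModuleList(ffn() for _ in range(n_shared))  # always-active experts

    def forward(self, x):                                            # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)                    # routing probabilities
        weights, idx = probs.topk(self.top_k, dim=-1)                # keep the 6 best experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)        # renormalize over selected experts
        shared_out = sum(e(x) for e in self.shared)                  # shared experts see every token
        routed_rows = []
        for t in range(x.size(0)):                                   # naive per-token dispatch, no batching
            routed_rows.append(sum(w * self.routed[int(i)](x[t])
                                   for w, i in zip(weights[t], idx[t])))
        return shared_out + torch.stack(routed_rows)

moe = MoESketch()
out = moe(torch.randn(4, 1024))                                      # 4 tokens in, (4, 1024) out

Only the selected experts run for a given token, which is how a model with 142B total parameters can keep its per-token compute close to that of a dense 14B model.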

Quick Start & Requirements

  • Docker (starts a vLLM OpenAI-compatible server; a client sketch follows this list): docker run --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host rednotehilab/dots1:vllm-openai-v0.9.0.1 --model rednote-hilab/dots.llm1.inst --tensor-parallel-size 8 --trust-remote-code --served-model-name dots1
  • Prerequisites: GPUs (tensor-parallel size of 8 recommended for vLLM), Docker, and one of Hugging Face Transformers, vLLM, or SGLang as the inference backend.
  • Resources: significant GPU memory; all 142B parameters must be resident even though only 14B are active per token, hence the recommended tensor-parallel size of 8.
  • Links: Hugging Face Collection, Docker Hub, vLLM PR, SGLang PR.
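The Docker command above starts a vLLM server with an OpenAI-compatible API on port 8000 under the served model name dots1. A minimal client sketch, assuming the container is running locally and the openai Python package is installed (the prompt and sampling parameters are placeholders):

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key

response = client.chat.completions.create(
    model="dots1",  # must match --served-model-name in the docker run command
    messages=[{"role": "user", "content": "Explain Mixture-of-Experts models in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)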

Highlighted Details

  • 14B activated parameters out of 142B total, with performance reported as comparable to Qwen2.5-72B.
  • Trained on high-quality, non-synthetic data using a three-stage processing pipeline.
  • Supports 32,768 token context length.
  • Includes intermediate training checkpoints for research into LLM learning dynamics (see the loading sketch after this list).
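The intermediate checkpoints can in principle be loaded like any other Hugging Face checkpoint by pinning a revision. This is a sketch under stated assumptions: the repository name is taken from the quick start, the revision value is a hypothetical placeholder (check the Hugging Face collection for what is actually published), and trust_remote_code is needed while native Transformers support is pending.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "rednote-hilab/dots.llm1.inst"          # instruct model from the quick start; checkpoint repos may differ
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    revision="main",                           # swap in an intermediate-checkpoint branch or tag if published
    torch_dtype=torch.bfloat16,
    device_map="auto",                         # all 142B parameters must fit across the available GPUs
    trust_remote_code=True,                    # custom MoE modeling code until the Transformers PR lands
)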

Maintenance & Community

The dots.llm1 series was released in June 2025, with further details available in the accompanying technical report. WeChat is listed as the community interaction channel.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is relatively new, with its technical report released in June 2025. Although it targets state-of-the-art performance, the README does not detail benchmarks or real-world performance metrics beyond those in the report. Native Hugging Face Transformers integration is still pending an upstream PR, hence the --trust-remote-code flag in the quick start.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 446 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

Top 0.1%, 5k stars
MoE language model for research/API use
Created 1 year ago, updated 10 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

Top 0.3%, 9k stars
Tiny pretraining project for a 1.1B Llama model
Created 1 year ago, updated 1 year ago