dots.llm1  by rednote-hilab

MoE model for research

created 2 months ago
438 stars

Top 69.2% on sourcepulse

View on GitHub
Project Summary

The dots.llm1 project provides a large-scale Mixture-of-Experts (MoE) language model for researchers and developers who want high-performance LLMs trained on high-quality data without synthetic augmentation. It offers intermediate checkpoints and efficient inference support, aiming to match state-of-the-art performance with a much smaller active parameter count.

How It Works

dots.llm1 is a Mixture-of-Experts model with 142B total parameters, of which 14B are activated per token. It features an MoE architecture with 128 experts (126 fine-grained, 2 shared), a top-6 routing mechanism, and QK-Norm in its attention layers. The model is trained exclusively on non-synthetic data curated through a meticulous three-stage processing pipeline, aiming for strong performance at a fraction of the usual computational cost.
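For intuition, the routing can be sketched in a few lines: shared experts process every token, while a learned router picks the top-6 experts per token and mixes their outputs with renormalized routing weights. The following is a minimal illustrative sketch only; the layer sizes, module names, and the softmax-then-top-k ordering are assumptions, not the actual dots.llm1 implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoESketch(nn.Module):
    """Toy fine-grained MoE block: shared experts plus top-k routed experts (illustrative sizes)."""

    def __init__(self, d_model=1024, d_ff=512, n_routed=126, n_shared=2, top_k=6):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_routed, bias=False)       # token -> expert scores
        ffn = lambda: nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.routed = nn.ModuleList(ffn() for _ in range(n_routed))  # fine-grained experts
        self.shared = nn.ModuleList(ffn() for _ in range(n_shared))  # always-active experts

    def forward(self, x):                                            # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)                    # routing probabilities
        weights, idx = probs.topk(self.top_k, dim=-1)                # keep the 6 best experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)        # renormalize over selected experts
        shared_out = sum(e(x) for e in self.shared)                  # shared experts see every token
        routed_rows = []
        for t in range(x.size(0)):                                   # naive per-token dispatch, no batching
            routed_rows.append(sum(w * self.routed[int(i)](x[t])
                                   for w, i in zip(weights[t], idx[t])))
        return shared_out + torch.stack(routed_rows)

moe = MoESketch()
out = moe(torch.randn(4, 1024))                                      # 4 tokens in, (4, 1024) out

Only the selected experts run for a given token, which is how a model with 142B total parameters can keep its per-token compute close to that of a dense 14B model.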

Quick Start & Requirements

  • Docker (starts a vLLM OpenAI-compatible server; a client sketch follows this list): docker run --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host rednotehilab/dots1:vllm-openai-v0.9.0.1 --model rednote-hilab/dots.llm1.inst --tensor-parallel-size 8 --trust-remote-code --served-model-name dots1
  • Prerequisites: GPUs (tensor-parallel size of 8 recommended for vLLM), Docker, and one of Hugging Face Transformers, vLLM, or SGLang as the inference backend.
  • Resources: significant GPU memory; all 142B parameters must be resident even though only 14B are active per token, hence the recommended tensor-parallel size of 8.
  • Links: Hugging Face Collection, Docker Hub, vLLM PR, SGLang PR.
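The Docker command above starts a vLLM server with an OpenAI-compatible API on port 8000 under the served model name dots1. A minimal client sketch, assuming the container is running locally and the openai Python package is installed (the prompt and sampling parameters are placeholders):

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key

response = client.chat.completions.create(
    model="dots1",  # must match --served-model-name in the docker run command
    messages=[{"role": "user", "content": "Explain Mixture-of-Experts models in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)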

Highlighted Details

  • 14B activated parameters out of 142B total, with performance reported as comparable to Qwen2.5-72B.
  • Trained on high-quality, non-synthetic data using a three-stage processing pipeline.
  • Supports 32,768 token context length.
  • Includes intermediate training checkpoints for research into LLM learning dynamics (see the loading sketch after this list).
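The intermediate checkpoints can in principle be loaded like any other Hugging Face checkpoint by pinning a revision. This is a sketch under stated assumptions: the repository name is taken from the quick start, the revision value is a hypothetical placeholder (check the Hugging Face collection for what is actually published), and trust_remote_code is needed while native Transformers support is pending.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "rednote-hilab/dots.llm1.inst"          # instruct model from the quick start; checkpoint repos may differ
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    revision="main",                           # swap in an intermediate-checkpoint branch or tag if published
    torch_dtype=torch.bfloat16,
    device_map="auto",                         # all 142B parameters must fit across the available GPUs
    trust_remote_code=True,                    # custom MoE modeling code until the Transformers PR lands
)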

Maintenance & Community

The dots.llm1 series was released in June 2025, with further details available in the accompanying technical report. WeChat is listed as the community interaction channel.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is relatively new, with its technical report released in June 2025. Although it targets state-of-the-art performance, the README does not detail benchmarks or real-world performance metrics beyond those in the report. Native Hugging Face Transformers integration is still pending an upstream PR, hence the --trust-remote-code flag in the quick start.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 446 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

Top 0.1%, 5k stars
MoE language model for research/API use
Created 1 year ago, updated 10 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

Top 0.3%, 9k stars
Tiny pretraining project for a 1.1B Llama model
Created 1 year ago, updated 1 year ago