Step3 by stepfun-ai

Multimodal reasoning model for efficient vision-language tasks

created 3 weeks ago


390 stars

Top 73.5% on SourcePulse

View on GitHub
Project Summary

Step3 is a 321B parameter multimodal reasoning model designed for efficient vision-language tasks. It targets researchers and developers seeking high performance with reduced decoding costs, leveraging a novel Mixture-of-Experts architecture and co-designed attention mechanisms.

How It Works

Step3 employs a Mixture-of-Experts (MoE) architecture with 48 experts, activating 3 per token, resulting in 38B active parameters out of 321B total. It utilizes Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD) to minimize decoding costs and enhance efficiency across various accelerators. This co-design approach aims for top-tier performance in vision-language reasoning.
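The routing described above (48 experts, 3 activated per token) can be sketched as a toy top-k MoE layer. This is an illustrative assumption of how such routing typically works, not Step3's actual implementation; the hidden size, router, and single-matrix "experts" are stand-ins.

```python
import numpy as np

# Hedged sketch of top-k Mixture-of-Experts routing: 48 experts, 3 active
# per token. Shapes and the routing rule (top-k router logits, softmax over
# the selected logits) are illustrative, not taken from Step3's code.

D_MODEL = 64          # toy hidden size; Step3's real width is far larger
N_EXPERTS = 48        # total experts
TOP_K = 3             # experts activated per token

rng = np.random.default_rng(0)
W_router = rng.standard_normal((D_MODEL, N_EXPERTS)) / np.sqrt(D_MODEL)
# Each "expert" here is a single linear map; real experts are FFN blocks.
experts = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ W_router                          # (tokens, N_EXPERTS)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        gate = np.exp(sel - sel.max())
        gate /= gate.sum()                         # softmax over selected
        for g, e in zip(gate, top[t]):
            out[t] += g * (x[t] @ experts[e])      # only 3 of 48 run
    return out

tokens = rng.standard_normal((5, D_MODEL))
y = moe_layer(tokens)
print(y.shape)  # (5, 64)
```

Because only 3 of 48 experts run per token, the compute per token scales with the ~38B active parameters rather than the 321B total, which is the efficiency the architecture targets.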

Quick Start & Requirements

Model checkpoints are available on Hugging Face in bf16 and block-fp8 formats. Recommended inference engines include vLLM and SGLang; deployment and request examples are provided in the Model Deployment Guide.
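As a minimal sketch, serving the checkpoint with vLLM's OpenAI-compatible server might look like the following. The Hugging Face repo id and parallelism settings here are illustrative assumptions, not values from the Model Deployment Guide.

```shell
# Illustrative only: the repo id and --tensor-parallel-size are assumptions;
# consult the Model Deployment Guide for actual deployment parameters.
vllm serve stepfun-ai/step3 \
    --tensor-parallel-size 8 \
    --max-model-len 65536    # matches the model's stated max context length
```

A 321B-parameter model will require multi-GPU tensor (and likely pipeline) parallelism; the guide should be treated as authoritative for hardware sizing.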

Highlighted Details

  • 321B total parameters, 38B active parameters per token.
  • Features Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD).
  • Max context length of 65,536 tokens.
  • Uses the DeepSeek V3 tokenizer.

Maintenance & Community

Contact is available via email at contact@stepfun.com. The project cites a technical report and blog post.

Licensing & Compatibility

Both code and model weights are released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is accompanied only by a technical report introduction, suggesting it may be at an early stage of public release or documentation. Specific hardware requirements for optimal performance are not detailed beyond a mention of compatibility with "low-end accelerators."

Health Check
Last commit

6 days ago

Responsiveness

Inactive

Pull Requests (30d)
3
Issues (30d)
13
Star History
392 stars in the last 23 days

Explore Similar Projects

Starred by Lianmin Zheng (Author of SGLang), Shizhe Diao (Research Scientist at NVIDIA; Author of LMFlow), and 3 more.

Kimi-K2 by MoonshotAI

1.7%
8k
State-of-the-art MoE language model
created 1 month ago
updated 2 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), and 17 more.

TinyLlama by jzhang38

0.1%
9k
Tiny pretraining project for a 1.1B Llama model
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Jiayi Pan (Author of SWE-Gym; MTS at xAI).

DeepSeek-Coder-V2 by deepseek-ai

0.4%
6k
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago