Multimodal reasoning model for efficient vision-language tasks
Step3 is a 321B-parameter multimodal reasoning model designed for efficient vision-language tasks. Built on a Mixture-of-Experts architecture with co-designed attention mechanisms, it targets researchers and developers seeking high performance at reduced decoding cost.
How It Works
Step3 uses a Mixture-of-Experts (MoE) architecture with 48 experts, of which 3 are activated per token, yielding 38B active parameters out of 321B total. Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD) are co-designed with the model to reduce decoding cost and improve efficiency across a range of accelerators. This co-design approach aims for top-tier performance in vision-language reasoning.
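To make the routing pattern concrete, the following is a minimal sketch of top-3 routing over 48 experts, matching the counts stated above. The layer sizes, router, and expert definitions are illustrative assumptions, not Step3's actual implementation.

```python
# Toy top-k MoE routing: 48 experts, 3 activated per token (counts from the
# model description). Shapes and module names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_EXPERTS = 48   # total experts per MoE layer
TOP_K = 3          # experts activated per token

class ToyMoELayer(nn.Module):
    def __init__(self, d_model: int = 256, d_ff: int = 512):
        super().__init__()
        self.router = nn.Linear(d_model, NUM_EXPERTS, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(NUM_EXPERTS)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to its top-3 experts and
        # their outputs are combined with renormalized router weights.
        scores = F.softmax(self.router(x), dim=-1)             # (tokens, 48)
        weights, indices = scores.topk(TOP_K, dim=-1)          # (tokens, 3)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the selected 3
        out = torch.zeros_like(x)
        for k in range(TOP_K):
            for e in range(NUM_EXPERTS):
                mask = indices[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(16, 256)
print(ToyMoELayer()(tokens).shape)  # torch.Size([16, 256])
```

Only the selected experts run per token, which is how the 321B-parameter model keeps its active parameter count at 38B.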
Quick Start & Requirements
Model checkpoints are available on Hugging Face in bf16 and block-fp8 formats. vLLM and SGLang are the recommended inference engines; deployment and request examples are provided in the Model Deployment Guide.
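As a hedged illustration, a locally served checkpoint could be queried through vLLM's OpenAI-compatible endpoint roughly as follows. The repository id, port, and prompt are assumptions; consult the Model Deployment Guide for the officially supported commands and engine versions.

```python
# Hypothetical request against a local vLLM server, e.g. started with:
#   vllm serve stepfun-ai/step3 --port 8000
# Repo id and URL are assumptions, not confirmed by the source.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="stepfun-ai/step3",  # assumed Hugging Face repo id
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/figure.png"}},
            {"type": "text", "text": "Describe the key trend in this chart."},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```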
Highlighted Details
Key points include the 321B-total / 38B-active MoE design, the MFA and AFD attention co-design for low decoding cost, checkpoints in bf16 and block-fp8, and Apache 2.0 licensing for both code and weights.
Maintenance & Community
Contact is available by email at contact@stepfun.com. The project is documented in a technical report and an accompanying blog post.
Licensing & Compatibility
Both code and model weights are released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
Documentation currently centers on the technical report introduction, suggesting the public release may still be at an early stage. Specific hardware requirements for optimal performance are not detailed beyond a stated compatibility with "low-end accelerators."