Step3 by stepfun-ai

Multimodal reasoning model for efficient vision-language tasks

Created 2 months ago
429 stars

Top 69.1% on SourcePulse

Project Summary

Step3 is a 321B-parameter multimodal reasoning model designed for efficient vision-language tasks. It targets researchers and developers seeking high performance at reduced decoding cost, leveraging a Mixture-of-Experts architecture and co-designed attention mechanisms.

How It Works

Step3 employs a Mixture-of-Experts (MoE) architecture with 48 experts, activating 3 per token, so only 38B of its 321B total parameters are active for any given token. It combines Multi-Matrix Factorization Attention (MFA) with Attention-FFN Disaggregation (AFD) to cut decoding cost and improve efficiency across a range of accelerators. This model-system co-design targets top-tier performance in vision-language reasoning.
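To make the routing arithmetic concrete, here is a minimal sketch of top-k MoE routing with Step3's reported numbers (48 experts, 3 active per token). This is an illustrative toy, not the actual Step3 implementation: the router shape, hidden size, and softmax-over-selected-experts convention are assumptions, though the last is common in MoE routers.

```python
import numpy as np

NUM_EXPERTS = 48  # total experts, per the Step3 summary
TOP_K = 3         # experts activated per token
HIDDEN = 64       # toy hidden size for illustration only

def route(hidden_states: np.ndarray, router_weights: np.ndarray):
    """Pick the top-k experts per token and softmax-normalize their scores."""
    logits = hidden_states @ router_weights                # (tokens, experts)
    top_k_idx = np.argsort(logits, axis=-1)[:, -TOP_K:]    # best 3 experts per token
    top_k_logits = np.take_along_axis(logits, top_k_idx, axis=-1)
    # Softmax over only the selected experts' logits.
    weights = np.exp(top_k_logits - top_k_logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return top_k_idx, weights

rng = np.random.default_rng(0)
tokens = rng.standard_normal((5, HIDDEN))
router = rng.standard_normal((HIDDEN, NUM_EXPERTS))
idx, w = route(tokens, router)
print(idx.shape, w.shape)  # each of the 5 tokens activates 3 of 48 experts
```

Because each token touches only 3 of 48 experts, roughly 38B of the 321B parameters participate in any single forward step, which is the source of the decoding-cost savings described above.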

Quick Start & Requirements

Model checkpoints are available on Hugging Face in bf16 and block-fp8 formats. Recommended inference engines are vLLM and SGLang; deployment and request examples are provided in the Model Deployment Guide.
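As a rough deployment sketch with vLLM's OpenAI-compatible server: the repo id `stepfun-ai/step3`, the parallelism setting, and the request body below are assumptions for illustration; the Model Deployment Guide is the authoritative source for the supported configuration.

```shell
# Serve a Step3 checkpoint with vLLM (repo id and flags are assumptions;
# consult the Model Deployment Guide for the supported configuration).
vllm serve stepfun-ai/step3 \
  --tensor-parallel-size 8 \
  --max-model-len 65536

# Then query it like any OpenAI-compatible endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "stepfun-ai/step3",
       "messages": [{"role": "user", "content": "Describe this image."}]}'
```

The same endpoint shape works with SGLang's OpenAI-compatible server, so client code need not change between the two recommended engines.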

Highlighted Details

  • 321B total parameters, 38B active parameters per token.
  • Features Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD).
  • Max context length of 65,536 tokens.
  • Uses the DeepSeek V3 tokenizer.

Maintenance & Community

Contact is available via email at contact@stepfun.com. The project links a technical report and a blog post.

Licensing & Compatibility

Both code and model weights are released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Public documentation is currently limited to the technical report's introduction, suggesting the project is at an early stage of release. Specific hardware requirements for optimal performance are not detailed beyond a mention of compatibility with "low-end accelerators."

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Philipp Schmid (DevRel at Google DeepMind), and 2 more.

SageAttention by thu-ml

Attention kernel for plug-and-play inference acceleration

Top 1.2% on SourcePulse
3k stars
Created 1 year ago
Updated 1 day ago