Step3 by stepfun-ai

Multimodal reasoning model for efficient vision-language tasks

Created 4 months ago
439 stars

Top 67.9% on SourcePulse

Project Summary

Step3 is a 321B-parameter multimodal reasoning model designed for efficient vision-language tasks. It targets researchers and developers who want high performance at reduced decoding cost, pairing a Mixture-of-Experts architecture with novel co-designed attention mechanisms.

How It Works

Step3 employs a Mixture-of-Experts (MoE) architecture with 48 experts, activating 3 per token, resulting in 38B active parameters out of 321B total. It utilizes Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD) to minimize decoding costs and enhance efficiency across various accelerators. This co-design approach aims for top-tier performance in vision-language reasoning.
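The routing step described above (48 experts, 3 activated per token) can be sketched as follows. The gating scheme shown here, a softmax over the top-k router logits, is a common MoE pattern and an assumption for illustration, not Step3's documented router.

```python
import math

# Illustrative MoE routing: 48 experts, 3 activated per token,
# matching the counts described above. The softmax-over-top-k
# gating is an assumed, typical scheme, not Step3's actual code.
NUM_EXPERTS = 48
TOP_K = 3

def route(token_logits):
    """Pick the top-k experts for one token and normalize their
    gate weights with a softmax so the selected weights sum to 1."""
    topk = sorted(range(NUM_EXPERTS),
                  key=lambda i: token_logits[i], reverse=True)[:TOP_K]
    exps = [math.exp(token_logits[i]) for i in topk]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(topk, exps)]

# Example: a token whose router strongly prefers experts 5, 12, and 40.
logits = [0.0] * NUM_EXPERTS
logits[5], logits[12], logits[40] = 3.0, 2.0, 1.0
selection = route(logits)
print(selection)  # three (expert_index, weight) pairs; weights sum to 1
```

Because only 3 of 48 expert FFNs run per token, the compute per token tracks the 38B active parameters rather than the 321B total.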

Quick Start & Requirements

Model checkpoints are available on Hugging Face in bf16 and block-fp8 formats. Recommended inference engines include vLLM and SGLang. Deployment and request examples are provided in the Model Deployment Guide.
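Since vLLM and SGLang both expose OpenAI-compatible chat endpoints, a vision-language request body might be assembled as in this sketch. The model id and image URL are placeholders, not values confirmed by the deployment guide.

```python
import json

# Sketch of a multimodal chat-completion payload for an
# OpenAI-compatible endpoint (e.g., one served by vLLM or SGLang).
# The model id and image URL below are placeholders.
def build_vision_request(model, image_url, question):
    """Assemble a chat request that mixes one image with a text question."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

payload = build_vision_request(
    "stepfun-ai/step3",             # placeholder model id
    "https://example.com/cat.png",  # placeholder image
    "What is in this picture?",
)
print(json.dumps(payload, indent=2))
```

The payload would be POSTed to the server's `/v1/chat/completions` route; consult the Model Deployment Guide for the exact serving flags.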

Highlighted Details

  • 321B total parameters, 38B active parameters per token.
  • Features Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD).
  • Max context length of 65,536 tokens.
  • Uses the DeepSeek-V3 tokenizer.
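The figures above imply a useful back-of-envelope check; the arithmetic below is illustrative only, derived from the stated parameter counts and the 2-bytes-per-weight bf16 format.

```python
# Back-of-envelope arithmetic from the stated figures (illustrative).
TOTAL_PARAMS = 321e9   # 321B total parameters
ACTIVE_PARAMS = 38e9   # 38B active parameters per token
BYTES_BF16 = 2         # bf16 stores each weight in 2 bytes

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
weights_gb = TOTAL_PARAMS * BYTES_BF16 / 1e9

print(f"active fraction per token: {active_fraction:.1%}")  # ~11.8%
print(f"bf16 checkpoint size:      ~{weights_gb:.0f} GB")   # ~642 GB
```

So only about 12% of the weights participate in any single token's forward pass, even though the full checkpoint must still be resident (or sharded) to serve requests.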

Maintenance & Community

Contact is available via email at contact@stepfun.com. The project cites a technical report and blog post.

Licensing & Compatibility

Both code and model weights are released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project page leads with a technical-report introduction, suggesting it may be early in its public release or documentation cycle. Specific hardware requirements for optimal performance are not detailed beyond a mention of compatibility with "low-end accelerators."

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 30 days

Explore Similar Projects

SageAttention by thu-ml

Attention kernel for plug-and-play inference acceleration

Top 1.4% on SourcePulse · 3k stars · Created 1 year ago · Updated 1 day ago
Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Philipp Schmid (DevRel at Google DeepMind), and 2 more.