Step-3.5-Flash by stepfun-ai

Fast, efficient agentic intelligence model

Created 3 weeks ago

1,388 stars

Top 28.8% on SourcePulse

View on GitHub
Project Summary

Step 3.5 Flash is an open-source foundation model designed for frontier reasoning and agentic capabilities, offering exceptional efficiency. It targets engineers and researchers needing fast, reliable AI that rivals proprietary models while enabling local deployment for data privacy and agility.

How It Works

This model employs a sparse Mixture of Experts (MoE) architecture, activating only ~11B of its 196B total parameters per token. It leverages 3-way Multi-Token Prediction (MTP-3) for high generation throughput and a 3:1 Sliding Window Attention (SWA) ratio for an efficient 256K context window. This "intelligence density" approach balances deep reasoning with rapid inference.
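The compute saving behind sparse MoE can be illustrated with a minimal top-k routing sketch. This is a toy illustration of the general technique, not Step 3.5 Flash's actual router: the gate, experts, and k value below are all invented for the demo.

```python
import math

def topk_moe_forward(x, gate_weights, experts, k=2):
    """Route input x to the top-k experts by gate score and combine
    their outputs with renormalized softmax weights. Only k of
    len(experts) experts run per token -- the source of MoE's compute
    saving (analogous to ~11B of 196B params active per token)."""
    # Gate scores: one logit per expert (dot product with a gate row).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Pick the k highest-scoring experts.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (renormalized).
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    # Weighted sum of the chosen experts' outputs.
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)      # only the selected experts execute
        w = exps[i] / z
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top

# Toy demo: 4 "experts", each a simple scaling function.
experts = [lambda x, s=s: [s * xi for xi in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate = [[0.1, 0.0], [0.9, 0.1], [0.2, 0.9], [0.0, 0.3]]
out, chosen = topk_moe_forward([1.0, 1.0], gate, experts, k=2)
```

Here only experts 1 and 2 run; the other two contribute no compute, which is how a 196B-parameter model can cost roughly as much per token as an 11B dense one.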

Quick Start & Requirements

  • API access: available via OpenRouter or platform.stepfun.ai; requires the openai SDK (pip install --upgrade "openai>=1.0").
  • Local deployment: industry-standard backends vLLM (nightly recommended) and SGLang are supported, with installation instructions provided via Docker or pip; Hugging Face Transformers and llama.cpp are also options.
  • Hardware: local inference requires high-end hardware (e.g., Mac Studio M4 Max, NVIDIA DGX Spark), with llama.cpp needing at least 120GB VRAM for GGUF weights.
  • Agent platform integration: requires Node.js (> v20).
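Since both endpoints are OpenAI-compatible, the request the openai SDK would send can be sketched with the standard library alone. The base URL and model slug below are assumptions for illustration; check the provider's model listing for the exact identifiers before use.

```python
import json
import urllib.request

# Assumed base URL -- OpenRouter or platform.stepfun.ai expose the
# same OpenAI-compatible /chat/completions route.
BASE_URL = "https://api.stepfun.com/v1"

payload = {
    "model": "step-3.5-flash",  # assumed model slug
    "messages": [
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "Summarize this repo in one line."},
    ],
    "temperature": 0.7,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer $STEPFUN_API_KEY",  # substitute a real key
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment once a real key is set
```

With the openai SDK, the equivalent is constructing a client with a custom `base_url` and API key, then calling the chat-completions method with the same `model` and `messages` fields.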

Highlighted Details

  • Achieves performance parity with leading closed-source systems, boasting 100–300 tok/s generation throughput.
  • Demonstrates strong agentic capabilities with 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0.
  • Supports an efficient 256K context window using a hybrid Sliding Window Attention mechanism.
  • Purpose-built for agentic tasks with a scalable RL framework for continuous self-improvement.
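The KV-cache saving behind the 3:1 hybrid SWA design can be sketched numerically. The interleaving pattern and the 4,096-token window below are assumptions chosen for illustration; the source only states the 3:1 ratio and the 256K context length.

```python
def layer_attention_pattern(num_layers, swa_ratio=3):
    """Hedged sketch of a 3:1 hybrid layout: for every full-attention
    layer, `swa_ratio` layers use sliding-window attention. The exact
    interleaving in Step 3.5 Flash is an assumption."""
    # Place one full-attention layer in every group of (swa_ratio + 1).
    return ["full" if i % (swa_ratio + 1) == swa_ratio else "sliding"
            for i in range(num_layers)]

def kv_cache_tokens(pattern, context_len, window):
    """Total tokens of KV cache held across layers: full layers cache
    the whole context, sliding layers only the last `window` tokens."""
    return sum(context_len if p == "full" else min(window, context_len)
               for p in pattern)

pattern = layer_attention_pattern(8)      # toy 8-layer model: 2 full, 6 sliding
full_cost = 8 * 256_000                   # all-full-attention baseline
hybrid_cost = kv_cache_tokens(pattern, 256_000, 4_096)  # assumed 4K window
```

Even in this toy configuration the hybrid layout caches roughly a quarter of the baseline's tokens at 256K context, which is what makes the long window practical.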

Maintenance & Community

The project actively engages its community via a Discord server (https://discord.gg/RcMJhNVAQc) for feedback and roadmap discussions. Issues can be reported on GitHub or in the Discord channels, and community input shapes the roadmap.

Licensing & Compatibility

This project is open-sourced under the Apache 2.0 License, generally permitting commercial use and integration without copyleft restrictions.

Limitations & Caveats

Currently, the model may require longer generation trajectories than some competitors for comparable quality. It can exhibit reduced stability in highly specialized domains or long-horizon dialogues, potentially leading to repetitive reasoning or inconsistencies. Full MTP-3 support is still under development for some backends like vLLM.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 26
  • Issues (30d): 9
  • Star History: 1,406 stars in the last 26 days
