stepfun-ai: Autoregressive image generation with continuous tokens
NextStep-1 addresses the limitations of traditional autoregressive image generation by employing continuous image tokens, preserving visual richness without relying on costly diffusion models or lossy discrete tokens. Developed for researchers and practitioners in multimodal AI, it offers a scalable and simpler framework for state-of-the-art image generation.
How It Works
This project introduces a 14B-parameter autoregressive model that jointly processes discrete text tokens and continuous image tokens. It utilizes a standard language model head for text and a lightweight 157M-parameter flow matching head for visual generation. This unified next-token prediction approach is designed for simplicity and scalability, enabling the generation of highly detailed images by directly modeling continuous visual data.
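The split between the two heads can be illustrated with a toy sketch. This is not the project's code: the function names are hypothetical, and the closed-form velocity field stands in for the learned 157M-parameter flow matching head, which in the real model would be a network conditioned on the transformer's hidden state. The sketch only shows the general idea of flow matching sampling, where a continuous token starts as Gaussian noise and is moved toward its target by integrating a velocity field from t = 0 to t = 1 with Euler steps, while text positions go through an ordinary argmax over logits.

```python
import random


def velocity(x, t, target):
    """Toy velocity field (assumption): the ideal flow toward a single target.

    A learned flow matching head would predict this from the model's hidden
    state; here the target vector is passed in directly for illustration.
    """
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, target)]


def sample_image_token(target, steps=20, rng=None):
    """Draw one continuous token by Euler integration from noise (t=0) to t=1."""
    rng = rng or random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so 1 - t is never zero
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def sample_text_token(logits):
    """Greedy pick from language-model-head logits (discrete token id)."""
    return max(range(len(logits)), key=logits.__getitem__)


def next_token(kind, head_input):
    """Route a position to the matching head, mirroring unified next-token prediction."""
    if kind == "text":
        return sample_text_token(head_input)
    return sample_image_token(head_input)


if __name__ == "__main__":
    print(next_token("text", [0.1, 2.0, -1.0]))
    print(next_token("image", [0.5, -1.0, 2.0]))
```

Because the toy velocity field is exact for a single target point, the Euler loop lands on the target regardless of the starting noise. A trained head only approximates the velocity, so its samples vary with the noise draw and the number of steps.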
Quick Start & Requirements
Installation involves cloning the repository, creating a Conda environment with Python 3.10, and installing the package and its dependencies in editable mode with the command uv pip install -e . (the trailing dot refers to the repository root). Pre-installing PyTorch for your CUDA version is recommended. The project provides the smartrun CLI tool for distributed training and the inference/inference.py script for running models. Downloading model weights and datasets can be time-consuming. Links to the project page, Hugging Face, and arXiv are available.
Maintenance & Community
The project is developed by StepFun’s Multimodal Intelligence team, with recent releases of training code and post-training blogs in February 2026. A WeChat group is available for community engagement. Checkpoints are hosted on Hugging Face and ModelScope.
Licensing & Compatibility
NextStep is licensed under the Apache License 2.0, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The primary training datasets used by the NextStep team (approximately 1 billion images) are proprietary and not open-sourced; users are strongly advised to collect and prepare their own large-scale datasets. Older NextStep-1 series models are noted as less performant than the NextStep-1.1 series and are not recommended for use.