NextFlow by ByteVisionLab

Unified multimodal AI for generation and understanding

Created 3 months ago

320 stars

Top 85.0% on SourcePulse

Project Summary

NextFlow addresses the fragmentation in multimodal AI by offering a unified decoder-only autoregressive transformer for understanding, generation, and editing. Targeting researchers and power users, it enables high-fidelity multimodal output and complex reasoning within a single, efficient architecture, eliminating the need for separate diffusion or LLM backbones.

How It Works

NextFlow is a decoder-only transformer initialized from Qwen2.5-VL-7B and trained on 6 trillion interleaved text-image tokens. Its core innovations include a Unified Tokenizer, Scale Reweighting, and Self-Correction with Residual Features for stable training. A hierarchical prediction paradigm and reinforcement learning via Group Relative Policy Optimization (GRPO) enable efficient, high-quality generation and advanced capabilities such as Chain-of-Thought reasoning and in-context editing.
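
The group-relative step in GRPO can be illustrated with a minimal sketch. It assumes the commonly described formulation in which several outputs are sampled for one prompt and each reward is standardized against the group's mean and standard deviation. The README does not describe NextFlow's reward model or exact normalization, so the function name `group_relative_advantages`, the `eps` parameter, and the example rewards are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt.

    Hypothetical helper: NextFlow's actual reward and normalization
    are not specified in the README.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Illustrative rewards for four images sampled from one prompt
    for advantage in group_relative_advantages([0.2, 0.5, 0.9, 0.4]):
        print(f"{advantage:+.3f}")
```

Samples that score above the group mean receive a positive advantage and are reinforced; those below it are penalized. Because the baseline comes from the group itself, this formulation needs no separate value network.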

Quick Start & Requirements

Prerequisites: The model is initialized from Qwen2.5-VL-7B. Inference likely requires substantial GPU resources and CUDA support.
Setup: The README does not give installation or execution commands.
Resources: Training used 6 trillion tokens. Reported inference cost is 5 seconds per 1024x1024 image, with 6x fewer FLOPs than MMDiT-based diffusion models.
Links: Papers available via arXiv (2601.02204, 2601.02256); a demo is mentioned but not linked.
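
As a rough guide to the GPU requirement mentioned under Prerequisites, weight memory can be estimated from parameter count and numeric precision. This is a back-of-the-envelope sketch, not a measured figure: the 7 billion parameter count is an assumption read off the model name (the true count may differ), and the estimate ignores activations, the KV cache, and framework overhead. The function name `weight_memory_gib` is hypothetical.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="bf16"):
    """Return approximate memory for the model weights alone, in GiB."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unsupported dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    nominal_params = 7_000_000_000  # assumption: nominal size implied by "7B"
    for dtype in ("fp32", "bf16", "int8"):
        print(f"{dtype}: {weight_memory_gib(nominal_params, dtype):.1f} GiB")
```

Under these assumptions the weights alone occupy roughly 13 GiB in bf16, which is why a single consumer GPU may be tight once activations and caches are added.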

Highlighted Details

Performance: Reports state-of-the-art scores on the DPG (88.32) and ImgEdit (4.49) benchmarks, matching specialized diffusion models in quality.
Efficiency: Generates 1024x1024 images in 5 seconds and requires 6x fewer FLOPs than MMDiT-based diffusion models.
Capabilities: Supports native Chain-of-Thought reasoning, in-context editing, interleaved generation, and dynamic resolution generation without re-encoding overhead.
Benchmark: Introduces EditCanvas, a new benchmark for evaluating editing and subject-driven generation tasks.
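
The efficiency figures above translate into simple throughput arithmetic. The sketch below uses only the numbers quoted in this summary (1024x1024 pixels, 5 seconds, a 6x FLOPs ratio). The baseline FLOPs value is a normalized placeholder, not a measured figure, since the README gives no absolute numbers, and the function names are hypothetical.

```python
# Figures quoted in the summary above
WIDTH, HEIGHT = 1024, 1024
SECONDS_PER_IMAGE = 5.0
FLOPS_REDUCTION = 6.0  # "6x fewer FLOPs than MMDiT"


def pixels_per_second(width, height, seconds):
    """Generated pixels per second of wall-clock time."""
    return width * height / seconds


def relative_flops(baseline_flops, reduction):
    """FLOPs needed when a baseline cost is cut by the given factor."""
    return baseline_flops / reduction


if __name__ == "__main__":
    print(f"{pixels_per_second(WIDTH, HEIGHT, SECONDS_PER_IMAGE):,.1f} pixels/s")
    print(f"{60 / SECONDS_PER_IMAGE:.0f} images/min")
    # Baseline normalized to 1.0 because no absolute FLOPs are reported
    print(f"{relative_flops(1.0, FLOPS_REDUCTION):.3f} of baseline FLOPs")
```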

Maintenance & Community

No specific details regarding contributors, community channels (Discord, Slack), roadmap, or sponsorships are provided in the README.

Licensing & Compatibility

The README does not specify a software license. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The README does not explicitly state limitations, alpha status, or known bugs. The arXiv identifiers (2601.02204, 2601.02256) correspond to January 2026 submissions, so the project is very recent and may not yet be publicly released in a stable form.
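
The date inference above rests on the arXiv identifier scheme, in which new-style identifiers begin with a two-digit year and a two-digit month (YYMM). A minimal sketch of that decoding follows. The function name `arxiv_submission_month` is hypothetical, and the code handles only the new-style `YYMM.NNNNN` form, assuming years in the 2000s.

```python
import re

# New-style arXiv identifier: YYMM.NNNN or YYMM.NNNNN, optional version suffix
_ARXIV_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(v\d+)?$")


def arxiv_submission_month(arxiv_id):
    """Return (year, month) encoded in a new-style arXiv identifier."""
    match = _ARXIV_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"not a new-style arXiv identifier: {arxiv_id!r}")
    year = 2000 + int(match.group(1))  # assumption: 21st-century identifiers
    month = int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in identifier: {arxiv_id!r}")
    return year, month


if __name__ == "__main__":
    for paper_id in ("2601.02204", "2601.02256"):
        print(paper_id, arxiv_submission_month(paper_id))
```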

Explore Similar Projects

ShareGPT-4o-Image by FreedomIntelligence

piFlow by Lakonik

diffusion-self-distillation by primecai

Lumina-mGPT-2.0 by Alpha-VLLM

GLM-Image by zai-org

kandinsky-5 by kandinskylab

stable-diffusion-pytorch by kjsman

RPG-DiffusionMaster by YangLing0818

HunyuanImage-3.0 by Tencent-Hunyuan

IF by deep-floyd

latent-diffusion by CompVis

Janus by deepseek-ai