Lance by bytedance

Unified multimodal AI for image and video tasks

Created 6 days ago


538 stars

Top 58.6% on SourcePulse

Project Summary

Lance is a 3B-parameter, natively unified multimodal model that handles image and video understanding, generation, and editing within a single framework. It targets researchers and developers who want one efficient model for diverse visual AI tasks, and it offers strong performance at a relatively small scale.

How It Works

Lance uses a staged multi-task training recipe and builds its transformer backbone entirely from scratch. With only 3 billion active parameters, it runs efficiently while achieving competitive results across various benchmarks. Because image and video modalities share one model, the same weights serve understanding, generation, and editing tasks.
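
As a rough illustration of what 3 billion parameters means for memory, the sketch below estimates the size of the weights alone at a few common numeric precisions. The precision of the released weights is not stated here, so the listed precisions are assumptions, and the helper name weights_gb is hypothetical. The estimate covers weights only; it does not account for activations or any other runtime memory.

```python
# Back-of-envelope estimate of weight storage for a 3B-parameter model.
# Assumption: the precisions below are illustrative; the source does not
# say which precision the released Lance weights use.

def weights_gb(num_params, bytes_per_param):
    """Return weight storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1000**3


PARAMS = 3_000_000_000  # "3B" from the project summary

for name, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{name}: {weights_gb(PARAMS, nbytes):.1f} GB")
```

At 16-bit precision the weights alone would come to about 6 GB under these assumptions.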

Quick Start & Requirements

  • Primary Install/Run: Run ./setup_env.sh to set up the environment. Download the model weights from Hugging Face and place them in the downloads/ directory. Run inference with inference_lance.sh or through the Gradio interface (python lance_gradio_t2v_v2t.py).
  • Prerequisites: Python 3.10+, CUDA 12.4+ (required).
  • Hardware: A GPU with at least 40GB VRAM is required for inference.
  • Links: Model weights are available on Hugging Face. Example configurations are provided in config/examples/.
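
The requirements above can be checked before a first run with a small preflight script. This is a minimal sketch and not part of the Lance repository: all function names are hypothetical, the 40GB figure is read as decimal gigabytes, and the GPU query assumes PyTorch is already installed (the script reports the GPU checks as unknown otherwise). The CUDA version it inspects is the one PyTorch was built against, which may differ from the system toolkit.

```python
import sys
from pathlib import Path

MIN_PYTHON = (3, 10)   # "Python 3.10+" from the prerequisites
MIN_CUDA = (12, 4)     # "CUDA 12.4+" from the prerequisites
MIN_VRAM_GB = 40       # hardware note; read as decimal GB (assumption)


def python_ok(version=sys.version_info):
    """True if the interpreter version is at least MIN_PYTHON."""
    return tuple(version[:2]) >= MIN_PYTHON


def cuda_ok(version_str):
    """True if a 'major.minor' CUDA version string is at least MIN_CUDA."""
    if not version_str:
        return False
    parts = version_str.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= MIN_CUDA


def vram_ok(total_bytes, min_gb=MIN_VRAM_GB):
    """True if the reported GPU memory is at least min_gb decimal gigabytes."""
    return total_bytes >= min_gb * 1000**3


def weights_present(weights_dir="downloads"):
    """True if the weights directory exists and contains at least one entry."""
    d = Path(weights_dir)
    return d.is_dir() and any(d.iterdir())


def gpu_info():
    """Return (total_bytes, cuda_version) for GPU 0, or None if unavailable."""
    try:
        import torch  # assumption: PyTorch is installed in this environment
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    return props.total_memory, torch.version.cuda


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Weights in downloads/:", weights_present())
    info = gpu_info()
    if info is None:
        print("GPU checks: unknown (PyTorch or CUDA not available)")
    else:
        total_bytes, cuda_version = info
        print("CUDA 12.4+:", cuda_ok(cuda_version))
        print("40GB+ VRAM:", vram_ok(total_bytes))
```

The checks are kept as separate pure functions so that each one can be run or adjusted independently of whether a GPU is present.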

Highlighted Details

  • Achieves strong performance on image generation, image editing, and video generation benchmarks, often matching or surpassing larger models.
  • Scores highly on the GenEVAL, DPG, GEdit, and VBench evaluations, which is notable for a model with only 3B parameters.
  • Supports a unified command-line interface for all tasks, including text-to-video, text-to-image, image editing, video editing, and cross-modal understanding.

Maintenance & Community

Mengqi Huang and Jianzhu Guo are listed as contacts for questions, issues, and collaborations. No community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The license is listed as "Copyright 2025 Bytedance Ltd. and/or its affiliates." This is a copyright notice, not a standard open-source license, and it may impose significant restrictions on commercial use, distribution, and integration into closed-source projects. Clarifying usage rights before adoption is strongly recommended.

Limitations & Caveats

The primary adoption blocker is the non-standard copyright notice. It lacks the clear permissions typically associated with open-source software, so any use beyond personal research requires explicit clarification. The README does not detail unsupported platforms or known bugs.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 7
  • Issues (30d): 6
  • Star History: 541 stars in the last 6 days
