LongCat-Image  by meituan-longcat

Bilingual foundation model for advanced image generation and editing

Created 2 months ago
625 stars

Top 52.9% on SourcePulse

Project Summary

LongCat-Image: Efficient Bilingual Image Generation Model

LongCat-Image is an open-source, bilingual (Chinese-English) foundation model for image generation. It targets three problems: multilingual text rendering, photorealism, and deployment efficiency, and it ships with a toolchain for developers and researchers. With 6B parameters, it is reported to perform competitively with much larger models.

How It Works

The project centers on a 6B-parameter foundation model designed for efficiency. Its main claimed advantage over existing open-source models is more accurate and stable rendering of Chinese text. The LongCat-Image-Edit variant targets image editing and is reported to lead in instruction following and visual consistency. Photorealism is improved through a dedicated data strategy and training framework, and the open-source release includes intermediate checkpoints and full training code.

Quick Start & Requirements

Installation has three steps: clone the repository (git clone --single-branch --branch main https://github.com/meituan-longcat/LongCat-Image), create a Conda environment with Python 3.10 (conda create -n longcat-image python=3.10) and activate it, then install the dependencies from the repository root (pip install -r requirements.txt, followed by python setup.py develop). Model weights can be downloaded with the Hugging Face CLI. Detailed training and inference instructions are available in the repository.
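
The install steps above can be collected into a small script. This is a minimal sketch, not the project's official installer: the `run` helper, the `install_longcat` function, and the `DRY_RUN` switch are hypothetical names introduced here, and `-y` and `conda run` are additions that let the steps work non-interactively without `conda activate`. The model download is left out because the summary does not name the model repository. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the install steps described above (assumed wrapper, not from the repo).
# Default is a dry run that prints each command; use DRY_RUN=0 to execute them.

REPO_URL="https://github.com/meituan-longcat/LongCat-Image"
ENV_NAME="longcat-image"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Clone, create the Python 3.10 environment, then install inside it.
# `conda run -n ENV ...` executes a command in the environment, so the
# script does not depend on `conda activate` being initialized.
install_longcat() {
  run git clone --single-branch --branch main "$REPO_URL" &&
  run cd LongCat-Image &&
  run conda create -n "$ENV_NAME" python=3.10 -y &&
  run conda run -n "$ENV_NAME" pip install -r requirements.txt &&
  run conda run -n "$ENV_NAME" python setup.py develop
}

install_longcat
```

Saved as, say, install.sh, running `bash install.sh` prints the five commands for review, and `DRY_RUN=0 bash install.sh` executes them. The `&&` chain stops at the first failing step, so a failed clone does not lead to installing from the wrong directory.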

Highlighted Details

  • Efficiency: Achieves high performance with only 6B parameters, outperforming larger open-source models on multiple benchmarks like GenEval, DPG, and WISE.
  • Chinese Text Rendering: Demonstrates superior accuracy, stability, and industry-leading dictionary coverage for common Chinese characters, scoring highly on GlyphDraw2 and ChineseWord benchmarks.
  • Image Editing: The specialized LongCat-Image-Edit model offers state-of-the-art performance on CEdit-Bench and GEdit-Bench, outperforming models like Seedream 4.0 and Qwen-Image-Edit in human evaluations.
  • Photorealism: Enhanced through an innovative data strategy and training framework.
  • Ecosystem: Provides a complete toolchain, including training code for SFT, LoRA, DPO, and image editing.

Maintenance & Community

Developed by the Meituan LongCat Team, the project welcomes community contributions via Pull Requests for enhancements like LoRA adapters and ComfyUI/Diffusers integrations. Contact is available via email (longcat-team@meituan.com) or a WeChat Group.

Licensing & Compatibility

LongCat-Image is released under the Apache 2.0 license, which permits commercial use. However, users are advised to carefully assess accuracy, safety, and fairness, and to comply with all applicable laws and regulations.

Limitations & Caveats

The model has not been comprehensively evaluated for all downstream applications. Developers must consider potential performance variations across languages and carefully assess accuracy, safety, and fairness before deployment in sensitive scenarios. Compliance with data protection, privacy, and content safety regulations is the responsibility of the user.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 4
  • Star History: 46 stars in the last 30 days
