CC-Pan by JJLibra

Efficient diffusion model for pan-sharpening

Created 5 months ago

261 stars

Top 97.2% on SourcePulse

Summary

CC-Pan is an efficient diffusion-based framework for pan-sharpening: it generates high-resolution multispectral (HRMS) imagery from paired panchromatic (PAN) and low-resolution multispectral (LRMS) inputs. It is aimed at researchers and engineers in remote sensing, and reports improved image quality at lower computational cost.
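The input/output relationship described above can be sketched as a shape check. This is an illustrative helper rather than part of CC-Pan: the function name `expected_hrms_shape` and the default resolution ratio of 4 are assumptions made for the example.

```python
def expected_hrms_shape(pan_shape, lrms_shape, ratio=4):
    """Return the (bands, height, width) of the fused HRMS image.

    pan_shape  -- (1, H, W) for the single-band panchromatic image
    lrms_shape -- (C, H // ratio, W // ratio) for the multispectral image
    ratio      -- PAN-to-MS resolution ratio (4 is an assumed default)
    """
    pan_bands, pan_h, pan_w = pan_shape
    ms_bands, ms_h, ms_w = lrms_shape
    if pan_bands != 1:
        raise ValueError("PAN input should have exactly one band")
    if (ms_h * ratio, ms_w * ratio) != (pan_h, pan_w):
        raise ValueError("LRMS size times ratio must equal the PAN size")
    # The fused image keeps the spectral bands of LRMS and the grid of PAN.
    return (ms_bands, pan_h, pan_w)
```

For an 8-band LRMS patch of 16x16 pixels and a 64x64 PAN patch, the helper returns (8, 64, 64).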

How It Works

Training proceeds in two stages: a 1-channel Band-VAE is pretrained first, then a latent diffusion model is fine-tuned together with a lightweight dual-branch adapter. CC-Pan compresses the multispectral channels into a compact latent space and builds on a Stable Diffusion base model. Combining diffusion with channel compression and an adapter in this way is what allows HRMS imagery to be reconstructed efficiently.

Quick Start & Requirements

Installation involves cloning the repository, creating a Python 3.10 Conda environment, and installing the local dependencies: a modified diffusers package plus the project requirements. Users must download a Stable Diffusion base model (e.g., v1-5), the CC-Pan VAE, and the adapter checkpoints to local storage. Datasets in PanCollection-style H5 format are also required. GPU acceleration is recommended, and installing xformers is advised for better memory efficiency. Accelerate must be configured for distributed or multi-GPU training.
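Before training, it can help to confirm that the environment matches the requirements listed above. The following is a minimal pre-flight sketch, not a script shipped with CC-Pan: the function name `check_environment` is hypothetical, and the default package list simply mirrors the packages named in this section.

```python
import importlib.util
import sys


def check_environment(packages=("diffusers", "accelerate", "xformers"),
                      python=(3, 10)):
    """Report whether the Python version and top-level packages are present.

    Only checks that each package can be located; it does not verify that
    the installed diffusers is the modified copy the project expects.
    """
    report = {"python_ok": tuple(sys.version_info[:2]) == tuple(python)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

A missing xformers entry is not fatal, since the section only advises installing it for memory efficiency.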

Highlighted Details

Reports state-of-the-art quantitative results on the WV3, QB, and GF2 datasets, outperforming recent diffusion-based methods on key metrics such as SAM and ERGAS.
Reports efficient inference on an RTX 4090: 3.36 s latency with 20 function evaluations (NFE=20), both lower than other diffusion models such as SGDiff.
Utilizes a custom dual-branch adapter integrated with a fine-tuned Stable Diffusion base model for effective channel compression and reconstruction.
Provides a two-stage training pipeline: Band-VAE pretraining and latent diffusion + adapter tuning.
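
For readers unfamiliar with the two metrics named above, the sketch below computes SAM (mean spectral angle, in degrees) and ERGAS using their standard definitions; lower is better for both. It is a plain-Python illustration and not the project's evaluation code. The data layout (a list of bands, each a flat list of pixel values) and the default ratio of 4 are assumptions made for the example.

```python
import math


def sam_degrees(reference, fused):
    """Mean spectral angle in degrees between per-pixel spectra."""
    n_pixels = len(reference[0])
    angles = []
    for p in range(n_pixels):
        ref_vec = [band[p] for band in reference]
        fus_vec = [band[p] for band in fused]
        dot = sum(r * f for r, f in zip(ref_vec, fus_vec))
        norm = (math.sqrt(sum(r * r for r in ref_vec))
                * math.sqrt(sum(f * f for f in fus_vec)))
        if norm == 0:
            continue  # the angle is undefined for an all-zero spectrum
        cos = max(-1.0, min(1.0, dot / norm))  # clamp rounding error
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles) if angles else 0.0


def ergas(reference, fused, ratio=4):
    """ERGAS: (100 / ratio) * sqrt(mean over bands of MSE_b / mean_b^2)."""
    total = 0.0
    for ref_band, fus_band in zip(reference, fused):
        n = len(ref_band)
        mse = sum((r - f) ** 2 for r, f in zip(ref_band, fus_band)) / n
        mean = sum(ref_band) / n
        total += mse / (mean ** 2)
    return (100.0 / ratio) * math.sqrt(total / len(reference))
```

SAM is insensitive to a uniform brightness scaling of the fused image, while ERGAS penalizes it, which is one reason the two are usually reported together.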

Maintenance & Community

The provided README does not detail specific community channels (e.g., Discord, Slack) or extensive maintenance plans. Contributions are welcomed for documentation, setup clarifications, and reproducibility improvements.

Licensing & Compatibility

This project is released under the MIT License, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Users must manage local downloads and storage for Stable Diffusion base models, project checkpoints, and datasets, which are not included in the repository. The pipeline expects data in a specific PanCollection-style H5 format, requiring users to prepare their datasets accordingly. Configuration involves correctly pointing YAML files to these local asset paths.
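
Because every asset lives outside the repository, a quick check that the configured paths exist can catch misconfiguration early. The sketch below is illustrative only: the function name `missing_assets`, the asset labels, and the example paths are hypothetical and do not reflect the project's actual YAML keys.

```python
from pathlib import Path


def missing_assets(asset_paths):
    """Return the sorted names whose configured path does not exist."""
    return sorted(
        name
        for name, path in asset_paths.items()
        if not Path(path).expanduser().exists()
    )


if __name__ == "__main__":
    # Hypothetical labels and locations; substitute the values from your
    # own YAML configuration.
    assets = {
        "sd_base_model": "~/models/stable-diffusion-v1-5",
        "band_vae": "~/models/ccpan_vae",
        "adapter": "~/models/ccpan_adapter",
        "dataset_h5": "~/data/pancollection/train.h5",
    }
    for name in missing_assets(assets):
        print(f"missing: {name}")
```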

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days