StableDiffusionReconstruction by yu-takagi

Research code for reconstructing images from human brain activity using latent diffusion models

created 2 years ago
1,120 stars

Top 34.9% on sourcepulse

View on GitHub
Project Summary

This repository provides code for reconstructing visual experiences from human brain activity using Stable Diffusion, extending prior work with advanced decoding techniques like text prompts and GANs. It targets researchers and engineers in neuroscience, computer vision, and AI interested in brain-computer interfaces and generative models. The primary benefit is enabling high-resolution visual reconstruction from neural data.

How It Works

The project uses latent diffusion models (Stable Diffusion v1.4 and the v2.0 depth model) to generate images from features decoded from fMRI data. It combines several decoding strategies to improve reconstruction accuracy: direct mapping from brain activity to the diffusion model's latent image and text-conditioning representations, decoded text prompts generated via BLIP, and GAN-based reconstruction driven by decoded VGG19 features. The decoded intermediate representations and text then guide the image generation process.
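To make the decoding step concrete, here is a minimal sketch assuming ridge-style linear regression from fMRI voxel patterns to the diffusion model's latent image representation (z) and text-conditioning embedding (c), consistent with the paper's description; the array shapes, variable names, and synthetic data are illustrative placeholders, not the repository's own API.

```python
# Conceptual sketch of the feature-decoding step (not the repository's code).
# Ridge regression maps fMRI voxel patterns to the diffusion model's latent
# image representation z and text-conditioning embedding c; dimensions here
# are shrunk placeholders so the example runs quickly.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_train, n_test, n_voxels = 800, 50, 1000
z_dim, c_dim = 1024, 2048  # real targets are far larger (e.g. flattened 4x64x64 latents)

# Synthetic stand-ins for preprocessed fMRI betas and target features.
X_train = rng.standard_normal((n_train, n_voxels))
X_test = rng.standard_normal((n_test, n_voxels))
Z_train = rng.standard_normal((n_train, z_dim))
C_train = rng.standard_normal((n_train, c_dim))

# One linear decoder per target representation.
z_decoder = Ridge(alpha=1e4).fit(X_train, Z_train)
c_decoder = Ridge(alpha=1e4).fit(X_train, C_train)

# Decoded features for held-out brain activity; in the actual pipeline these
# would condition Stable Diffusion's img2img-style generation.
z_pred = z_decoder.predict(X_test)
c_pred = c_decoder.predict(X_test)
print(z_pred.shape, c_pred.shape)  # (50, 1024) (50, 2048)
```

In the real pipeline, the decoded latent would be rendered into a coarse image by the autoencoder and then refined by the diffusion process conditioned on the decoded text embedding.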

Quick Start & Requirements

  • Installation: Requires downloading specific datasets (nsddata, nsddata_betas, nsddata_stimuli), installing Stable Diffusion v1.4/v2.0, and other dependencies like bdpy and transformers.
  • Prerequisites: Python, PyTorch, a CUDA-capable GPU, the Stable Diffusion checkpoints (sd-v1-4.ckpt, 512-depth-ema.ckpt), and pre-trained models (VGG_ILSVRC_19_layers, bvlc_reference_caffenet_generator_ILSVRC2012_Training); a checkpoint-loading sketch follows this list.
  • Setup: Involves downloading large datasets and models, and running multiple Python scripts for preprocessing and reconstruction.
  • Links: Paper, Project Page, FAQ
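As a rough illustration of the checkpoint step referenced above, the sketch below loads sd-v1-4.ckpt through the CompVis stable-diffusion codebase's standard utilities. It assumes that repository is cloned and importable (so that ldm.util is on the path); the file paths shown are example locations, not the project's required layout.

```python
# Minimal sketch of loading the sd-v1-4.ckpt weights, assuming the CompVis
# stable-diffusion repository is cloned and on PYTHONPATH. Paths are examples.
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

config_path = "stable-diffusion/configs/stable-diffusion/v1-inference.yaml"
ckpt_path = "stable-diffusion/models/ldm/stable-diffusion-v1/sd-v1-4.ckpt"

config = OmegaConf.load(config_path)
state = torch.load(ckpt_path, map_location="cpu")["state_dict"]

model = instantiate_from_config(config.model)
missing, unexpected = model.load_state_dict(state, strict=False)
model = model.to("cuda" if torch.cuda.is_available() else "cpu").eval()
print(f"loaded model with {len(missing)} missing / {len(unexpected)} unexpected keys")
```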

Highlighted Details

  • Reconstructs visual experiences from brain activity using Stable Diffusion.
  • Incorporates decoded text prompts (generated with BLIP) and GAN-based reconstruction from decoded VGG19 features to improve accuracy; an illustrative captioning sketch follows this list.
  • Supports decoded depth information via the depth-conditioned Stable Diffusion v2.0 model.
  • Includes evaluation metrics and scripts for comparing reconstruction methods.
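For flavor, here is an illustrative captioning sketch for the text-prompt route mentioned above. It uses the Hugging Face transformers port of BLIP rather than the original Salesforce codebase the repository builds on, and it captions an example image, whereas the actual pipeline predicts captions from decoded brain features.

```python
# Illustrative BLIP captioning via the Hugging Face transformers port
# (the repository itself uses the original Salesforce BLIP codebase).
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Any RGB image works; this URL is just a placeholder example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(caption_ids[0], skip_special_tokens=True)
print(caption)  # a short text prompt that could condition Stable Diffusion
```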

Maintenance & Community

The project is associated with the CVPR 2023 paper "High-resolution image reconstruction with latent diffusion models from human brain activity" by Yu Takagi and Shinji Nishimoto. It acknowledges several key repositories it builds upon, including Stable Diffusion, BLIP, and bdpy. Contact information is provided via email.

Licensing & Compatibility

The repository itself does not state a license. It builds on Stable Diffusion, whose v1.4 weights are distributed under the CreativeML Open RAIL-M license (which includes use restrictions rather than being fully permissive), and on the Natural Scenes Dataset, which has its own terms of use. Suitability for commercial use therefore depends on the licenses and terms of the underlying models and datasets.

Limitations & Caveats

The setup process is complex, requiring large dataset downloads and multiple environment configurations. The README notes that updating transformers may break BLIP, so dependency versions should be pinned carefully. The project also depends on specific Stable Diffusion versions, checkpoints, and pre-trained models.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 9 stars in the last 90 days

