MindVideo  by jqin4749

Research paper for video reconstruction from brain activity

created 2 years ago
382 stars

Top 75.9% on sourcepulse

Project Summary

MinD-Video is a framework for reconstructing high-quality videos from fMRI brain activity data, targeting researchers and engineers in neuroscience and AI. It enables the visualization of visual experiences directly from brain recordings, advancing the understanding of cognitive processes.

How It Works

MinD-Video employs a multi-stage approach: masked brain modeling to learn spatiotemporal patterns from fMRI data, multimodal contrastive learning with spatiotemporal attention for robust feature extraction, and co-training with an augmented Stable Diffusion model that uses temporal inflation. Together, these stages enable high-quality video generation at arbitrary frame rates, steered by adversarial guidance.
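As a rough illustration of the masked-brain-modeling stage (a toy sketch, not the repository's actual implementation), the snippet below masks a random fraction of fMRI "patches" and scores reconstruction only on the masked positions, as in masked autoencoding. All names, shapes, and the mask ratio here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_signal(x, mask_ratio=0.75):
    """Zero out a random fraction of fMRI patches; return masked input and mask."""
    n = x.shape[0]
    n_mask = int(n * mask_ratio)
    idx = rng.permutation(n)[:n_mask]
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    x_masked = x.copy()
    x_masked[mask] = 0.0
    return x_masked, mask

def masked_mse(pred, target, mask):
    """Reconstruction loss computed only on the masked positions."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

x = rng.standard_normal(16)      # toy sequence of fMRI patches
x_masked, mask = mask_signal(x)
pred = np.zeros_like(x)          # stand-in for an encoder-decoder's prediction
loss = masked_mse(pred, x, mask)
```

In the real pipeline, the encoder trained this way supplies the brain features that later condition the diffusion model.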

Quick Start & Requirements

  • Install via conda env create -f env.yaml and conda activate mind-video.
  • Requires downloading pre-training datasets (HCP) and target datasets (Wen 2018), along with pre-trained checkpoints.
  • Run generation: python scripts/eval_all.py --config configs/eval_all_sub1.yaml.
  • Recommended hardware: an RTX 3090 suffices for 2-second, 3 FPS, 256x256 samples; higher specs are needed for full frame rate (30 FPS) and higher resolution.
  • Links: arXiv, Website, Google Drive Samples

Highlighted Details

  • Achieves 85% accuracy in semantic classification and 0.19 SSIM, outperforming prior SOTA by 45%.
  • Demonstrates biological plausibility and interpretability, aligning with physiological processes.
  • Reconstructed videos are of high quality, capturing various objects, animals, motions, and scenes.
  • Can reconstruct videos at full frame rate (30 FPS) and higher resolutions with sufficient GPU memory.
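To make the 0.19 SSIM figure concrete, the sketch below computes a simplified global SSIM from the standard formula (no 11x11 sliding window, unlike typical implementations); the helper name and toy frames are hypothetical.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Global SSIM over whole arrays, for illustration only."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
frame = rng.random((32, 32))        # toy "ground-truth" frame in [0, 1]
noisy = np.clip(frame + 0.3 * rng.standard_normal((32, 32)), 0.0, 1.0)

perfect = ssim_global(frame, frame)   # identical frames score 1.0
degraded = ssim_global(frame, noisy)  # noise lowers the score
```

Reported SSIM results for video reconstruction are typically averaged per-frame with a windowed implementation (e.g. scikit-image's structural_similarity), so this global version is only a conceptual stand-in.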

Maintenance & Community

  • Accepted for Oral Presentation at NeurIPS 2023.
  • Codebase is based on Tune-A-Video.

Licensing & Compatibility

  • License not explicitly stated in the README.

Limitations & Caveats

  • Sample generation is limited by GPU memory: an RTX 3090 handles 2-second, 3 FPS, 256x256 samples, while higher resolutions and frame rates require more VRAM.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days
