DeTikZify by potamides

Graphics program synthesizer for scientific figures/sketches with TikZ

Created 2 years ago

1,730 stars

Top 24.2% on SourcePulse

3 Experts Love This Project

Jonathan Ragan-Kelley

Professor at MIT

Lysandre Debut

Chief Open-Source Officer at Hugging Face

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

DeTikZify synthesizes semantics-preserving TikZ graphics programs for scientific figures from sketches and existing images. It targets researchers and other users who need to create or recreate complex scientific illustrations without writing the TikZ code by hand.

How It Works

DeTikZify uses a multimodal language model to translate visual input into TikZ code. An inference algorithm based on Monte Carlo Tree Search (MCTS) iteratively refines the generated graphics programs without additional training. Spending more inference time therefore lets the model explore alternative candidate programs and keep the best-scoring ones.
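
To make the idea of MCTS-based refinement concrete, here is a minimal, generic sketch of the four MCTS phases (selection, expansion, rollout, backpropagation) on a toy problem. It is not DeTikZify's implementation: the alphabet, the TARGET string, and the reward function are hypothetical stand-ins for TikZ tokens and for whatever score the real system assigns to a candidate program.

```python
import math
import random

ALPHABET = "abc"    # hypothetical stand-in for program tokens
TARGET = "abcabc"   # hypothetical stand-in for the "ideal" program


def reward(program):
    """Toy score in [0, 1]: fraction of positions that match TARGET."""
    return sum(p == t for p, t in zip(program, TARGET)) / len(TARGET)


class Node:
    def __init__(self, prefix, parent=None):
        self.prefix = prefix      # partial program represented by this node
        self.parent = parent
        self.children = {}        # token -> child Node
        self.visits = 0
        self.total = 0.0          # sum of rollout rewards seen through this node

    def is_terminal(self):
        return len(self.prefix) == len(TARGET)

    def fully_expanded(self):
        return len(self.children) == len(ALPHABET)

    def uct_child(self, c=1.4):
        # UCT: average reward (exploitation) plus a bonus for rarely visited children.
        return max(
            self.children.values(),
            key=lambda n: n.total / n.visits
            + c * math.sqrt(math.log(self.visits) / n.visits),
        )


def search(iterations, rng):
    """Run MCTS and return the best (score, program) pair seen in any rollout."""
    root = Node("")
    best = (-1.0, "")
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCT.
        while not node.is_terminal() and node.fully_expanded():
            node = node.uct_child()
        # 2. Expansion: add one child that has not been tried yet.
        if not node.is_terminal():
            token = rng.choice([t for t in ALPHABET if t not in node.children])
            node.children[token] = Node(node.prefix + token, node)
            node = node.children[token]
        # 3. Rollout: complete the program at random and score it.
        remaining = len(TARGET) - len(node.prefix)
        program = node.prefix + "".join(rng.choice(ALPHABET) for _ in range(remaining))
        score = reward(program)
        best = max(best, (score, program))
        # 4. Backpropagation: update statistics from the new node up to the root.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    return best


if __name__ == "__main__":
    print(search(300, random.Random(0)))
```

The search needs no gradient updates: more iterations simply mean more candidates are explored and scored, which is the sense in which refinement happens at inference time rather than through training.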

Quick Start & Requirements

Installation: pip install 'detikzify[legacy] @ git+https://github.com/potamides/DeTikZify' (omit [legacy] if you only need the v2 models). To run the bundled examples, clone the repository and install it in editable mode: git clone https://github.com/potamides/DeTikZify, then pip install -e DeTikZify[examples].
Prerequisites: Full TeX Live 2023 installation, ghostscript, poppler. Requires bfloat16 support for v2 (8b) and v3 (10b) models.
Resources: Hugging Face Spaces are available for inference, with the option of a paid private GPU runtime. A Google Colab demo is also available, but the free tier is limited to the 1b models.
Docs/Demo: Hugging Face Space, Google Colab.
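
Before installing, it can help to confirm that the external prerequisites are reachable. The following is a small sketch, not part of DeTikZify: the executable names (pdflatex for TeX Live, gs for ghostscript, pdftoppm for poppler) are assumptions that hold on typical Unix-like systems, and the helper name missing_prerequisites is hypothetical. It only checks that the tools are on PATH; it does not verify the TeX Live version or that the installation is complete.

```python
import shutil

# Assumed executable names on Unix-like systems; adjust for your platform.
REQUIRED_TOOLS = {
    "TeX Live": "pdflatex",
    "ghostscript": "gs",
    "poppler": "pdftoppm",
}


def missing_prerequisites(which=shutil.which):
    """Return the names of prerequisites whose executable is not found on PATH.

    The lookup function is injectable so the check can be tested without
    touching the real environment.
    """
    return [name for name, exe in REQUIRED_TOOLS.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    print("Missing:", ", ".join(missing) if missing else "none")
```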

Highlighted Details

Supports zero-shot text-conditioning via TikZero adapters.
MCTS-based inference for iterative output refinement.
Model architectures build on LLaVA and AutomaTikZ (v1) and on Idefics 3 (v2).
TikZero architecture inspired by Flamingo and LLaMA 3.2-Vision.

Maintenance & Community

Project accepted at NeurIPS 2024 as a spotlight paper.
Model weights and datasets are available on Hugging Face Hub.
Dataset creation scripts are released, encouraging community recreation of full datasets.

Licensing & Compatibility

The README does not explicitly state a license. Its mention of arXiv's non-exclusive license in connection with dataset redistribution suggests that the datasets carry their own restrictions. Confirm the project's license before any commercial use.

Limitations & Caveats

arXiv data was removed from the public datasets because of licensing restrictions, so users who need the full dataset must recreate it themselves.
Text-conditioning is currently only supported through the programming interface, not the web UI.

Health Check

Last Commit

3 weeks ago

Responsiveness

1 day

Star History

26 stars in the last 30 days