DeTikZify by potamides

Graphics program synthesizer for scientific figures/sketches with TikZ

created 1 year ago
1,492 stars

Top 28.2% on sourcepulse

Project Summary

DeTikZify synthesizes scientific figures as semantics-preserving TikZ graphics programs from sketches and existing images. It targets researchers and other users who need to create or recreate complex scientific illustrations, and aims to save the time that writing TikZ code by hand would take.

How It Works

DeTikZify uses a multimodal language model to translate visual input into TikZ code. An inference algorithm based on Monte Carlo Tree Search (MCTS) iteratively refines the generated graphics programs without requiring additional training data. This lets the model improve output quality at inference time and explore diverse candidate programs for the same input.
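
To make the search idea concrete, here is a minimal, generic MCTS sketch in Python. It is not DeTikZify's implementation: the token vocabulary, the fixed program length, and the `reward` function are all hypothetical stand-ins. In this toy version a "program" is a short tuple of tokens, and the reward simply counts how many positions match a made-up target, where a real system would score a generated program instead.

```python
import math
import random


class Node:
    """One partial token sequence in the search tree."""

    def __init__(self, state, parent=None):
        self.state = state      # tuple of tokens chosen so far
        self.parent = parent
        self.children = {}      # token -> Node
        self.visits = 0
        self.total = 0.0        # sum of rollout rewards seen through this node


def uct(child, parent_visits, c=1.4):
    """Upper confidence bound: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(vocab, length, reward, iterations=500, seed=0):
    """Search for a high-reward sequence of `length` tokens (toy example)."""
    rng = random.Random(seed)
    root = Node(())
    best_score, best_state = -math.inf, None
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and not terminal.
        while len(node.state) < length and len(node.children) == len(vocab):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: uct(ch, parent.visits))
        # 2. Expansion: add one untried child unless the node is terminal.
        if len(node.state) < length:
            token = rng.choice([t for t in vocab if t not in node.children])
            child = Node(node.state + (token,), parent=node)
            node.children[token] = child
            node = child
        # 3. Rollout: complete the sequence with random tokens and score it.
        state = node.state
        while len(state) < length:
            state += (rng.choice(vocab),)
        score = reward(state)
        if score > best_score:
            best_score, best_state = score, state
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    return best_score, best_state


# Hypothetical toy problem: recover a hidden target sequence.
VOCAB = ("a", "b", "c")
TARGET = ("a", "b", "c", "a")


def toy_reward(state):
    """Fraction of positions that match the (made-up) target."""
    return sum(x == y for x, y in zip(state, TARGET)) / len(TARGET)


score, program = mcts(VOCAB, len(TARGET), toy_reward)
print(score, program)
```

The property this sketch shares with the description above is that quality improves through repeated search and scoring at inference time, with no weight updates. More iterations mean more rollouts and a better chance of finding a high-scoring candidate.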

Quick Start & Requirements

  • Installation: pip install 'detikzify[legacy] @ git+https://github.com/potamides/DeTikZify' (remove [legacy] for v2 only). For examples: git clone https://github.com/potamides/DeTikZify and pip install -e DeTikZify[examples].
  • Prerequisites: Full TeX Live 2023 installation, ghostscript, poppler. Requires bfloat16 support for v2 (8b) and v3 (10b) models.
  • Resources: Hugging Face Spaces are available for inference, with options for paid private GPU runtimes. A Google Colab demo is also available, but the free tier is limited to the 1b models.
  • Docs/Demo: Hugging Face Space, Google Colab.
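
Before installing, it can help to check that the external tools listed above are reachable. The following is a small sketch using only the Python standard library. The executable names (`pdflatex` for TeX Live, `gs` for Ghostscript, `pdftoppm` for Poppler) are assumptions about a typical Unix-like setup and may differ on other platforms. The check only confirms that the binaries are on `PATH`; it does not verify that the TeX Live installation is complete or from 2023, nor that the GPU supports bfloat16.

```python
import shutil

# Assumed executable names for each prerequisite (typical on Linux/macOS).
REQUIRED_TOOLS = {
    "TeX Live": "pdflatex",
    "Ghostscript": "gs",
    "Poppler": "pdftoppm",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of prerequisites whose executable is not on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


missing = find_missing_tools()
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All assumed prerequisite executables were found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out, for example in a test or on a system where the tools live under different names.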

Highlighted Details

  • Supports zero-shot text-conditioning via TikZero adapters.
  • MCTS-based inference for iterative output refinement.
  • v1 models are based on LLaVA and AutomaTikZ; v2 models are based on Idefics 3.
  • TikZero architecture inspired by Flamingo and LLaMA 3.2-Vision.

Maintenance & Community

  • Project accepted at NeurIPS 2024 as a spotlight paper.
  • Model weights and datasets are available on Hugging Face Hub.
  • Dataset creation scripts are released, encouraging community recreation of full datasets.

Licensing & Compatibility

  • The specific license is not explicitly stated in the README, but the mention of arXiv's non-exclusive license for dataset redistribution suggests potential complexities. Further clarification on the project's license is recommended for commercial use.

Limitations & Caveats

  • arXiv data was removed from public datasets due to licensing restrictions, requiring users to recreate the full dataset.
  • Text-conditioning is currently only supported through the programming interface, not the web UI.
Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 4
  • Star history: 332 stars in the last 90 days
