LLM4SVG by ximinng

Research paper on LLMs for SVG understanding/generation

Created 9 months ago
552 stars

Top 57.9% on SourcePulse

Project Summary

LLM4SVG enables Large Language Models to process, understand, and generate complex Scalable Vector Graphics (SVG). It is designed for researchers and developers working with multimodal AI and vector graphics, offering a robust framework for SVG-centric LLM applications.

How It Works

LLM4SVG leverages specialized datasets (SVGX-Core-250k for pretraining, SVGX-SFT-1M for fine-tuning) and integrates with popular LLM frameworks like LLaMA-Factory, Unsloth, Transformers, and TRL. It supports multimodal inputs (text and vision) and offers flexible training options including LoRA and full fine-tuning, with optimized inference via vLLM. The approach includes specialized SVG tokenization for improved SVG syntax handling.
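The idea behind SVG-aware tokenization can be illustrated with a minimal sketch: structural SVG syntax (tags, attribute names) is mapped to dedicated tokens so the model sees one symbol per semantic unit rather than many character-level fragments. The vocabulary and mapping below are illustrative assumptions, not the project's actual tokenizer.

```python
# Minimal sketch of SVG-aware tokenization. The special-token vocabulary
# below is illustrative only, not LLM4SVG's actual token set.
import re

SVG_TOKENS = {
    "<svg": "[SVG_START]",
    "</svg>": "[SVG_END]",
    "<path": "[PATH]",
    "d=": "[ATTR_D]",
    "fill=": "[ATTR_FILL]",
}

def tokenize_svg(svg: str) -> list[str]:
    """Replace known SVG syntax with special tokens; split the rest on whitespace."""
    pattern = "|".join(re.escape(key) for key in SVG_TOKENS)
    parts = re.split(f"({pattern})", svg)
    tokens: list[str] = []
    for part in parts:
        if part in SVG_TOKENS:
            tokens.append(SVG_TOKENS[part])
        else:
            tokens.extend(part.split())
    return tokens
```

For example, `tokenize_svg('<svg><path d="M0 0 L10 10" fill="red"/></svg>')` maps the structural pieces to `[SVG_START]`, `[PATH]`, `[ATTR_D]`, and `[ATTR_FILL]` while leaving the coordinate data as plain word tokens.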

Quick Start & Requirements

  • Installation:
      1. Clone the repository and create the conda environment: conda env create -f environment.yml && conda activate llm4svg
      2. Download the datasets: bash script/download_dataset.sh
      3. Set up the datasets: bash script/setup_dataset.sh
      4. Install LLaMA-Factory: cd LLaMA-Factory && pip install -e ".[torch,metrics]"
  • Prerequisites: Python 3.10+, PyTorch, Hugging Face libraries, LLaMA-Factory, Unsloth, TRL, vLLM. GPU acceleration is highly recommended for training and inference.
  • Resources: Requires significant GPU memory for training larger models and datasets.
  • Links: Project Website, CVPR 2025 Paper, SVGX-Core-250k Dataset, SVGX-SFT-1M Dataset.
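Taken together, the installation steps read as a short shell script. The clone URL is an assumption inferred from the GitHub project name.

```shell
# Setup sketch for LLM4SVG, following the Quick Start steps above.
# The clone URL is assumed from the GitHub project name; verify it first.
set -e

git clone https://github.com/ximinng/LLM4SVG.git
cd LLM4SVG

# Create and activate the conda environment.
# In a non-interactive shell you may first need:
#   source "$(conda info --base)/etc/profile.d/conda.sh"
conda env create -f environment.yml
conda activate llm4svg

# Download and set up the SVGX datasets
bash script/download_dataset.sh
bash script/setup_dataset.sh

# Install LLaMA-Factory in editable mode with training extras
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
```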

Highlighted Details

  • Supports fine-tuning a wide range of foundation models including Llama 3.2, Qwen2.5-VL, Gemma 3, DeepSeek, Falcon, Phi-2, and GPT2-XL.
  • Offers two specialized datasets: SVGX-Core-250k (pretraining) and SVGX-SFT-1M (supervised fine-tuning).
  • Integrates with vLLM for high-throughput, low-latency inference, claiming up to 2x faster generation.
  • Provides example configurations for training using LLaMA-Factory, Unsloth, Transformers, and TRL.
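To make the LLaMA-Factory integration concrete, the sketch below packages one text-to-SVG pair in the ShareGPT-style conversation format that LLaMA-Factory accepts for supervised fine-tuning. The prompt template, caption, and SVG are made-up examples, not actual SVGX-SFT-1M records or the project's real preprocessing.

```python
# Illustrative sketch: wrapping a (caption, SVG) pair as a ShareGPT-style
# record for supervised fine-tuning with LLaMA-Factory. The prompt wording
# and example data are assumptions, not the SVGX-SFT-1M schema.
import json

def to_sharegpt(caption: str, svg: str) -> dict:
    """Wrap one (caption, SVG) pair as a two-turn conversation."""
    return {
        "conversations": [
            {"from": "human", "value": f"Generate an SVG of: {caption}"},
            {"from": "gpt", "value": svg},
        ]
    }

record = to_sharegpt(
    "a red circle",
    '<svg viewBox="0 0 10 10"><circle cx="5" cy="5" r="4" fill="red"/></svg>',
)
print(json.dumps(record, indent=2))
```

A list of such records, serialized to JSON, is the shape LLaMA-Factory's sharegpt dataset format expects; the actual field names for a given setup are configured in its dataset registry.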

Maintenance & Community

The project is associated with the CVPR 2025 conference. For questions, bug reports, or collaboration, users are encouraged to open issues or submit pull requests on GitHub.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Pretrained model weights are pending release. The project relies on external frameworks like LLaMA-Factory and Unsloth, which may have their own dependencies and maintenance cycles. Optimal performance may require significant GPU resources and careful configuration of training parameters.

Health Check
  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days
