LLM4SVG by ximinng

Research paper on LLMs for SVG understanding/generation

created 7 months ago
538 stars

Top 59.8% on sourcepulse

Project Summary

LLM4SVG enables Large Language Models to process, understand, and generate complex Scalable Vector Graphics (SVG). It is designed for researchers and developers working with multimodal AI and vector graphics, offering a robust framework for SVG-centric LLM applications.

How It Works

LLM4SVG leverages specialized datasets (SVGX-Core-250k for pretraining, SVGX-SFT-1M for fine-tuning) and integrates with popular LLM frameworks like LLaMA-Factory, Unsloth, Transformers, and TRL. It supports multimodal inputs (text and vision) and offers flexible training options including LoRA and full fine-tuning, with optimized inference via vLLM. The approach includes specialized SVG tokenization for improved SVG syntax handling.
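
A minimal sketch of the tokenization idea, using Hugging Face Transformers; the token list below is illustrative rather than LLM4SVG's actual vocabulary, and gpt2-xl stands in for any of the supported base models:

    # Sketch: register common SVG syntax as single tokens so the model
    # does not fragment tags and attributes into subwords.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
    model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

    # Illustrative token list; the project defines its own SVG vocabulary.
    svg_tokens = ["<svg>", "</svg>", "<path>", "<rect>", "d=", "fill=", "viewBox="]
    tokenizer.add_tokens(svg_tokens)

    # Grow the embedding matrix to cover the newly added tokens; their
    # embeddings only become meaningful after fine-tuning.
    model.resize_token_embeddings(len(tokenizer))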

Quick Start & Requirements

  • Installation: Clone the repository; create a conda environment (conda env create -f environment.yml && conda activate llm4svg); download the datasets (bash script/download_dataset.sh); set them up (bash script/setup_dataset.sh); then install LLaMA-Factory (cd LLaMA-Factory && pip install -e ".[torch,metrics]").
  • Prerequisites: Python 3.10+, PyTorch, Hugging Face libraries, LLaMA-Factory, Unsloth, TRL, vLLM. GPU acceleration is highly recommended for training and inference.
  • Resources: Significant GPU memory is required to train larger models on the full datasets.
  • Links: Project Website, CVPR 2025 Paper, SVGX-Core-250k Dataset, SVGX-SFT-1M Dataset. A dataset-loading sketch follows this list.
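
As a quick sanity check that the data is usable, here is a sketch of loading the SFT set with the Hugging Face datasets library; the repo id xingxm/SVGX-SFT-1M is an assumption, so verify it against the dataset links above:

    # Sketch: pull one example from the SFT dataset.
    # The Hub repo id below is assumed; see the project links for the real one.
    from datasets import load_dataset

    dataset = load_dataset("xingxm/SVGX-SFT-1M", split="train")
    print(dataset[0])  # one instruction/SVG training pair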

Highlighted Details

  • Supports fine-tuning a wide range of foundation models including Llama 3.2, Qwen2.5-VL, Gemma 3, DeepSeek, Falcon, Phi-2, and GPT2-XL.
  • Offers two specialized datasets: SVGX-Core-250k (pretraining) and SVGX-SFT-1M (supervised fine-tuning).
  • Integrates with vLLM for high-throughput, low-latency inference, claiming up to 2x faster generation (see the inference sketch after this list).
  • Provides example configurations for training using LLaMA-Factory, Unsloth, Transformers, and TRL.
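
For the vLLM integration, the following sketch shows what batched generation could look like once fine-tuned weights are available; the checkpoint path and prompt are placeholders, since pretrained weights are pending release:

    # Sketch: high-throughput SVG generation with vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model="path/to/llm4svg-checkpoint")  # hypothetical local checkpoint
    params = SamplingParams(temperature=0.7, max_tokens=2048)

    prompts = ["Generate an SVG of a red five-pointed star."]
    for output in llm.generate(prompts, params):
        print(output.outputs[0].text)  # raw SVG markup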

Maintenance & Community

The project accompanies a paper accepted to CVPR 2025. For questions, bug reports, or collaboration, users are encouraged to open issues or submit pull requests on GitHub.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Pretrained model weights are pending release. The project relies on external frameworks like LLaMA-Factory and Unsloth, which may have their own dependencies and maintenance cycles. Optimal performance may require significant GPU resources and careful configuration of training parameters.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star history: 214 stars in the last 90 days
