LLM4SVG by ximinng

Research paper on LLMs for SVG understanding/generation

created 7 months ago
538 stars

Top 59.8% on sourcepulse

Project Summary

LLM4SVG enables Large Language Models to process, understand, and generate complex Scalable Vector Graphics (SVG). It is designed for researchers and developers working with multimodal AI and vector graphics, offering a robust framework for SVG-centric LLM applications.

How It Works

LLM4SVG leverages specialized datasets (SVGX-Core-250k for pretraining, SVGX-SFT-1M for fine-tuning) and integrates with popular LLM frameworks like LLaMA-Factory, Unsloth, Transformers, and TRL. It supports multimodal inputs (text and vision) and offers flexible training options including LoRA and full fine-tuning, with optimized inference via vLLM. The approach includes specialized SVG tokenization for improved SVG syntax handling.
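
A minimal sketch of the tokenization idea, using Hugging Face Transformers; the token list below is illustrative rather than LLM4SVG's actual vocabulary, and gpt2-xl stands in for any of the supported base models:

    # Sketch: register common SVG syntax as single tokens so the model
    # does not fragment tags and attributes into subwords.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
    model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

    # Illustrative token list; the project defines its own SVG vocabulary.
    svg_tokens = ["<svg>", "</svg>", "<path>", "<rect>", "d=", "fill=", "viewBox="]
    tokenizer.add_tokens(svg_tokens)

    # Grow the embedding matrix to cover the newly added tokens; their
    # embeddings only become meaningful after fine-tuning.
    model.resize_token_embeddings(len(tokenizer))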

Quick Start & Requirements

  • Installation: Clone the repository; create a conda environment (conda env create -f environment.yml && conda activate llm4svg); download the datasets (bash script/download_dataset.sh); set them up (bash script/setup_dataset.sh); then install LLaMA-Factory (cd LLaMA-Factory && pip install -e ".[torch,metrics]").
  • Prerequisites: Python 3.10+, PyTorch, Hugging Face libraries, LLaMA-Factory, Unsloth, TRL, vLLM. GPU acceleration is highly recommended for training and inference.
  • Resources: Significant GPU memory is required to train larger models on the full datasets.
  • Links: Project Website, CVPR 2025 Paper, SVGX-Core-250k Dataset, SVGX-SFT-1M Dataset. A dataset-loading sketch follows this list.
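
As a quick sanity check that the data is usable, here is a sketch of loading the SFT set with the Hugging Face datasets library; the repo id xingxm/SVGX-SFT-1M is an assumption, so verify it against the dataset links above:

    # Sketch: pull one example from the SFT dataset.
    # The Hub repo id below is assumed; see the project links for the real one.
    from datasets import load_dataset

    dataset = load_dataset("xingxm/SVGX-SFT-1M", split="train")
    print(dataset[0])  # one instruction/SVG training pair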

Highlighted Details

  • Supports fine-tuning a wide range of foundation models including Llama 3.2, Qwen2.5-VL, Gemma 3, DeepSeek, Falcon, Phi-2, and GPT2-XL.
  • Offers two specialized datasets: SVGX-Core-250k (pretraining) and SVGX-SFT-1M (supervised fine-tuning).
  • Integrates with vLLM for high-throughput, low-latency inference, claiming up to 2x faster generation (see the inference sketch after this list).
  • Provides example configurations for training using LLaMA-Factory, Unsloth, Transformers, and TRL.
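
For the vLLM integration, the following sketch shows what batched generation could look like once fine-tuned weights are available; the checkpoint path and prompt are placeholders, since pretrained weights are pending release:

    # Sketch: high-throughput SVG generation with vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model="path/to/llm4svg-checkpoint")  # hypothetical local checkpoint
    params = SamplingParams(temperature=0.7, max_tokens=2048)

    prompts = ["Generate an SVG of a red five-pointed star."]
    for output in llm.generate(prompts, params):
        print(output.outputs[0].text)  # raw SVG markup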

Maintenance & Community

The project accompanies a paper accepted to CVPR 2025. For questions, bug reports, or collaboration, users are encouraged to open issues or submit pull requests on GitHub.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Pretrained model weights are pending release. The project relies on external frameworks like LLaMA-Factory and Unsloth, which may have their own dependencies and maintenance cycles. Optimal performance may require significant GPU resources and careful configuration of training parameters.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star history: 214 stars in the last 90 days
