star-vector by joanrod

Vision-language model for SVG generation as a code generation task

Created 1 year ago
4,030 stars

Top 12.2% on SourcePulse

1 Expert Loves This Project
Project Summary

StarVector is a multimodal foundation model for generating Scalable Vector Graphics (SVG) code from both images and text. It addresses the limitations of traditional vectorization methods by leveraging a vision-language model architecture, enabling semantic understanding and precise use of SVG primitives. This makes it suitable for researchers and developers working on automated graphic design, icon generation, and diagram creation.

How It Works

StarVector employs a vision-language model architecture, based on StarCoder, to treat SVG generation as a code generation task. Images are projected into visual tokens, and the model generates SVG code directly. For text-to-SVG, it processes textual prompts to create novel SVGs. This approach allows for semantic understanding of visual content and precise application of SVG primitives beyond simple paths, leading to more compact and semantically rich outputs.
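
To make "SVG primitives beyond simple paths" concrete, the following minimal sketch compares two hand-written SVG documents that draw the same circle: one uses the dedicated circle primitive, the other approximates it with four cubic Bezier segments in a path, as a path-only vectorizer might. Both SVG strings and the helper tag_names are illustrative assumptions written for this example; they are not StarVector outputs.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# A circle expressed with the dedicated <circle> primitive.
primitive_svg = (
    f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">'
    '<circle cx="50" cy="50" r="40" fill="black"/>'
    "</svg>"
)

# The same circle approximated by four cubic Bezier segments in a <path>.
# Control points are offset by roughly 0.5523 * r (about 22.09 for r = 40),
# the usual constant for approximating a quarter circle.
path_svg = (
    f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">'
    '<path fill="black" d="M 90 50 '
    "C 90 72.09 72.09 90 50 90 "
    "C 27.91 90 10 72.09 10 50 "
    "C 10 27.91 27.91 10 50 10 "
    'C 72.09 10 90 27.91 90 50 Z"/>'
    "</svg>"
)


def tag_names(svg_text: str) -> list[str]:
    """Return element names in document order, with the XML namespace stripped."""
    root = ET.fromstring(svg_text)
    return [element.tag.split("}")[-1] for element in root.iter()]


if __name__ == "__main__":
    for label, svg in (("primitive", primitive_svg), ("path-only", path_svg)):
        print(label, tag_names(svg), f"{len(svg)} characters")
```

The primitive version is shorter and states the shape's meaning (a circle with a center and radius) directly, which is the kind of compact, semantically rich output the summary describes.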

Quick Start & Requirements

  • Installation: Clone the repository and run pip install -e . from the repository root. Requires Python 3.11.3.
  • Dependencies: torch, transformers, Pillow, accelerate. Training additionally uses deepspeed (for 1B models) and FSDP (for 8B models). CUDA is required for GPU acceleration.
  • Usage: Example Python code provided for loading models from HuggingFace (starvector/starvector-8b-im2svg) and generating SVGs from images.
  • Demo: Gradio web UI available for interactive use, with options for HuggingFace or vLLM backends.
  • Documentation: Links to HuggingFace models, datasets (SVG-Bench), and the project website are provided.
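
The usage bullet above can be sketched roughly as follows. This is a hedged outline, not the project's verified code: the checkpoint name comes from this summary, but the attribute model.model.processor, the method generate_im2svg, and the max_length default are assumptions based on how the project README appears to describe the model's remote code, and should be checked against the repository before use. The function name image_to_svg is hypothetical.

```python
MODEL_NAME = "starvector/starvector-8b-im2svg"  # checkpoint named in the summary


def image_to_svg(image_path: str, max_length: int = 4000) -> str:
    """Generate SVG code for one image (sketch; needs a CUDA GPU and the model weights)."""
    # Heavy imports stay inside the function so this module can be imported
    # on machines without torch or a GPU.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM

    # trust_remote_code=True is needed because the model class ships with the checkpoint.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True
    )
    model.cuda()
    model.eval()

    # ASSUMPTION: the image processor is exposed as model.model.processor.
    processor = model.model.processor
    pixels = processor(Image.open(image_path), return_tensors="pt")["pixel_values"].cuda()
    if pixels.shape[0] != 1:
        pixels = pixels.squeeze(0)

    # ASSUMPTION: generate_im2svg takes a batch dict and returns a list of SVG strings.
    return model.generate_im2svg({"image": pixels}, max_length=max_length)[0]
```

A call such as image_to_svg("icon.png") (a hypothetical file path) would return the generated SVG source as a string, which can then be written to a .svg file or rasterized for inspection.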

Highlighted Details

  • Achieves state-of-the-art performance on the SVG-Bench benchmark across tasks like Image-to-SVG, Text-to-SVG, and diagram generation.
  • Trained on the SVG-Stack dataset, comprising 2M samples, enabling generalization across various SVG primitives.
  • Models are specifically trained for icons, logotypes, technical diagrams, graphs, and charts, not natural images.
  • Offers both 1B and 8B parameter model checkpoints on HuggingFace.

Maintenance & Community

  • Accepted at CVPR 2025.
  • Models and datasets are available on HuggingFace.
  • The README includes clear citation information. Recent activity is limited: the last commit was 5 months ago, with no pull requests in the past 30 days.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The provided models are not trained for natural images or illustrations and will not perform well on such inputs. The README mentions that StarVector-8B requires FSDP for training, indicating a significant hardware requirement for training from scratch.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 50 stars in the last 30 days
