Vision-language model for SVG generation as a code generation task
StarVector is a multimodal foundation model for generating Scalable Vector Graphics (SVG) code from both images and text. It addresses the limitations of traditional vectorization methods by leveraging a vision-language model architecture, enabling semantic understanding and precise use of SVG primitives. This makes it suitable for researchers and developers working on automated graphic design, icon generation, and diagram creation.
How It Works
StarVector uses a vision-language model architecture built on the StarCoder language model and treats SVG generation as a code generation task. For image-to-SVG, input images are projected into visual tokens, and the model generates SVG code directly from them. For text-to-SVG, it takes a textual prompt and generates a new SVG. Because the model writes SVG code rather than tracing pixels, it can understand the semantics of the visual content and use the appropriate SVG primitives (not just simple paths), which yields more compact and semantically rich outputs.
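To make the compactness point concrete, here is a toy sketch (not StarVector code) that writes the same circle two ways: as the native SVG <circle> primitive, and as a closed polygonal <path>, standing in for the path-only output of a traditional tracer. The helper names, the 32-segment approximation, and the fill colour are illustrative assumptions.

```python
import math
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def circle_primitive(cx: float, cy: float, r: float) -> str:
    """Hypothetical helper: one native SVG <circle> element."""
    return f'<circle cx="{cx:g}" cy="{cy:g}" r="{r:g}" fill="red"/>'


def circle_as_path(cx: float, cy: float, r: float, segments: int = 32) -> str:
    """Hypothetical helper: the same circle approximated by a closed
    polygonal <path> with `segments` straight edges."""
    points = []
    for i in range(segments):
        theta = 2 * math.pi * i / segments
        points.append(f"{cx + r * math.cos(theta):.2f},{cy + r * math.sin(theta):.2f}")
    # "M" moves to the first point, each "L" draws a line, "Z" closes the shape.
    d = "M" + " L".join(points) + " Z"
    return f'<path d="{d}" fill="red"/>'


def wrap_svg(body: str, size: int = 100) -> str:
    """Wrap element markup in a minimal standalone SVG document."""
    return f'<svg xmlns="{SVG_NS}" width="{size}" height="{size}">{body}</svg>'


def is_well_formed(svg: str) -> bool:
    """Check that the markup parses as XML (not a full SVG validation)."""
    try:
        ET.fromstring(svg)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    compact = wrap_svg(circle_primitive(50, 50, 40))
    traced = wrap_svg(circle_as_path(50, 50, 40))
    print("primitive:", len(compact), "chars")
    print("path:     ", len(traced), "chars")
```

Both documents describe the same shape, but the primitive version is far shorter, and for a model that emits SVG token by token, shorter markup means fewer tokens to generate.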
Quick Start & Requirements
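Install from the root of a local clone of the repository:
pip install -e .
Requires Python 3.11.3, torch, transformers, Pillow, and accelerate, plus deepspeed (for training the 1B models) and FSDP (PyTorch's Fully Sharded Data Parallel, for training the 8B models). CUDA is required for GPU acceleration. Usage examples cover loading a pretrained model (e.g., starvector/starvector-8b-im2svg) and generating SVGs from images.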
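Before installing, a quick pre-flight check can confirm that the listed requirements are present. The sketch below is a hypothetical helper (check_environment and cuda_available are not part of StarVector). It compares only the Python major.minor version rather than the exact 3.11.3 patch release, and it treats deepspeed as required even though the requirements tie it to the 1B models.

```python
import importlib.util
import sys

# Packages named in the requirements, mapped to their import names
# (Pillow is imported as PIL). FSDP is not listed separately: it ships
# with PyTorch as torch.distributed.fsdp.
REQUIRED_MODULES = {
    "torch": "torch",
    "transformers": "transformers",
    "Pillow": "PIL",
    "accelerate": "accelerate",
    "deepspeed": "deepspeed",  # assumption: checked unconditionally here
}


def check_environment(expected_python: tuple = (3, 11)) -> dict:
    """Report whether the interpreter matches the expected major.minor
    version and whether each required package can be found.

    find_spec locates top-level modules without importing them.
    """
    report = {"python_ok": tuple(sys.version_info[:2]) == tuple(expected_python)}
    for dist_name, module_name in REQUIRED_MODULES.items():
        report[dist_name] = importlib.util.find_spec(module_name) is not None
    return report


def cuda_available() -> bool:
    """True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check still runs without torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:14s} {'ok' if ok else 'MISSING / mismatch'}")
    print(f"{'CUDA':14s} {'ok' if cuda_available() else 'not available'}")
```

Using find_spec rather than a bare import keeps the check fast and avoids importing heavy packages just to confirm they exist; only the CUDA check actually imports torch.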
Limitations & Caveats
The provided models are not trained on natural images or illustrations and will not perform well on such inputs. The README notes that training StarVector-8B requires FSDP, which indicates substantial hardware requirements for training from scratch.