star-vector by joanrod

Vision-language model for SVG generation as a code generation task

Created 2 years ago

4,193 stars

Top 11.6% on SourcePulse

1 Expert Loves This Project

Amin Ahmad

Cofounder of Vectara

Project Summary

StarVector is a multimodal foundation model for generating Scalable Vector Graphics (SVG) code from both images and text. It addresses the limitations of traditional vectorization methods by leveraging a vision-language model architecture, enabling semantic understanding and precise use of SVG primitives. This makes it suitable for researchers and developers working on automated graphic design, icon generation, and diagram creation.

How It Works

StarVector employs a vision-language model architecture, based on StarCoder, to treat SVG generation as a code generation task. Images are projected into visual tokens, and the model generates SVG code directly. For text-to-SVG, it processes textual prompts to create novel SVGs. This approach allows for semantic understanding of visual content and precise application of SVG primitives beyond simple paths, leading to more compact and semantically rich outputs.
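The compactness claim above can be illustrated with a small sketch. The two SVG strings below are hand-written for illustration and are not StarVector output: one draws a circle with the `<circle>` primitive, the other approximates the same shape with four cubic Bézier segments, which is the kind of path-only output a curve-fitting vectorizer might produce. The helper name `element_tags` is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# A filled circle expressed with a dedicated SVG primitive.
primitive_svg = (
    f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">'
    '<circle cx="50" cy="50" r="40" fill="black"/>'
    "</svg>"
)

# Roughly the same circle as a path of four cubic Bezier segments
# (control-point offset is about 0.5523 * radius).
path_svg = (
    f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">'
    '<path d="M 50 10 C 72.09 10 90 27.91 90 50 C 90 72.09 72.09 90 50 90 '
    'C 27.91 90 10 72.09 10 50 C 10 27.91 27.91 10 50 10 Z" fill="black"/>'
    "</svg>"
)


def element_tags(svg_text):
    """Return the local tag names of the top-level children of an SVG document."""
    root = ET.fromstring(svg_text)
    # ElementTree prefixes tags with "{namespace}", so keep only the local name.
    return [child.tag.split("}")[-1] for child in root]


for name, svg in (("primitive", primitive_svg), ("path-only", path_svg)):
    print(name, element_tags(svg), len(svg), "characters")
```

The primitive version is shorter and states the intent (a circle with a center and radius) directly, while the path version encodes only the outline.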

Quick Start & Requirements

Installation: Clone the repository and run pip install -e . from the repository root. Requires Python 3.11.3.
Dependencies: torch, transformers, Pillow, accelerate. Training uses DeepSpeed (1B model) or FSDP (8B model). CUDA is required for GPU acceleration.
Usage: Example Python code provided for loading models from HuggingFace (starvector/starvector-8b-im2svg) and generating SVGs from images.
Demo: Gradio web UI available for interactive use, with a choice of HuggingFace or vLLM backend.
Documentation: Links to HuggingFace models, datasets (SVG-Bench), and the project website are provided.
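The usage step above might look roughly like the following sketch. The model-loading function follows the pattern in the project's README as recalled here, so the attribute `model.model.processor` and the method `generate_im2svg` are assumptions to verify against the repository. It is defined but not called, since it needs a CUDA GPU and a large download. The helper `extract_svg` is hypothetical: a standard-library well-formedness check, swapped in here in place of whatever post-processing the project itself ships.

```python
import re
import xml.etree.ElementTree as ET

MODEL_NAME = "starvector/starvector-8b-im2svg"  # checkpoint named in this summary


def generate_raw_svg(image_path, max_length=4000):
    """Assumed image-to-SVG call; verify every name against the project's README."""
    # Heavy imports stay local so extract_svg below works without a GPU stack.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM

    # trust_remote_code is needed because the model class lives in the checkpoint repo.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True
    )
    model.cuda()
    model.eval()

    # ASSUMPTION: the remote code exposes its image processor at this attribute.
    processor = model.model.processor
    image = processor(Image.open(image_path), return_tensors="pt")["pixel_values"].cuda()
    if image.shape[0] != 1:
        # ASSUMPTION: shape handling mirrors the README pattern as recalled.
        image = image.squeeze(0)

    # ASSUMPTION: generate_im2svg is the entry point provided by the remote code.
    return model.generate_im2svg({"image": image}, max_length=max_length)[0]


def extract_svg(raw_text):
    """Return the first <svg>...</svg> span if it parses as XML, else None."""
    match = re.search(r"<svg\b.*?</svg>", raw_text, flags=re.DOTALL)
    if match is None:
        return None
    candidate = match.group(0)
    try:
        ET.fromstring(candidate)  # well-formedness check only, not SVG validation
    except ET.ParseError:
        return None
    return candidate
```

A caller would pass the string returned by `generate_raw_svg` through `extract_svg` and treat `None` as a failed generation to retry or discard.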

Highlighted Details

Achieves state-of-the-art performance on the SVG-Bench benchmark across tasks like Image-to-SVG, Text-to-SVG, and diagram generation.
Trained on the SVG-Stack dataset, comprising 2M samples, enabling generalization across various SVG primitives.
Models are specifically trained for icons, logotypes, technical diagrams, graphs, and charts, not natural images.
Offers both 1B and 8B parameter model checkpoints on HuggingFace.

Maintenance & Community

Accepted at CVPR 2025.
Models and datasets are available on HuggingFace.
Project appears actively developed with clear citation information.

Licensing & Compatibility

Licensed under the Apache License, Version 2.0.
Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The provided models are not trained on natural images or illustrations and will not perform well on such inputs. The README notes that StarVector-8B requires FSDP for training, which implies a substantial hardware requirement for anyone training the model themselves.

Health Check

Last Commit: 2 months ago
Responsiveness: Inactive

Star History: 49 stars in the last 30 days