star-vector  by joanrod

Vision-language model for SVG generation as a code generation task

created 1 year ago
3,957 stars

Top 12.6% on sourcepulse

Project Summary

StarVector is a multimodal foundation model for generating Scalable Vector Graphics (SVG) code from both images and text. It addresses the limitations of traditional vectorization methods by leveraging a vision-language model architecture, enabling semantic understanding and precise use of SVG primitives. This makes it suitable for researchers and developers working on automated graphic design, icon generation, and diagram creation.

How It Works

StarVector employs a vision-language model architecture, based on StarCoder, to treat SVG generation as a code generation task. Images are projected into visual tokens, and the model generates SVG code directly. For text-to-SVG, it processes textual prompts to create novel SVGs. This approach allows for semantic understanding of visual content and precise application of SVG primitives beyond simple paths, leading to more compact and semantically rich outputs.
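
The compactness claim can be illustrated with a minimal sketch, using only the Python standard library. It writes the same circle twice: once as a single SVG primitive, and once as a closed polyline path, roughly the kind of output a contour tracer produces. The helper names (wrap, circle_as_primitive, circle_as_path) and the 64-segment approximation are assumptions made for illustration. This is not StarVector's output or code, and real tracers usually emit Bézier curves rather than straight segments.

```python
import math
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def wrap(body: str, size: int = 100) -> str:
    """Wrap SVG body markup in a minimal root <svg> element."""
    return f'<svg xmlns="{SVG_NS}" viewBox="0 0 {size} {size}">{body}</svg>'


def circle_as_primitive(cx: float, cy: float, r: float) -> str:
    """Describe the circle with one <circle> element: the shape is named, not traced."""
    return wrap(f'<circle cx="{cx:g}" cy="{cy:g}" r="{r:g}"/>')


def circle_as_path(cx: float, cy: float, r: float, segments: int = 64) -> str:
    """Approximate the same circle with a closed polyline <path> (hypothetical tracer output)."""
    points = [
        (cx + r * math.cos(2 * math.pi * i / segments),
         cy + r * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]
    # "M x,y L x,y ... Z": move to the first point, draw lines to the rest, close the path.
    d = "M" + " L".join(f"{x:.2f},{y:.2f}" for x, y in points) + " Z"
    return wrap(f'<path d="{d}"/>')


primitive = circle_as_primitive(50, 50, 40)
traced = circle_as_path(50, 50, 40)

# Both strings are well-formed XML; only their length and readability differ.
ET.fromstring(primitive)
ET.fromstring(traced)
print(f"primitive: {len(primitive)} chars, traced path: {len(traced)} chars")
```

The primitive version is shorter and also records what the shape is, which is the kind of semantic information a path of coordinates loses.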

Quick Start & Requirements

  • Installation: Clone the repository and install it in editable mode with pip install -e . (the trailing dot is part of the command). Requires Python 3.11.3.
  • Dependencies: torch, transformers, Pillow, accelerate; for training, deepspeed (1B models) or FSDP (8B models). CUDA is required for GPU acceleration.
  • Usage: Example Python code provided for loading models from HuggingFace (starvector/starvector-8b-im2svg) and generating SVGs from images.
  • Demo: Gradio web UI available for interactive use, with options for HuggingFace or VLLM backends.
  • Documentation: Links to HuggingFace models, datasets (SVG-Bench), and the project website are provided.
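
The usage step above might look roughly like the following sketch. The checkpoint name comes from this summary. The attribute and method names on the loaded model (model.model.processor, generate_im2svg) and the max_length value are assumptions recalled from the project README and not verified here, so check them against the repository before relying on them. The is_well_formed_svg helper is hypothetical and not part of StarVector. It is included because generated code can be cut off at the length limit, leaving markup that does not parse.

```python
import xml.etree.ElementTree as ET

MODEL_NAME = "starvector/starvector-8b-im2svg"  # checkpoint named in this summary


def image_to_svg(image_path: str, model_name: str = MODEL_NAME, max_length: int = 4000) -> str:
    """Generate raw SVG code for one image. Needs a CUDA GPU and a model download."""
    # Heavy imports stay inside the function so the helper below works without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM

    # ASSUMPTION: the checkpoint ships custom model code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, trust_remote_code=True
    )
    model.cuda()
    model.eval()

    # ASSUMPTION: these attribute and method names follow the project README.
    processor = model.model.processor
    pixels = processor(Image.open(image_path), return_tensors="pt")["pixel_values"].cuda()
    if pixels.shape[0] != 1:
        pixels = pixels.squeeze(0)
    return model.generate_im2svg({"image": pixels}, max_length=max_length)[0]


def is_well_formed_svg(text: str) -> bool:
    """Return True if text parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # The root tag is "svg", or "{namespace}svg" when an xmlns is declared.
    return root.tag == "svg" or root.tag.endswith("}svg")
```

A caller would pass an icon or diagram image to image_to_svg and run is_well_formed_svg on the result before saving or rendering it.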

Highlighted Details

  • Achieves state-of-the-art performance on the SVG-Bench benchmark across tasks like Image-to-SVG, Text-to-SVG, and diagram generation.
  • Trained on the SVG-Stack dataset, comprising 2M samples, enabling generalization across various SVG primitives.
  • Models are specifically trained for icons, logotypes, technical diagrams, graphs, and charts, not natural images.
  • Offers both 1B and 8B parameter model checkpoints on HuggingFace.

Maintenance & Community

  • Accepted at CVPR 2025.
  • Models and datasets are available on HuggingFace.
  • Project appears actively developed with clear citation information.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The provided models are not trained for natural images or illustrations and will not perform well on such inputs. The README mentions that StarVector-8B requires FSDP for training, indicating a significant hardware requirement for training from scratch.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 233 stars in the last 90 days
