quickdraw-dataset by googlecreativelab

Dataset of 50M drawings across 345 categories from the "Quick, Draw!" game

Created 8 years ago

6,658 stars

Top 7.6% on SourcePulse


Project Summary

The Quick, Draw! Dataset contains 50 million vector drawings across 345 categories, collected from players of the "Quick, Draw!" game. It is useful to researchers, developers, and artists working on machine learning, human-computer interaction, and creative applications, and it shows how people draw common objects under time pressure.

How It Works

The dataset comprises timestamped vector drawings. Each drawing is a list of strokes, and each stroke records its x coordinates, its y coordinates, and the time of each point in milliseconds since the first point. The data is available in four formats: raw ndjson, simplified ndjson (scaled into a 256x256 region and resampled, with timing removed), a custom binary format, and 28x28 grayscale NumPy bitmaps. The raw files suit direct analysis of how drawings were made, while the simplified and bitmap formats give smaller, uniform inputs for machine learning models.
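As a rough illustration of the stroke layout described above, the sketch below parses one ndjson line and converts each stroke from parallel coordinate lists into (x, y) pairs. The sample line is made up for illustration, the field names (`word`, `countrycode`, `recognized`, `drawing`) are assumed to match the dataset's documentation, and `strokes_to_points` is a hypothetical helper, not part of any official tooling.

```python
import json

# Hypothetical sample line: each stroke is a pair of parallel lists [xs, ys].
# The values are invented for illustration, not taken from the dataset.
SAMPLE_LINE = (
    '{"word": "cat", "countrycode": "US", "recognized": true, '
    '"drawing": [[[0, 10, 20], [5, 15, 5]], [[30, 40], [50, 60]]]}'
)


def strokes_to_points(record):
    """Convert each stroke from parallel x/y lists into a list of (x, y) tuples.

    Only the first two lists of a stroke are read, so a raw-format stroke
    that carries a third list of times is handled too (the times are ignored).
    """
    return [list(zip(stroke[0], stroke[1])) for stroke in record["drawing"]]


record = json.loads(SAMPLE_LINE)
points = strokes_to_points(record)
print(record["word"], "has", len(points), "strokes")
```

Because ndjson stores one JSON object per line, a full category file can be processed the same way by calling `json.loads` on each line in turn instead of loading the whole file at once.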

Quick Start & Requirements

Access: Data is hosted on Google Cloud Storage. To download all simplified drawings, run: gsutil -m cp 'gs://quickdraw_dataset/full/simplified/*.ndjson' .
Prerequisites: gsutil for downloading; Python with NumPy to load the .npy bitmaps; TensorFlow and the Magenta Project to work with Sketch-RNN.
Resources: The full dataset is substantial; download times and storage requirements will be significant.
Links: quickdraw.withgoogle.com/data, Magenta Project

Highlighted Details

50 million drawings across 345 categories.
Includes metadata: word, country code, timestamp, recognition status.
Simplified files are produced by scaling the drawings and applying Ramer-Douglas-Peucker simplification.
Available as NumPy bitmaps (28x28 grayscale) for ML model training.
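The Ramer-Douglas-Peucker simplification mentioned above can be sketched as follows. This is a minimal textbook version for a single stroke given as (x, y) tuples, not the code used to build the dataset; the function names and the epsilon values in the usage lines are chosen for illustration only.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the infinite line through start and end."""
    (px, py), (sx, sy), (ex, ey) = point, start, end
    dx, dy = ex - sx, ey - sy
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to plain point-to-point distance.
        return math.hypot(px - sx, py - sy)
    return abs(dy * px - dx * py + ex * sy - ey * sx) / math.hypot(dx, dy)


def rdp(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist > epsilon:
        # Split at the farthest point and simplify both halves recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [start, end]


stroke = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
simplified = rdp(stroke, 1.0)
```

A larger epsilon removes more points and yields coarser strokes; a smaller one preserves more of the original shape at the cost of longer sequences.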

Maintenance & Community

This is a static dataset release by Google Creative Lab. While there are numerous projects and research papers utilizing the data, there's no active development or community forum directly associated with this repository.

Licensing & Compatibility

License: Creative Commons Attribution 4.0 International (CC BY 4.0).
Compatibility: Allows commercial use and integration into closed-source projects, provided attribution is given.

Limitations & Caveats

The dataset may contain inappropriate content despite moderation. Raw drawings vary in bounding box and in number of points, so they need normalization before they can be compared or batched consistently.
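One possible form of that preprocessing is to shift each raw drawing to the origin and scale it uniformly into a fixed coordinate range. The sketch below assumes strokes are stored as parallel lists [xs, ys, ts] and that the drawing contains at least one point; `normalize_drawing` and its `size` parameter are hypothetical names, and this is not the exact procedure used to produce the simplified files.

```python
def normalize_drawing(strokes, size=256):
    """Align a drawing to the top-left corner and scale it into 0..size-1.

    Assumes `strokes` is a non-empty list of [xs, ys, ...] lists.
    Any timing list is dropped from the output.
    """
    xs = [x for stroke in strokes for x in stroke[0]]
    ys = [y for stroke in strokes for y in stroke[1]]
    min_x, min_y = min(xs), min(ys)
    # Use the larger side of the bounding box so the aspect ratio is kept.
    extent = max(max(xs) - min_x, max(ys) - min_y)
    scale = (size - 1) / extent if extent > 0 else 1.0
    return [
        [
            [round((x - min_x) * scale) for x in stroke[0]],
            [round((y - min_y) * scale) for y in stroke[1]],
        ]
        for stroke in strokes
    ]


# Made-up raw stroke: x values, y values, and times in milliseconds.
raw = [[[100, 304, 610], [50, 150, 250], [0, 10, 20]]]
normalized = normalize_drawing(raw)
```

Scaling by the larger side keeps the drawing's proportions; scaling each axis independently would fill the whole square but distort the shape.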

Health Check

Last Commit

11 months ago

Responsiveness

Inactive

Star History

32 stars in the last 30 days