quickdraw-dataset by googlecreativelab

Dataset of 50M drawings across 345 categories from the "Quick, Draw!" game

Created 8 years ago
6,529 stars

Top 7.9% on SourcePulse

Project Summary

The Quick, Draw! Dataset provides access to 50 million vector drawings across 345 categories, collected from players of the "Quick, Draw!" game. This dataset is valuable for researchers, developers, and artists interested in machine learning, human-computer interaction, and creative applications, offering insights into drawing patterns and user input.

How It Works

The dataset comprises timestamped vector drawings. Each stroke is stored as a list of x coordinates and a list of y coordinates. In the raw files, each stroke also has a parallel list of times in milliseconds since the drawing's first point. The data comes in four formats: raw ndjson; simplified ndjson (scaled to 256x256, resampled, and without timing data); a custom binary format; and 28x28 grayscale NumPy bitmaps. The choice of formats serves different needs, from direct analysis of stroke data to efficient input for machine learning models.
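A minimal sketch of reading the ndjson formats line by line. It assumes the record keys `word`, `countrycode`, `recognized`, and `drawing`, and it treats each stroke as `[xs, ys]` (raw files add a third list of times). The sample record is invented for illustration, and `parse_drawing` and `iter_drawings` are hypothetical helper names, not part of any official tooling.

```python
import json

# Invented record for illustration only; the values are not real data.
SAMPLE_LINE = (
    '{"word": "cat", "countrycode": "US", "recognized": true, '
    '"timestamp": "2017-03-01 20:41:36.70725 UTC", '
    '"drawing": [[[0, 10, 20], [0, 5, 0]], [[20, 20], [0, 30]]]}'
)


def parse_drawing(line: str) -> dict:
    """Parse one ndjson line into metadata plus a list of (x, y) points per stroke."""
    record = json.loads(line)
    strokes = []
    for stroke in record["drawing"]:
        # stroke[0] holds x values, stroke[1] holds y values; a raw-format
        # stroke would also carry stroke[2] (times in ms), ignored here.
        xs, ys = stroke[0], stroke[1]
        strokes.append(list(zip(xs, ys)))
    return {
        "word": record["word"],
        "countrycode": record["countrycode"],
        "recognized": record["recognized"],
        "strokes": strokes,
    }


def iter_drawings(path: str):
    """Yield parsed drawings from an ndjson file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_drawing(line)
```

Streaming the file one line at a time keeps memory use flat, which matters because a single category file can hold a large number of drawings.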

Quick Start & Requirements

  • Access: Data is available on Google Cloud Storage. A common method to download simplified drawings is gsutil -m cp 'gs://quickdraw_dataset/full/simplified/*.ndjson' .
  • Prerequisites: Python and gsutil for downloading. NumPy for .npy files. TensorFlow and the Magenta project for Sketch-RNN.
  • Resources: The full dataset is large, so download times and storage requirements are significant. Each category is a separate file, so you can download only the categories you need.
  • Links: quickdraw.withgoogle.com/data, Magenta Project

Highlighted Details

  • 50 million drawings across 345 categories.
  • Includes metadata: word, country code, timestamp, recognition status.
  • Simplified formats include Ramer-Douglas-Peucker simplification and scaling.
  • Available as NumPy bitmaps (28x28 grayscale) for ML model training.
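To show what Ramer-Douglas-Peucker simplification does to a stroke, here is a textbook recursive version. This is a sketch, not the project's own preprocessing code. The default `epsilon=2.0` is an assumption based on our reading of the project's description of the simplified files, so treat it as a tunable parameter.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to plain point-to-point distance.
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def rdp(points, epsilon=2.0):
    """Simplify a polyline, keeping only points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior point farthest from the chord a-b.
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax > epsilon:
        # Keep that point and simplify both halves; drop the duplicated split point.
        left = rdp(points[: idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right
    # Every interior point is within epsilon of the chord: keep only the endpoints.
    return [a, b]
```

For example, `rdp([(0, 0), (5, 0), (10, 0), (10, 5), (10, 10)])` keeps only the two ends and the corner at `(10, 0)`.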

Maintenance & Community

This is a static dataset release by Google Creative Lab. While there are numerous projects and research papers utilizing the data, there's no active development or community forum directly associated with this repository.

Licensing & Compatibility

  • License: Creative Commons Attribution 4.0 International (CC BY 4.0).
  • Compatibility: Permits commercial use and integration into closed-source projects, provided attribution is given.

Limitations & Caveats

The dataset may contain inappropriate content despite moderation. The raw data's variable bounding boxes and point counts require preprocessing for consistent analysis.
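One way to deal with variable bounding boxes in the raw data is to shift each drawing so its minimum x and y are 0, then scale it uniformly so its larger side spans 0 to 255. This is similar in spirit to how the simplified files are described above, but it is a hypothetical helper (`normalize`), not the project's pipeline. It does not resample points, so variable point counts still need separate handling.

```python
def normalize(strokes, size=255):
    """Align a drawing to the top-left and scale it uniformly into [0, size].

    Each stroke is a sequence whose first two items are the x and y lists;
    any further lists (such as raw timing data) are dropped.
    """
    xs = [x for s in strokes for x in s[0]]
    ys = [y for s in strokes for y in s[1]]
    min_x, min_y = min(xs), min(ys)
    # Use the larger extent so the aspect ratio is preserved; avoid dividing by 0.
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1
    scale = size / extent
    return [
        (
            [round((x - min_x) * scale) for x in s[0]],
            [round((y - min_y) * scale) for y in s[1]],
        )
        for s in strokes
    ]
```

Scaling both axes by the same factor keeps the drawing's proportions intact, which a separate per-axis scale would distort.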

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 35 stars in the last 30 days
