quickdraw-dataset  by googlecreativelab

Dataset of 50M drawings across 345 categories from the "Quick, Draw!" game

created 8 years ago
6,480 stars

Top 8.0% on sourcepulse

Project Summary

The Quick, Draw! Dataset contains 50 million vector drawings across 345 categories, collected from players of the "Quick, Draw!" game. It is aimed at researchers, developers, and artists working on machine learning, human-computer interaction, and creative applications who want to study how people draw.

How It Works

The dataset consists of timestamped vector drawings. In the raw data, each stroke is stored as parallel arrays of x coordinates, y coordinates, and times in milliseconds since the first point. The data is available as raw ndjson, simplified ndjson (scaled to a 256x256 region and resampled, with timing removed), a custom binary format, and 28x28 grayscale NumPy bitmaps. These formats cover different needs, from direct analysis of the strokes to compact input for machine learning models.
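As a minimal sketch of how one line of an ndjson file could be read, the snippet below parses a single record and pairs up the coordinates of each stroke. The sample record is synthetic, and the field names (key_id, word, countrycode, timestamp, recognized, drawing) are assumptions based on the repository README, so check them against a real file before relying on them. The helper name parse_drawing is hypothetical.

```python
import json

# Synthetic record shaped like one line of a raw .ndjson file.
# All values are made up for illustration.
SAMPLE_LINE = json.dumps({
    "key_id": "0000000000000001",
    "word": "nose",
    "countrycode": "AE",
    "timestamp": "2017-03-01 20:41:36.70725 UTC",
    "recognized": True,
    "drawing": [
        [[129, 128, 129], [50, 60, 75], [0, 16, 40]],  # stroke 1: xs, ys, ts
        [[10, 20], [5, 6], [200, 230]],                # stroke 2: xs, ys, ts
    ],
})


def parse_drawing(line):
    """Return (word, strokes) for one ndjson line.

    Each stroke becomes a dict with a list of (x, y) points and, when a
    third array is present (raw format), the matching list of times.
    """
    record = json.loads(line)
    strokes = []
    for stroke in record["drawing"]:
        xs, ys = stroke[0], stroke[1]
        times = stroke[2] if len(stroke) > 2 else None
        strokes.append({"points": list(zip(xs, ys)), "times": times})
    return record["word"], strokes


word, strokes = parse_drawing(SAMPLE_LINE)
print(word, len(strokes), strokes[0]["points"])
```

The same function also accepts simplified records, where each stroke has only x and y arrays; in that case "times" is None.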

Quick Start & Requirements

  • Access: Data is hosted on Google Cloud Storage. To download all simplified drawings, run: gsutil -m cp 'gs://quickdraw_dataset/full/simplified/*.ndjson' .
  • Prerequisites: gsutil for downloading; Python with NumPy for reading the .npy files; TensorFlow and the Magenta Project for Sketch-RNN.
  • Resources: With 50 million drawings, the full dataset is large; expect long download times and significant storage use.
  • Links: quickdraw.withgoogle.com/data, Magenta Project
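Once a category file has been downloaded, it can be processed line by line rather than loaded whole, which matters given the size of the dataset. The sketch below is one possible approach, not the project's own tooling: it streams a file and counts recognized drawings per country code. The file name cat.ndjson, the helper names, and the field names recognized and countrycode are assumptions; the demo writes a tiny synthetic file so the code runs without any download.

```python
import json
import os
import tempfile
from collections import Counter


def iter_records(path):
    """Yield one parsed record per non-empty line, reading lazily."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def recognized_by_country(path):
    """Count drawings flagged as recognized, grouped by country code."""
    counts = Counter()
    for record in iter_records(path):
        if record["recognized"]:
            counts[record["countrycode"]] += 1
    return counts


# Synthetic rows standing in for a downloaded category file.
rows = [
    {"word": "cat", "countrycode": "US", "recognized": True, "drawing": []},
    {"word": "cat", "countrycode": "US", "recognized": True, "drawing": []},
    {"word": "cat", "countrycode": "DE", "recognized": False, "drawing": []},
    {"word": "cat", "countrycode": "BR", "recognized": True, "drawing": []},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "cat.ndjson")
    with open(demo_path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")
    counts = recognized_by_country(demo_path)

print(dict(counts))
```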

Highlighted Details

  • 50 million drawings across 345 categories.
  • Includes metadata: word, country code, timestamp, recognition status.
  • The simplified format applies scaling, resampling, and Ramer-Douglas-Peucker simplification to each drawing.
  • Available as NumPy bitmaps (28x28 grayscale) for ML model training.
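To illustrate the Ramer-Douglas-Peucker step mentioned above, here is a small textbook-style sketch of the algorithm in plain Python. It is not the code used to produce the dataset; the function names and the epsilon value of 2.0 used in the demo are assumptions chosen for illustration.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the line through start and end."""
    (px, py), (sx, sy), (ex, ey) = point, start, end
    dx, dy = ex - sx, ey - sy
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - sx, py - sy)
    return abs(dy * (px - sx) - dx * (py - sy)) / math.hypot(dx, dy)


def rdp(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], start, end)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist > epsilon:
        # Split at the farthest point and simplify both halves.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [start, end]


stroke = [(0, 0), (1, 1), (2, 1), (3, 10), (4, 0)]
simplified = rdp(stroke, epsilon=2.0)  # epsilon is an illustrative value
print(simplified)
```

A larger epsilon removes more points and gives coarser strokes; a smaller one keeps the shape closer to the original at the cost of more data.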

Maintenance & Community

This is a static dataset release by Google Creative Lab. While there are numerous projects and research papers utilizing the data, there's no active development or community forum directly associated with this repository.

Licensing & Compatibility

  • License: Creative Commons Attribution 4.0 International (CC BY 4.0).
  • Compatibility: Permissive for commercial use and integration into closed-source projects, requiring attribution.

Limitations & Caveats

The dataset may contain inappropriate content despite moderation. The raw data's variable bounding boxes and point counts require preprocessing for consistent analysis.
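One way to handle the variable bounding boxes is to shift each drawing to the origin and scale it uniformly into a fixed range. The sketch below assumes strokes stored as [xs, ys, ...] arrays and a target range of 0 to 255; the function name normalize_strokes is hypothetical, and this is only an approximation of the kind of preprocessing involved, not the dataset's exact pipeline.

```python
def normalize_strokes(strokes, size=255):
    """Align a drawing to the top-left corner and scale it uniformly.

    Each stroke is [xs, ys, ...]; any further arrays (such as times) are
    dropped. The larger side of the bounding box is mapped to `size`.
    """
    xs = [x for stroke in strokes for x in stroke[0]]
    ys = [y for stroke in strokes for y in stroke[1]]
    if not xs:
        return []
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y)
    scale = size / extent if extent else 1.0
    return [
        [
            [round((x - min_x) * scale) for x in stroke[0]],
            [round((y - min_y) * scale) for y in stroke[1]],
        ]
        for stroke in strokes
    ]


# Synthetic raw strokes with an arbitrary bounding box.
raw = [
    [[100, 610, 300], [50, 250, 150], [0, 10, 20]],
    [[200, 400], [60, 60], [30, 40]],
]
normalized = normalize_strokes(raw)
print(normalized)
```

Using a single scale factor for both axes preserves the aspect ratio of the drawing, so shapes are not stretched.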

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 116 stars in the last 90 days
