tidytuesday  by rfordatascience

Weekly data project for learning data analysis

created 7 years ago
7,568 stars

Top 7.0% on sourcepulse

Project Summary

TidyTuesday is a weekly social data project designed to help individuals learn data tidying and visualization skills using real-world datasets. It targets data scientists, analysts, and students across R, Python, and Julia, fostering a collaborative learning environment. The project provides a consistent stream of diverse datasets, encouraging practical application of data science techniques.

How It Works

The project releases a new dataset each Monday, accompanied by instructions for accessing it in R, Python, or Julia, or for direct download. Participants are encouraged to explore the data, identify interesting relationships, and create visualizations, models, or reports. The emphasis is on practicing data manipulation and visualization techniques rather than drawing causal conclusions.
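As an illustration of the kind of data manipulation practiced here, the sketch below reshapes a wide table (one column per year) into long form (one row per observation). The source does not prescribe this step; the `melt` helper and the sample rows are hypothetical and only use the Python standard library.

```python
def melt(rows, id_keys, var_name="variable", value_name="value"):
    """Turn wide rows into long rows: one output row per non-id column."""
    long_rows = []
    for row in rows:
        # Columns that identify the observation are copied to every output row.
        ids = {key: row[key] for key in id_keys}
        for key, value in row.items():
            if key not in id_keys:
                long_rows.append({**ids, var_name: key, value_name: value})
    return long_rows


# Hypothetical wide data, not taken from any TidyTuesday dataset.
wide = [
    {"country": "A", "2020": 10, "2021": 12},
    {"country": "B", "2020": 7, "2021": 9},
]
tidy = melt(wide, id_keys=["country"], var_name="year", value_name="count")
```

Each row of `tidy` now holds a single country, year, and count, which is usually the shape plotting and modeling tools expect.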

Quick Start & Requirements

  • Access Data: Download directly from GitHub or use provided R, Python, or Julia packages.
  • Tools: R, Python, Julia, or any preferred data exploration tool.
  • Sharing: Use #TidyTuesday (R), #PydyTuesday (Python), or #TidierTuesday (Julia) hashtags on social media.
  • Resources: Posit's PydyTuesday repo for Python users and Connect Cloud for easy publishing.
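For the direct-download route, a minimal Python sketch might look like the following. It assumes the datasets live under `data/<year>/<YYYY-MM-DD>/` on the repository's `main` branch and that each week is labeled with its Tuesday date; both are assumptions to check against the repository. The file name `example.csv` is hypothetical, and `week_tuesday`, `dataset_url`, and `load_rows` are illustrative helpers, not part of any official package.

```python
import csv
import io
import urllib.request
from datetime import date, timedelta

# Assumed location of the raw data files; verify the branch and layout first.
BASE_URL = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data"


def week_tuesday(day: date) -> date:
    """Return the Tuesday of the Monday-to-Sunday week containing `day`."""
    monday = day - timedelta(days=day.weekday())
    return monday + timedelta(days=1)


def dataset_url(tuesday: date, filename: str) -> str:
    """Build the raw-file URL for one dataset file under the assumed layout."""
    return f"{BASE_URL}/{tuesday.year}/{tuesday.isoformat()}/{filename}"


def load_rows(url: str) -> list:
    """Download a CSV file and return its rows as dictionaries (needs network)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# "example.csv" is a placeholder; each week's folder lists its real file names.
url = dataset_url(week_tuesday(date(2024, 1, 1)), "example.csv")
```

Calling `load_rows(url)` with a real file name would fetch the data. The language-specific packages mentioned above are likely the more convenient route when available.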

Highlighted Details

  • Weekly social data project with a focus on learning data tidying and visualization.
  • Supports R, Python, and Julia ecosystems.
  • Encourages sharing of code and results on social media.
  • Datasets span various domains, sourced from reputable origins.

Maintenance & Community

  • Organized by the Data Science Learning Community.
  • Active Slack channel for R, Python, and data-related help.
  • Encourages community contribution to dataset curation.
  • Links to past datasets are available for 2018-2024.

Licensing & Compatibility

  • The project itself is not explicitly licensed. The datasets come from various sources and may carry their own licenses, so users should verify the terms for each dataset.
  • Compatible with R, Python, Julia, and other data analysis tools.

Limitations & Caveats

The project emphasizes practicing data tidying and plotting. It explicitly cautions against drawing causal conclusions from the provided datasets, because they may not capture moderating variables that affect the relationships observed.
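To make that caution concrete, here is a small sketch with invented numbers (not from any TidyTuesday dataset). It shows how an unrecorded grouping variable can flip the apparent direction of a relationship: the pooled correlation is positive while the correlation inside each group is negative. The `pearson` helper and the `records` data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented (group, x, y) records; imagine the group column was never collected.
records = [
    ("a", 1, 3), ("a", 2, 2), ("a", 3, 1),
    ("b", 6, 8), ("b", 7, 7), ("b", 8, 6),
]

# Correlation over all records, ignoring the group.
pooled = pearson([x for _, x, _ in records], [y for _, _, y in records])

# Correlation within each group separately.
within = {
    group: pearson(
        [x for g, x, _ in records if g == group],
        [y for g, _, y in records if g == group],
    )
    for group in ("a", "b")
}
```

Here `pooled` is positive while both values in `within` are negative, so a chart of the pooled data alone would suggest the opposite of what happens inside each group.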

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 7
  • Issues (30d): 8

Star History

  • 237 stars in the last 90 days
