StarData by TorchCraft

StarCraft replay dataset for AI research

Created 8 years ago
573 stars

Top 56.3% on SourcePulse

Project Summary

This repository provides StarData, a dataset of 65,646 StarCraft: Brood War replays for AI researchers and practitioners building agents for real-time strategy games. The compressed dataset is 365 GB and contains 1.5 billion frames and 496 million player actions, captured at 8 frames per second.

How It Works

The dataset is designed for use with TorchCraft, a framework that allows interaction with StarCraft: Brood War replays. TorchCraft provides replayer modules in C++, Python, and Lua, enabling efficient parsing and access to frame-by-frame game data. This approach facilitates detailed analysis of game states, player actions, and unit behaviors, crucial for training and evaluating AI models.
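The frame-by-frame access described above can be sketched in Python as follows. This is a minimal sketch, not the project's documented usage: it assumes the TorchCraft Python package exposes `replayer.load(path)`, that the loaded replay supports `len()` and `getFrame(i)`, and that each frame has a `units` mapping from player id to a list of unit records. The helper names (`load_replay`, `units_per_player`, `peak_unit_counts`) are hypothetical and exist only for illustration.

```python
from collections import Counter


def load_replay(path):
    """Load a replay with TorchCraft's Python replayer (assumed API)."""
    # Imported lazily so the helpers below can be used without TorchCraft.
    from torchcraft import replayer
    return replayer.load(path)


def units_per_player(rep, frame_index):
    """Count the units each player owns in one stored frame.

    Assumes rep.getFrame(i) returns an object whose `units` attribute
    maps a player id to a list of unit records.
    """
    frame = rep.getFrame(frame_index)
    return {player: len(units) for player, units in frame.units.items()}


def peak_unit_counts(rep):
    """Return the largest unit count each player reaches in the replay."""
    peaks = Counter()
    for i in range(len(rep)):
        for player, count in units_per_player(rep, i).items():
            peaks[player] = max(peaks[player], count)
    return dict(peaks)
```

With TorchCraft installed, `peak_unit_counts(load_replay("/path/to/replay"))` would walk every stored frame once. The path is a placeholder, and the attribute names should be checked against the TorchCraft documentation before relying on them.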

Quick Start & Requirements

  • Install TorchCraft: git submodule update --init && cd TorchCraft && pip install .
  • Prerequisites: libzstd-1.1.4+ is required for replay parsing.
  • Data Access: Replays are available via AWS S3 at s3://stardata or through provided chunked download links. Standardized train, validation, and test sets are also available.
  • Documentation: https://github.com/TorchCraft/TorchCraft
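The libzstd prerequisite above can be checked before installing. The sketch below is one possible approach, not part of StarData: it assumes the system library can be located with `ctypes.util.find_library("zstd")` and relies on zstd's `ZSTD_versionNumber()`, which packs the version as major*10000 + minor*100 + release. The helper names are hypothetical.

```python
import ctypes
import ctypes.util

MIN_ZSTD = (1, 1, 4)  # minimum version named in the prerequisites


def decode_zstd_version(number):
    """Split zstd's packed version number into (major, minor, release)."""
    major, rest = divmod(number, 10000)
    minor, release = divmod(rest, 100)
    return (major, minor, release)


def installed_zstd_version():
    """Return the system libzstd version as a tuple, or None if unavailable."""
    path = ctypes.util.find_library("zstd")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    lib.ZSTD_versionNumber.restype = ctypes.c_uint
    return decode_zstd_version(lib.ZSTD_versionNumber())


def zstd_is_new_enough(version, minimum=MIN_ZSTD):
    """True when a version tuple is present and at least `minimum`."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    found = installed_zstd_version()
    status = "ok" if zstd_is_new_enough(found) else "too old or missing"
    print("libzstd:", found, status)
```

Comparing version tuples rather than strings avoids ordering mistakes such as "1.10.0" sorting before "1.2.0".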

Highlighted Details

  • Largest StarCraft: Brood War replay dataset available (65,646 games).
  • Data captured at 8 frames per second for granular analysis.
  • Includes tools for preprocessing, clustering, and reproducing results.
  • Compatible with TorchCraft version 1.3.0.

Maintenance & Community

The project is associated with research by Z. Lin, G. Synnaeve, and others, and a whitepaper is available on arXiv. The README does not mention any community channels.

Licensing & Compatibility

StarData is BSD-licensed with an additional patent grant, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The replay data is compatible only with TorchCraft version 1.3.0. Some of the scripts for reproducing results have not yet been cleaned up or packaged for easy installation.

Health Check

  • Last Commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days
