STEAD by smousavi05

Seismic signal dataset for AI-driven earthquake research

created 6 years ago
335 stars

Top 83.1% on sourcepulse

Project Summary

The STanford EArthquake Dataset (STEAD) provides a comprehensive global collection of seismic signals for AI-driven earthquake analysis. It targets researchers and engineers in seismology and machine learning, offering a large, curated dataset to train and validate models for earthquake detection, characterization, and related tasks.

How It Works

STEAD stores seismic waveforms in HDF5 files and the associated metadata in CSV files, so traces can be filtered on their metadata before any waveform is read. The dataset includes both earthquake and noise waveforms, with detailed metadata such as event location, magnitude, and station information. The project also provides Python scripts for accessing, filtering, and processing the data, including converting raw waveforms to displacement, velocity, or acceleration using ObsPy and instrument response information.
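The filter-then-read workflow described above can be sketched as follows. This is a minimal, dependency-light illustration, not the project's own script: the column names (`trace_name`, `trace_category`, `source_magnitude`, `source_distance_km`), the category value `earthquake_local`, and the HDF5 layout `data/<trace_name>` are assumptions that should be checked against the actual CSV header and file. The CSV rows here are made-up sample values, and `read_waveform` is a hypothetical helper that is defined but not called.

```python
import csv
import io


def select_traces(rows, category="earthquake_local",
                  min_magnitude=3.0, max_distance_km=20.0):
    """Return trace names whose metadata pass the filters.

    Column names are assumed, not confirmed by this summary.
    """
    selected = []
    for row in rows:
        if row["trace_category"] != category:
            continue
        try:
            magnitude = float(row["source_magnitude"])
            distance = float(row["source_distance_km"])
        except ValueError:
            continue  # skip rows with missing or non-numeric values
        if magnitude > min_magnitude and distance <= max_distance_km:
            selected.append(row["trace_name"])
    return selected


def read_waveform(hdf5_path, trace_name):
    """Read one trace and its attributes (assumed layout: data/<trace_name>)."""
    import h5py          # imported lazily so the filter step runs without it
    import numpy as np

    with h5py.File(hdf5_path, "r") as handle:
        dataset = handle["data/" + trace_name]
        return np.array(dataset), dict(dataset.attrs)


# Made-up sample metadata standing in for the real CSV file.
SAMPLE_CSV = """trace_name,trace_category,source_magnitude,source_distance_km
trace_a,earthquake_local,3.6,12.5
trace_b,earthquake_local,2.1,8.0
trace_c,noise,,
trace_d,earthquake_local,4.2,95.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
selected = select_traces(rows)
print(selected)  # ['trace_a']
```

With the real files, the rows would come from `csv.DictReader(open(path))` or a pandas DataFrame, and each selected name would be passed to `read_waveform` together with the HDF5 path.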

Quick Start & Requirements

  • Download: Data is available in chunks (~14-16 GB each) or as a single ~85 GB file.
  • Dependencies: Python, pandas, h5py, NumPy, Matplotlib, and ObsPy (including its FDSN client).
  • Tools: QuakeLabeler and SeisBench can be used for labeling and conversion to STEAD format.
  • Resources: Requires significant disk space for the dataset and computational resources for processing.
  • Documentation: STEAD Paper, Code Repository
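Before downloading, it can help to confirm that the environment meets the requirements listed above. The sketch below uses only the standard library; the import names in `REQUIRED_MODULES` are the usual ones for these packages, and the 85 GB figure is the approximate single-file size quoted above. Both function names are hypothetical.

```python
import importlib.util
import shutil

# Usual import names for the listed dependencies.
REQUIRED_MODULES = ["pandas", "h5py", "numpy", "matplotlib", "obspy"]
FULL_DATASET_GB = 85  # approximate size of the single-file download


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


def has_room_for(size_gb, path="."):
    """Return True if the filesystem holding `path` has `size_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= size_gb


print("missing modules:", missing_modules())
print("room for full dataset:", has_room_for(FULL_DATASET_GB))
```

If space is short, the same check with a size of about 16 GB indicates whether a single chunk would fit.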

Highlighted Details

  • Global dataset of seismic signals for AI.
  • Includes local earthquakes and noise waveforms.
  • Provides Python scripts for data access, filtering, and waveform conversion (displacement, velocity, acceleration).
  • Metadata includes event location, magnitude, and station details.
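The waveform conversion mentioned in the list above can be sketched with ObsPy's response-removal routine. This is an illustration under stated assumptions rather than the repository's script: it assumes the raw trace has already been wrapped in an ObsPy `Stream` whose `stats` carry the network code, station code, and start time, and that the instrument response is available from an FDSN data center (the default `"IRIS"` is only an example). `obspy_output_code` and `to_ground_motion` are hypothetical names; the function needs ObsPy and network access, so it is defined here but not called.

```python
# ObsPy's `output` values for response removal.
OBSPY_OUTPUT = {
    "displacement": "DISP",
    "velocity": "VEL",
    "acceleration": "ACC",
}


def obspy_output_code(kind):
    """Map a ground-motion name to the code ObsPy expects."""
    try:
        return OBSPY_OUTPUT[kind.lower()]
    except KeyError:
        raise ValueError(f"unknown ground-motion kind: {kind!r}") from None


def to_ground_motion(stream, kind="velocity", client_name="IRIS"):
    """Return a copy of `stream` with the instrument response removed."""
    from obspy.clients.fdsn import Client  # lazy import: needs ObsPy

    first = stream[0]
    client = Client(client_name)
    # Fetch station metadata down to the response level for this trace.
    inventory = client.get_stations(
        network=first.stats.network,
        station=first.stats.station,
        starttime=first.stats.starttime,
        endtime=first.stats.endtime,
        location="*",
        channel="*",
        level="response",
    )
    converted = stream.copy()  # leave the raw counts untouched
    converted.remove_response(inventory=inventory,
                              output=obspy_output_code(kind))
    return converted


print(obspy_output_code("acceleration"))  # ACC
```

Working on a copy keeps the raw counts available, which is convenient when the same trace is needed in more than one unit.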

Maintenance & Community

The dataset was last updated on May 25, 2020. Bugs can be reported through GitHub issues or by email. The primary author is S. M. Mousavi. Several studies have used STEAD, and their code repositories are available as examples.

Licensing & Compatibility

The repository license is not explicitly stated in the README, but a "LICENSE" file is mentioned. Compatibility for commercial use or closed-source linking would require clarification of the specific license terms.

Limitations & Caveats

The README notes that some back azimuths in the current version may be misplaced and can be recalculated using ObsPy. Fewer than 4% of the noise traces may have identical waveforms across components because they come from single-channel stations. The dataset is large and requires substantial storage and bandwidth.
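Recalculating a back azimuth only needs the source and receiver coordinates. The sketch below shows two routes: a pure-Python great-circle formula on a sphere (an approximation, included so the example runs without extra packages) and a thin wrapper around ObsPy's `gps2dist_azimuth`, which works on the WGS84 ellipsoid. Which coordinates to feed in from the metadata is left to the reader; both function names are hypothetical, and the ObsPy wrapper is defined but not called here.

```python
import math


def spherical_back_azimuth(source_lat, source_lon, receiver_lat, receiver_lon):
    """Bearing from receiver to source, in degrees clockwise from north.

    Spherical-Earth approximation, not the ellipsoidal value ObsPy returns.
    """
    phi_r = math.radians(receiver_lat)
    phi_s = math.radians(source_lat)
    dlon = math.radians(source_lon - receiver_lon)
    y = math.sin(dlon) * math.cos(phi_s)
    x = (math.cos(phi_r) * math.sin(phi_s)
         - math.sin(phi_r) * math.cos(phi_s) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def obspy_back_azimuth(source_lat, source_lon, receiver_lat, receiver_lon):
    """Same quantity from ObsPy (WGS84); requires ObsPy to be installed."""
    from obspy.geodetics import gps2dist_azimuth

    # Returns (distance in m, azimuth source->receiver, azimuth receiver->source).
    _distance_m, _azimuth, back_azimuth = gps2dist_azimuth(
        source_lat, source_lon, receiver_lat, receiver_lon)
    return back_azimuth


# A source due east of a receiver on the equator lies at a back azimuth of 90 degrees.
print(round(spherical_back_azimuth(0.0, 10.0, 0.0, 0.0), 6))  # 90.0
```

For the local distances in this dataset the spherical and ellipsoidal values are normally close, but the ObsPy route is the one the README points to.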

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 90 days
