VIINA (Violent Incident Information from News Articles) is a comprehensive, near-real-time data system tracking events and territorial control during the 2022 Russian invasion of Ukraine. It provides granular, GIS-ready data derived from Ukrainian and Russian news reports, classified using machine learning. The project is valuable for students, journalists, policymakers, and researchers seeking detailed, up-to-date information on the conflict.
How It Works
VIINA processes news articles through automated web scraping, location extraction, and geocoding. A BERT-based transformer model classifies events into predefined categories (e.g., military operations, airstrikes, civilian casualties) and identifies the actors involved. Territorial control is determined by majority vote across four sources: DeepStateMap, the Institute for the Study of War, Wikipedia, and VIINA's own event reports. Ties are resolved in favor of DeepStateMap.
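The majority-vote rule with a DeepStateMap tie-break can be sketched as follows. This is a minimal illustration, not VIINA's actual implementation: the source keys ("deepstatemap", "isw", and so on), the status labels ("UA", "RU"), and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical key for the source that settles ties.
TIE_BREAKER = "deepstatemap"


def majority_control(votes: dict[str, str]) -> str:
    """Return the control status reported by most sources.

    `votes` maps a source name to the status it reports for one
    location on one date. Ties go to the DeepStateMap vote.
    """
    counts = Counter(votes.values())
    top = max(counts.values())
    leaders = [status for status, n in counts.items() if n == top]
    if len(leaders) == 1:
        return leaders[0]
    if TIE_BREAKER in votes:
        # Tie: defer to DeepStateMap.
        return votes[TIE_BREAKER]
    # Assumption: with no tie-breaker vote available, refuse to guess.
    raise ValueError("tie with no DeepStateMap vote")
```

For example, a 2-2 split between "RU" and "UA" resolves to whichever status DeepStateMap reports, while a 3-1 split ignores the tie-break entirely.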
Quick Start & Requirements
- Data is available as compressed CSV files (e.g., control_latest_2023.zip, event_labels_latest_2023.zip).
- No specific installation or runtime environment is required to access the data files.
- GIS-ready data includes temporal precision down to the minute and full source information.
- Links to data archives and detailed documentation are provided in the README.
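Because the data ships as zipped CSV files, it can be read with the Python standard library alone. The sketch below assumes each archive contains a UTF-8 CSV and makes no assumption about column names; the helper name read_zipped_csv is hypothetical.

```python
import csv
import io
import zipfile


def read_zipped_csv(archive, member=None):
    """Return the rows of a CSV stored inside a zip archive as dicts.

    `archive` is a path or a binary file-like object. If `member` is
    omitted, the first entry ending in .csv is used (an assumption
    about the archive layout).
    """
    with zipfile.ZipFile(archive) as zf:
        if member is None:
            member = next(n for n in zf.namelist() if n.lower().endswith(".csv"))
        with zf.open(member) as raw:
            # Assumption: the file is UTF-8 encoded.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

After downloading an archive, a call such as read_zipped_csv("control_latest_2023.zip") returns one dict per row, keyed by the header names in the file.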
Highlighted Details
- Offers two distinct territorial control datasets: one based on GeoNames locations and another on Ukraine's KATOTTH administrative divisions.
- Event classification uses a fine-tuned BERT-base Slavic Cyrillic model, which outperforms the earlier LSTM models across various event categories; detailed ROC AUC statistics are provided.
- Includes a "one-per-day" de-duplication filter for event reports, treating multiple reports of the same event type in a location on the same day as a single entry.
- Provides tessellated GeoJSON geometries for Ukrainian populated places to facilitate spatial analysis.
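The "one-per-day" filter described above amounts to keeping the first report for each combination of location, date, and event type. A minimal sketch, assuming each report is a dict with the hypothetical keys "location", "date", and "event_type" (the real column names may differ):

```python
def one_per_day(reports):
    """Keep one report per (location, date, event type) combination.

    Later reports that repeat an already-seen combination are dropped;
    input order is otherwise preserved.
    """
    seen = set()
    kept = []
    for report in reports:
        # Hypothetical field names, used for illustration only.
        key = (report["location"], report["date"], report["event_type"])
        if key not in seen:
            seen.add(key)
            kept.append(report)
    return kept
```

Two airstrike reports for the same place on the same day collapse into one entry, while an airstrike and a shelling report for that place and day both survive.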
Maintenance & Community
- The project is maintained by Yuri M. Zhukov, Associate Professor at Georgetown University.
- Feedback and corrections are welcomed via email.
- Previous versions of the data are available upon request.
Licensing & Compatibility
- Data is licensed under the Open Database License (ODbL) v1.0.
- Users must attribute any public use and share adapted versions under the same ODbL license.
- Compatible with commercial use, including in closed-source software, provided the attribution and share-alike conditions are met.
Limitations & Caveats
- The README advises caution for event categories with low out-of-sample AUC scores (below 0.80), particularly those with very few positive examples in the evaluation set.
- Data sources and classification models are subject to change as the conflict evolves.
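The caution about low-AUC categories can be applied mechanically before analysis by flagging every category whose out-of-sample AUC falls below 0.80. The sketch below assumes the scores have been collected into a dict; the function name is hypothetical, and the values in the usage note are placeholders, not figures from the README.

```python
AUC_THRESHOLD = 0.80  # cutoff named in the README's caution


def low_confidence_categories(auc_by_category, threshold=AUC_THRESHOLD):
    """Return, in alphabetical order, categories with AUC below the threshold."""
    return sorted(
        category
        for category, auc in auc_by_category.items()
        if auc < threshold
    )
```

Given placeholder scores such as {"category_a": 0.95, "category_b": 0.72}, the function returns ["category_b"], which an analysis can then exclude or treat with extra care.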