Data crawler for ICLR OpenReview webpages
This repository provides a Jupyter Notebook for crawling and visualizing metadata from the ICLR 2020 OpenReview webpages, including paper ratings, keywords, review lengths, and acceptance rates, to help researchers understand trends and factors that influence paper acceptance.
How It Works
The project utilizes Selenium and ChromeDriver to automate web scraping of dynamic websites like OpenReview. It employs a headless browser setup for server environments and extracts data by targeting specific HTML class names. The crawled data is then processed and visualized using libraries such as NumPy, Matplotlib, and Seaborn.
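A minimal sketch of this scraping flow is shown below, assuming Selenium 4 and a locally installed ChromeDriver; the class name "note" and the ICLR 2020 group URL are illustrative placeholders, not necessarily the exact selectors or pages the notebook targets.

```python
# Sketch of headless scraping of a dynamic OpenReview page.
# The class name "note" is an assumption for illustration only.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")      # run without a visible browser window
options.add_argument("--no-sandbox")    # commonly required in server environments

driver = webdriver.Chrome(options=options)
driver.get("https://openreview.net/group?id=ICLR.cc/2020/Conference")
driver.implicitly_wait(10)              # allow dynamically loaded content to render

# Extract paper entries by their HTML class name (hypothetical selector).
notes = driver.find_elements(By.CLASS_NAME, "note")
titles = [n.text.splitlines()[0] for n in notes if n.text]

driver.quit()
print(f"Found {len(titles)} paper entries")
```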
Quick Start & Requirements
```bash
pip install pyvirtualdisplay selenium numpy h5py matplotlib seaborn pandas imageio wordcloud
```
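On a display-less server, the headless browser setup can be approximated with pyvirtualdisplay from the list above. This is a sketch under the assumption that ChromeDriver is installed and on the PATH; the URL is illustrative.

```python
# Minimal sketch: start a virtual X display, then launch the browser inside it.
from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=0, size=(1024, 768))  # virtual display for the browser
display.start()

driver = webdriver.Chrome()
driver.get("https://openreview.net")
print(driver.title)

driver.quit()
display.stop()
```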
Highlighted Details
Includes a percentile rank (PR) calculation that scores each paper by the average of its reviewer ratings (see the sketch below).
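A minimal sketch of how such a percentile rank can be computed with NumPy is shown below; the average ratings are made up for illustration, and the notebook's actual implementation may differ.

```python
# Percentile rank (PR) over average reviewer ratings (illustrative data).
import numpy as np

# avg_ratings[i] = mean of the reviewer scores for paper i
avg_ratings = np.array([6.33, 4.00, 7.67, 5.33, 6.00])

def percentile_rank(scores, idx):
    """Percentage of papers whose average rating is below that of paper idx."""
    return 100.0 * np.sum(scores < scores[idx]) / len(scores)

for i, r in enumerate(avg_ratings):
    print(f"paper {i}: avg rating {r:.2f}, PR {percentile_rank(avg_ratings, i):.1f}%")
```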
Maintenance & Community
No specific information on maintenance or community channels is provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license.
Limitations & Caveats
The scraping mechanism relies on specific HTML class names, making it susceptible to breakage if the OpenReview website structure changes. The setup instructions are specific to Ubuntu.