ICLR2020-OpenReviewData by shaohua0116

Data crawler for ICLR OpenReview webpages

created 5 years ago
461 stars

Top 66.7% on sourcepulse

Project Summary

This repository provides a Jupyter Notebook for crawling and visualizing metadata from ICLR 2020 OpenReview webpages. Its analyses cover paper ratings, keywords, review lengths, and acceptance rates, helping researchers spot trends and factors associated with paper acceptance.

How It Works

The project uses Selenium and ChromeDriver to automate scraping of dynamic websites such as OpenReview, whose content is rendered by JavaScript. To run on servers without a screen, it starts the browser inside a virtual display via pyvirtualdisplay. Data is extracted by targeting specific HTML class names, then processed and visualized with libraries such as NumPy, Matplotlib, and Seaborn.
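A minimal sketch of class-name extraction, for illustration only. The notebook itself drives a real browser with Selenium (where the comparable call is `driver.find_elements(By.CLASS_NAME, ...)`); this version swaps in Python's standard-library `html.parser` so it runs on an HTML string without Chrome. The class name `rating`, the sample markup, and the helper names `ClassTextCollector` and `extract_by_class` are hypothetical and do not reflect OpenReview's actual page structure.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list contains target_class.

    Simplification: assumes matching elements are not nested inside each other.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the element currently being read
        self.results = []   # one string per matching element, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a child tag inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_by_class(html, class_name):
    """Return the stripped text of each element carrying class_name."""
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    # Hypothetical markup standing in for a rendered review page.
    sample = ('<div class="note"><span class="rating">6: Weak Accept</span>'
              '<p>Review text</p><span class="rating">3: Weak Reject</span></div>')
    texts = extract_by_class(sample, "rating")
    print(texts)  # ['6: Weak Accept', '3: Weak Reject']
    print([int(t.split(":")[0]) for t in texts])  # [6, 3]
```

With Selenium the same idea applies after the page has rendered; only the element lookup changes.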

Quick Start & Requirements

  • Installation: pip install pyvirtualdisplay selenium numpy h5py matplotlib seaborn pandas imageio wordcloud
  • Prerequisites: Python 3.6+ and the packages above (Selenium, pyvirtualdisplay for headless operation, NumPy, h5py, Matplotlib, Seaborn, pandas, imageio, wordcloud).
  • Setup: Google Chrome and a matching ChromeDriver must be installed; the README gives step-by-step instructions for Ubuntu.

Highlighted Details

  • Provides a function, PR, that computes a paper's percentile rank from its average reviewer rating.
  • Generates word clouds from submission keywords to highlight trending research topics.
  • Analyzes reviewer rating distributions and average ratings for accepted vs. rejected papers.
  • Includes a table summarizing ICLR acceptance rates from 2017 to 2020.
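The rating analyses listed above reduce to a few lines of arithmetic. The sketch below is a hypothetical re-implementation, not the repository's `PR` function: it assumes percentile rank means the share of papers whose average rating is at or below the given value (the original may use a strict comparison or a different convention), and it groups per-paper averages by decision label to compare accepted and rejected papers. The names `percentile_rank` and `mean_rating_by_decision` are illustrative, and the toy ratings in the usage example are made up, not ICLR data.

```python
from statistics import mean


def percentile_rank(all_averages, my_average):
    """Percentage of papers whose average rating is <= my_average.

    Hypothetical stand-in for the notebook's PR function; the <= convention is an assumption.
    """
    if not all_averages:
        raise ValueError("need at least one paper")
    at_or_below = sum(1 for avg in all_averages if avg <= my_average)
    return 100.0 * at_or_below / len(all_averages)


def mean_rating_by_decision(papers):
    """Map each decision label to the mean of its papers' average ratings.

    papers: iterable of (list_of_reviewer_ratings, decision_label) pairs.
    """
    per_decision = {}
    for ratings, decision in papers:
        per_decision.setdefault(decision, []).append(mean(ratings))
    return {decision: mean(avgs) for decision, avgs in per_decision.items()}


if __name__ == "__main__":
    # Made-up toy data: (reviewer ratings, decision).
    papers = [([6, 6, 8], "Accept"), ([3, 1, 3], "Reject"),
              ([6, 3, 3], "Reject"), ([8, 8, 6], "Accept")]
    averages = [mean(r) for r, _ in papers]
    print(percentile_rank(averages, 4.0))  # 50.0
    print(mean_rating_by_decision(papers))
```

Averaging per paper first, then per decision, keeps papers with more reviews from carrying extra weight in the accepted-vs-rejected comparison.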

Maintenance & Community

No specific information on maintenance or community channels is provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license.

Limitations & Caveats

The scraper locates data by specific HTML class names, so it is likely to break if OpenReview changes its page markup. The setup instructions are specific to Ubuntu.
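Because the scraper depends on class names, a markup change tends to show up as empty or malformed results rather than an error. One mitigation, sketched here as an assumption rather than something the repository does, is to validate each page's extracted text and fail loudly. `SelectorDriftError`, `parse_ratings`, and the example scale `{1, 3, 6, 8}` are illustrative; pass whatever scale the review form actually uses.

```python
class SelectorDriftError(RuntimeError):
    """Raised when scraped text no longer looks like what the selector used to return."""


def parse_ratings(texts, allowed, source):
    """Parse texts such as '6: Weak Accept' into ints, failing loudly on anything unexpected.

    texts:   strings extracted by a class-name selector for one page
    allowed: set of valid rating values for the review form
    source:  page identifier used in error messages
    """
    if not texts:
        raise SelectorDriftError(f"no ratings matched on {source}; has the page markup changed?")
    ratings = []
    for text in texts:
        head = text.split(":", 1)[0].strip()
        if not head.isdecimal() or int(head) not in allowed:
            raise SelectorDriftError(f"unexpected rating text {text!r} on {source}")
        ratings.append(int(head))
    return ratings


if __name__ == "__main__":
    scale = {1, 3, 6, 8}  # illustrative rating scale
    print(parse_ratings(["6: Weak Accept", "3: Weak Reject"], scale, "demo-page"))  # [6, 3]
    try:
        parse_ratings([], scale, "demo-page")
    except SelectorDriftError as err:
        print("crawl stopped:", err)
```

Stopping on the first suspicious page costs a rerun, but it avoids silently building statistics on an empty or misparsed dataset.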

Health Check
Last commit: 5 years ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 2 stars in the last 90 days
