Data crawler for ICLR OpenReview webpages
This repository provides a Jupyter Notebook for crawling and visualizing metadata from the ICLR 2020 OpenReview webpages, including paper ratings, keywords, review lengths, and acceptance rates, to help researchers understand trends and factors that influence paper acceptance.
How It Works
The project utilizes Selenium and ChromeDriver to automate web scraping of dynamic websites like OpenReview. It employs a headless browser setup for server environments and extracts data by targeting specific HTML class names. The crawled data is then processed and visualized using libraries such as NumPy, Matplotlib, and Seaborn.
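A minimal sketch of this scraping flow is shown below, assuming Selenium 4 and a locally installed ChromeDriver; the class name "note" and the ICLR 2020 group URL are illustrative placeholders, not necessarily the exact selectors or pages the notebook targets.

```python
# Sketch of headless scraping of a dynamic OpenReview page.
# The class name "note" is an assumption for illustration only.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")      # run without a visible browser window
options.add_argument("--no-sandbox")    # commonly required in server environments

driver = webdriver.Chrome(options=options)
driver.get("https://openreview.net/group?id=ICLR.cc/2020/Conference")
driver.implicitly_wait(10)              # allow dynamically loaded content to render

# Extract paper entries by their HTML class name (hypothetical selector).
notes = driver.find_elements(By.CLASS_NAME, "note")
titles = [n.text.splitlines()[0] for n in notes if n.text]

driver.quit()
print(f"Found {len(titles)} paper entries")
```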
Quick Start & Requirements
```bash
pip install pyvirtualdisplay selenium numpy h5py matplotlib seaborn pandas imageio wordcloud
```
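On a display-less server, the headless browser setup can be approximated with pyvirtualdisplay from the list above. This is a sketch under the assumption that ChromeDriver is installed and on the PATH; the URL is illustrative.

```python
# Minimal sketch: start a virtual X display, then launch the browser inside it.
from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=0, size=(1024, 768))  # virtual display for the browser
display.start()

driver = webdriver.Chrome()
driver.get("https://openreview.net")
print(driver.title)

driver.quit()
display.stop()
```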
Highlighted Details
Includes a percentile rank (PR) calculation that scores each paper by the average of its reviewer ratings (see the sketch below).
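A minimal sketch of how such a percentile rank can be computed with NumPy is shown below; the average ratings are made up for illustration, and the notebook's actual implementation may differ.

```python
# Percentile rank (PR) over average reviewer ratings (illustrative data).
import numpy as np

# avg_ratings[i] = mean of the reviewer scores for paper i
avg_ratings = np.array([6.33, 4.00, 7.67, 5.33, 6.00])

def percentile_rank(scores, idx):
    """Percentage of papers whose average rating is below that of paper idx."""
    return 100.0 * np.sum(scores < scores[idx]) / len(scores)

for i, r in enumerate(avg_ratings):
    print(f"paper {i}: avg rating {r:.2f}, PR {percentile_rank(avg_ratings, i):.1f}%")
```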
Maintenance & Community
No specific information on maintenance or community channels is provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license.
Limitations & Caveats
The scraping mechanism relies on specific HTML class names, making it susceptible to breakage if the OpenReview website structure changes. The setup instructions are specific to Ubuntu.