Ruby web scraping framework for JS-rendered sites
Kimurai is a Ruby-based web scraping framework for extracting data from websites, including those whose content is rendered by JavaScript. It is aimed at developers who need a robust, flexible scraping tool, and it offers a familiar API built on Capybara and Nokogiri.
How It Works
Kimurai uses interchangeable "engines" to fetch and render web pages: Mechanize for plain HTTP requests (no JavaScript execution), Poltergeist (PhantomJS) for JavaScript rendering, and Selenium for headless Chrome or Firefox. Because of this engine abstraction, users can switch rendering backends without rewriting their spider logic. The framework provides a Capybara-based interface for interacting with pages (e.g., clicking buttons, filling in forms) and a structured approach to defining spiders, requests, and data parsing.
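The engine abstraction described above can be illustrated with a minimal plain-Ruby sketch. This is not Kimurai's source code or API: the classes StubHttpEngine, StubBrowserEngine and TitleSpider and their fetch/parse methods are hypothetical, and the engines return canned HTML instead of touching the network so the example runs with the standard library alone. It shows only the design idea: the spider depends on a shared fetch interface, so swapping the engine leaves the parsing code unchanged.

```ruby
# Hypothetical sketch of an engine abstraction (NOT Kimurai's internals).
# Every engine exposes the same #fetch(url) -> String interface.

# Stands in for a plain HTTP client such as Mechanize: it returns the raw
# HTML, so content that JavaScript would inject is missing.
class StubHttpEngine
  def fetch(_url)
    '<html><head><title>Raw page</title></head><body></body></html>'
  end
end

# Stands in for a real browser (e.g. headless Chrome): it returns the HTML
# as it looks after scripts have run.
class StubBrowserEngine
  def fetch(_url)
    '<html><head><title>Rendered page</title></head>' \
      '<body><p class="item">JS item</p></body></html>'
  end
end

# The spider logic is written once, against the engine interface only.
class TitleSpider
  def initialize(engine)
    @engine = engine
  end

  # Returns a hash of extracted fields. Regexes are used here only to stay
  # within the standard library; Kimurai's own API builds on Nokogiri.
  def parse(url)
    html = @engine.fetch(url)
    {
      url: url,
      title: html[%r{<title>(.*?)</title>}m, 1],
      items: html.scan(%r{<p class="item">(.*?)</p>}m).flatten
    }
  end
end

# Same spider, two engines: only the constructor argument changes.
url = "https://example.com/"
[StubHttpEngine.new, StubBrowserEngine.new].each do |engine|
  p TitleSpider.new(engine).parse(url)
end
```

Running both engines through the same spider makes the trade-off visible: the HTTP-style stub yields no items, while the browser-style stub yields the JavaScript-injected one.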
Quick Start & Requirements
gem install kimurai
The kimurai setup command automates environment setup on Ubuntu 18.04 using Ansible.
Highlighted Details
Maintenance & Community
The project is actively maintained by vifreefly. Community support is available via chat.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration with closed-source applications.
Limitations & Caveats
While Selenium engines offer robust JavaScript rendering, they can be more resource-intensive than Mechanize. The README notes that Selenium drivers do not support proxies with authorization. The kimurai setup command currently supports only Ubuntu 18.04.
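One way to catch the Selenium proxy limitation early is a pre-flight check before a spider starts. The sketch below is a hypothetical helper, not part of Kimurai. It assumes a proxy string of the form "ip:port:protocol" or "ip:port:protocol:user:password" and assumes the Selenium engines are named :selenium_chrome and :selenium_firefox; verify both against the README before relying on them.

```ruby
# Hypothetical pre-flight check (not part of Kimurai): reject a proxy that
# carries credentials when a Selenium engine is selected.
# Assumed proxy format: "ip:port:protocol" or "ip:port:protocol:user:password".

# Assumed engine names for the Selenium-backed browsers.
SELENIUM_ENGINES = %i[selenium_chrome selenium_firefox].freeze

# True if the proxy string includes a non-empty user field.
def proxy_with_auth?(proxy)
  _ip, _port, _protocol, user, _password = proxy.split(":")
  !(user.nil? || user.empty?)
end

# Raises ArgumentError for the unsupported combination; otherwise returns
# the proxy string unchanged.
def check_proxy!(engine, proxy)
  if SELENIUM_ENGINES.include?(engine) && proxy_with_auth?(proxy)
    raise ArgumentError, "#{engine} does not support proxies with authorization"
  end
  proxy
end

puts check_proxy!(:selenium_chrome, "3.4.5.6:3128:http")
puts check_proxy!(:mechanize, "3.4.5.6:3128:http:user:pass")
begin
  check_proxy!(:selenium_firefox, "3.4.5.6:3128:http:user:pass")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Failing fast with an explicit error is usually cheaper than discovering mid-crawl that every request through the proxy is being refused.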