drake by ropensci

R package for reproducible data workflows and high-performance computing

Created 9 years ago

1,341 stars

Top 29.7% on SourcePulse

1 Expert Loves This Project

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

The drake R package is a workflow toolkit designed to enhance reproducibility and performance in data analysis pipelines. It targets R users, particularly researchers and data scientists, by automating dependency management, skipping already computed steps, and enabling distributed computing, thereby saving significant runtime and increasing confidence in results.

How It Works

drake analyzes a user-defined "plan" that lists targets and the commands that build them, and infers the dependencies among them from the code. It caches results and recomputes only those targets whose commands, functions, or inputs have changed. This approach, similar to memoization, speeds up iterative development and complex analyses by avoiding redundant computation. The package also supports parallel and distributed computing and handles task scheduling for the user.
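The plan-and-cache cycle described above can be sketched as follows. This is a minimal illustration, not taken from the project's documentation: the helper `summarize_mpg()` and the target names `cars_data` and `by_cyl` are hypothetical, and the sketch assumes a recent drake release in which `outdated()` accepts a plan directly. Running it creates a `.drake/` cache folder in the working directory.

```r
library(drake)

# Hypothetical helper; drake tracks the function body as a dependency.
summarize_mpg <- function(data) {
  aggregate(mpg ~ cyl, data = data, FUN = mean)
}

# The plan maps each target name to the command that builds it.
plan <- drake_plan(
  cars_data = mtcars,
  by_cyl = summarize_mpg(cars_data)
)

make(plan)               # first run builds both targets
make(plan)               # second run should find everything up to date
result <- readd(by_cyl)  # read a cached target back into the session

# Editing the function should invalidate only the targets that depend on it.
summarize_mpg <- function(data) {
  aggregate(mpg ~ cyl, data = data, FUN = median)
}
stale <- outdated(plan)  # names of targets drake considers out of date
make(plan)               # rebuilds the stale targets only
```

Inspecting `stale` before the final `make()` is a quick way to see which targets a code change is expected to trigger.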

Quick Start & Requirements

Install from CRAN: install.packages("drake")
Install development version: library(devtools); install_github("ropensci/drake")
Requires R.

Highlighted Details

Reproducibility: Tracks build history, function arguments, and dependencies to provide evidence of internal consistency.
Efficiency: Caches results and skips up-to-date targets, with support for specialized data formats like "fst" for large datasets.
Parallelism: Integrates with clustermq for scaling computations across multiple cores or HPC systems.
Visualization: vis_drake_graph() provides an interactive network visualization of the workflow dependencies.
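
As a rough sketch of how the features above fit together, the example below stores one target in "fst" format, draws the dependency graph, and shows (commented out) how a clustermq build might be requested. The target names `big` and `fit` are hypothetical. The sketch assumes the fst package is installed, that visNetwork is available for the graph, and that a recent drake release is in use so `vis_drake_graph()` accepts a plan directly. The clustermq lines additionally assume clustermq is installed and a scheduler is configured.

```r
library(drake)

plan <- drake_plan(
  # format = "fst" asks drake to store this data frame via the fst package.
  big = target(
    data.frame(x = seq_len(1e5), y = rnorm(1e5)),
    format = "fst"
  ),
  fit = lm(y ~ x, data = big)
)

# Interactive dependency graph; skipped if visNetwork is not installed.
if (requireNamespace("visNetwork", quietly = TRUE)) {
  vis_drake_graph(plan)
}

# Parallel build through clustermq (assumed setup, left commented out):
# options(clustermq.scheduler = "multicore")
# make(plan, parallelism = "clustermq", jobs = 2)

make(plan)  # sequential build as a fallback
model <- readd(fit)
```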

Maintenance & Community

drake is part of the rOpenSci ecosystem, whose packages are peer-reviewed and expected to be well documented. The project is no longer under active development: it has been superseded by targets, which is recommended for new projects.

Licensing & Compatibility

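Licensed under GPL-3. Commercial use is permitted, but as a copyleft license it requires derivative works that are distributed to be released under GPL-compatible terms, which constrains integration with closed-source projects.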

Limitations & Caveats

The project explicitly states it is superseded by the targets package, which is recommended for new development due to being more robust and easier to use. Users are advised to transition to targets.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

1 star in the last 30 days