drake by ropensci

R package for reproducible data workflows and high-performance computing

created 8 years ago
1,340 stars

Top 30.6% on sourcepulse

Project Summary

The drake R package is a workflow toolkit for making data analysis pipelines more reproducible and faster to rerun. It is aimed at R users, particularly researchers and data scientists. It automates dependency management, skips steps that are already up to date, and supports distributed computing, which saves runtime and increases confidence in results.

How It Works

drake analyzes a user-defined "plan" that specifies targets and the commands that build them, and infers the dependencies among them. It caches each target's result and recomputes a target only when its command or one of its inputs has changed. This approach, a form of memoization, speeds up iterative development and complex analyses by avoiding redundant computation. The package also supports parallel and distributed computing and handles task scheduling on the user's behalf.
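As a rough illustration of the plan-and-cache cycle described above, the following minimal sketch defines a two-target plan and builds it twice. The helper `summarize_mpg()` and the target names are hypothetical, chosen only for this example; `mtcars` is R's built-in dataset. It assumes drake is installed and that the working directory is writable, since drake stores its cache in a `.drake` folder there by default.

```r
library(drake)

# Hypothetical helper for illustration. drake detects that the
# `summary_stats` target depends on this function.
summarize_mpg <- function(data) {
  mean(data$mpg)
}

# The plan maps target names to the commands that build them.
plan <- drake_plan(
  raw = mtcars,                       # built-in example dataset
  summary_stats = summarize_mpg(raw)  # depends on `raw` and summarize_mpg()
)

make(plan)  # first run: builds both targets and caches the results
make(plan)  # second run: nothing changed, so both targets are skipped

# Retrieve a cached target by name.
readd(summary_stats)
```

Editing the body of `summarize_mpg()` and calling `make(plan)` again should rebuild only `summary_stats`, because `raw` is unaffected by the change.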

Quick Start & Requirements

  • Install from CRAN: install.packages("drake")
  • Install development version: library(devtools); install_github("ropensci/drake")
  • Requires R.

Highlighted Details

  • Reproducibility: Tracks build history, function arguments, and dependencies to provide evidence of internal consistency.
  • Efficiency: Caches results and skips up-to-date targets, with support for specialized data formats like "fst" for large datasets.
  • Parallelism: Integrates with clustermq for scaling computations across multiple cores or HPC systems.
  • Visualization: vis_drake_graph() provides an interactive network visualization of the workflow dependencies.
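The features listed above can be combined in one short sketch. The plan below is hypothetical: the target names and the simulated data are made up for illustration. It assumes the fst and visNetwork packages are installed, and it falls back to a serial build when clustermq is unavailable. The "multicore" scheduler used here is a local clustermq backend that is not available on Windows.

```r
library(drake)

plan <- drake_plan(
  # Store this data frame in the cache with the "fst" format
  # (requires the fst package).
  big_table = target(
    data.frame(x = seq_len(1e5), y = rnorm(1e5)),
    format = "fst"
  ),
  col_means = colMeans(big_table)
)

# Interactive dependency graph of the plan (requires visNetwork).
vis_drake_graph(plan)

# Build with clustermq workers if the package is installed,
# otherwise build serially.
if (requireNamespace("clustermq", quietly = TRUE)) {
  options(clustermq.scheduler = "multicore")  # local workers, not an HPC scheduler
  make(plan, parallelism = "clustermq", jobs = 2)
} else {
  make(plan)
}

readd(col_means)
```

On an HPC system, the `clustermq.scheduler` option would instead name the cluster's scheduler, typically together with a template file; those settings are site-specific and are not shown here.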

Maintenance & Community

drake is part of the rOpenSci ecosystem, whose packages are documented and peer-reviewed. The project has been superseded by a designated successor, targets, which is recommended for new projects.

Licensing & Compatibility

Licensed under GPL-3. Commercial use is permitted, but the license's copyleft terms apply to distributed derivative works, which can restrict integration with closed-source projects.

Limitations & Caveats

The project states that it is superseded by the targets package, which it describes as more robust and easier to use and recommends for new development. Existing users are advised to transition to targets.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
