botflow by kkyon

Dataflow framework for data pipelines (web crawling, ML, quant trading)

Created 7 years ago
1,200 stars

Top 32.5% on SourcePulse

Project Summary

Botflow is a Python framework for building dataflow pipelines, targeting applications such as web crawling, machine learning, and quantitative trading. Functions are connected by pipes (queues) and run in parallel through coroutines and thread pools, so complex data processing workflows can be composed step by step. The model should feel natural to developers familiar with Unix-style piping.

How It Works

Botflow implements dataflow programming by treating functions as nodes connected by pipes. Data flows through these pipes, triggering function execution. This approach decouples data from functionality, promoting reusability. Key components include Pipe for defining sequential steps and Route for creating complex, nested data flow networks. Parallelism is managed internally using asyncio coroutines and ThreadPools, abstracting away much of the complexity for the user.
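
The idea of functions as nodes connected by queues can be illustrated with the standard library alone. The following is a minimal sketch, not Botflow's API: the names `stage`, `run_pipeline`, `double`, and `add_one` are hypothetical, and the use of one `asyncio.Queue` per pipe plus the default thread pool for plain functions is an assumption about how such a design might look.

```python
import asyncio

_DONE = object()  # sentinel that marks the end of the data stream


async def stage(func, inbox, outbox):
    """Apply func to every item arriving on inbox and forward the result."""
    loop = asyncio.get_running_loop()
    while True:
        item = await inbox.get()
        if item is _DONE:
            # Pass the end-of-stream marker downstream and stop this node.
            await outbox.put(_DONE)
            return
        # Plain (possibly blocking) functions run in the default thread pool.
        result = await loop.run_in_executor(None, func, item)
        await outbox.put(result)


async def run_pipeline(source, *funcs):
    """Chain funcs with queues, feed source through them, collect the output."""
    queues = [asyncio.Queue() for _ in range(len(funcs) + 1)]
    tasks = [
        asyncio.create_task(stage(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for item in source:
        await queues[0].put(item)
    await queues[0].put(_DONE)

    results = []
    while True:
        item = await queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    await asyncio.gather(*tasks)
    return results


def double(x):
    return x * 2


def add_one(x):
    return x + 1


def demo():
    # Hypothetical two-step pipeline: double each number, then add one.
    return asyncio.run(run_pipeline(range(5), double, add_one))


if __name__ == "__main__":
    print(demo())
```

Each stage here has a single worker, so item order is preserved. A real framework would likely run several workers per node and would need its own ordering and error-handling policy.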

Quick Start & Requirements

Highlighted Details

  • Supports interactive programming within Jupyter Notebooks.
  • Offers a "replay mode" to resume pipelines from the nearest completed step after an exception, significantly speeding up development.
  • Can render data flow graphs using Graphviz.
  • Claims to be "10x faster than Scrapy" for web crawling tasks.
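
One way to picture the "replay mode" listed above is a cache of per-step outputs that lets a rerun skip steps that already finished. The sketch below rests on that assumption and does not show Botflow's actual mechanism. `run_with_replay`, the step names, and the in-memory `cache` dict are all hypothetical.

```python
def run_with_replay(steps, data, cache):
    """Run steps in order, resuming after the latest step found in cache.

    steps: list of (name, function) pairs
    cache: dict mapping a step name to the output it produced earlier
    """
    start = 0
    # Walk backwards to find the nearest completed step.
    for i in range(len(steps) - 1, -1, -1):
        name = steps[i][0]
        if name in cache:
            data = cache[name]
            start = i + 1
            break
    for name, func in steps[start:]:
        data = func(data)
        cache[name] = data  # record the output only after the step succeeds
    return data


calls = []  # records which steps actually executed


def fetch(x):
    calls.append("fetch")
    return x + 1


def parse_buggy(x):
    calls.append("parse")
    raise ValueError("bug in parse step")


def parse_fixed(x):
    calls.append("parse")
    return x * 10


cache = {}
try:
    run_with_replay([("fetch", fetch), ("parse", parse_buggy)], 1, cache)
except ValueError:
    pass  # the developer fixes the parse step and reruns

# The second run reuses the cached output of "fetch" and only reruns "parse".
result = run_with_replay([("fetch", fetch), ("parse", parse_fixed)], 1, cache)
```

In this sketch the expensive `fetch` step runs once even though the pipeline is started twice, which is the kind of saving the replay feature is described as providing during development.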

Maintenance & Community

  • Current release: 0.2.0 alpha.
  • Recent milestones include Jupyter support and nested pipes.
  • Future roadmap includes HTTPServer support and online ML model serving.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

  • The project is currently in alpha status (0.2.0 alpha), indicating potential instability or ongoing changes.
  • The lack of an explicit license may pose compatibility issues for commercial or closed-source projects.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
