botflow by kkyon

Dataflow framework for data pipelines (web crawling, ML, quant trading)

created 7 years ago
1,201 stars

Top 33.3% on sourcepulse

Project Summary

Botflow is a Python framework for building dataflow pipelines for applications such as web crawling, machine learning, and quantitative trading. It builds complex data processing workflows by connecting functions through pipes (queues) and runs them in parallel using coroutines and thread pools. The model will feel familiar to developers used to Unix-style piping.

How It Works

Botflow implements dataflow programming by treating functions as nodes connected by pipes. Data flows through these pipes, triggering function execution. This approach decouples data from functionality, promoting reusability. Key components include Pipe for defining sequential steps and Route for creating complex, nested data flow networks. Parallelism is managed internally using asyncio coroutines and ThreadPools, abstracting away much of the complexity for the user.
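The idea of functions as nodes connected by queues can be illustrated with plain asyncio. The sketch below is not Botflow's API: the names stage, run_pipe, and run_demo are hypothetical, and it only assumes that each node is an ordinary function fed by one queue and writing to the next.

```python
import asyncio

_DONE = object()  # sentinel marking the end of the stream (hypothetical helper)


async def stage(func, inbox, outbox):
    """Apply func to every item from inbox and put the result on outbox."""
    while True:
        item = await inbox.get()
        if item is _DONE:
            await outbox.put(_DONE)  # forward the end-of-stream marker
            return
        await outbox.put(func(item))


async def run_pipe(source, *funcs):
    """Chain funcs with queues, feed source through them, collect the outputs."""
    queues = [asyncio.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        asyncio.create_task(stage(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]

    for item in source:
        await queues[0].put(item)
    await queues[0].put(_DONE)

    results = []
    while True:
        item = await queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    await asyncio.gather(*workers)
    return results


def run_demo():
    # Two nodes: add one, then multiply by ten.
    return asyncio.run(run_pipe(range(5), lambda x: x + 1, lambda x: x * 10))


if __name__ == "__main__":
    print(run_demo())
```

Each stage runs as its own coroutine, so a downstream node can start on early items while upstream nodes are still producing later ones. Blocking functions would need a thread pool instead, which this sketch leaves out.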

Quick Start & Requirements

Highlighted Details

  • Supports interactive programming within Jupyter Notebooks.
  • Offers a "replay mode" to resume pipelines from the nearest completed step after an exception, significantly speeding up development.
  • Can render data flow graphs using Graphviz.
  • Claims to be "10x faster than Scrapy" for web crawling tasks.
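The replay idea can be pictured as caching the output of each completed step so that a rerun skips straight to the step that failed. The following is a minimal sketch of that general technique, not Botflow's implementation: run_with_replay, the cache dictionary, and the demo steps are all hypothetical, and it assumes the rerun uses the same input and the same step order.

```python
def run_with_replay(steps, data, cache):
    """Run steps in order, reusing cached outputs of steps that already completed."""
    value = data
    for index, step in enumerate(steps):
        if index in cache:
            value = cache[index]  # completed on an earlier run, so skip it
            continue
        value = step(value)       # may raise; earlier results stay in the cache
        cache[index] = value
    return value


calls = []  # records which steps actually executed


def load(x):
    calls.append("load")
    return x + 1


def flaky(x):
    calls.append("flaky")
    if calls.count("flaky") == 1:
        raise RuntimeError("transient failure")  # fails on the first attempt only
    return x * 2


def replay_demo():
    calls.clear()
    cache = {}
    try:
        run_with_replay([load, flaky], 1, cache)
    except RuntimeError:
        pass  # first run fails after load has completed
    result = run_with_replay([load, flaky], 1, cache)
    return result, list(calls)


if __name__ == "__main__":
    print(replay_demo())
```

In the demo the second run does not call load again, which is where the development speed-up would come from when early steps are slow, such as network fetches.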

Maintenance & Community

  • Current release: 0.2.0 alpha.
  • Recent milestones include Jupyter support and nested pipes.
  • Future roadmap includes HTTPServer support and online ML model serving.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

  • The project is currently in alpha status (0.2.0 alpha), indicating potential instability or ongoing changes.
  • Without an explicit license, default copyright applies, so reuse in commercial or closed-source projects is legally uncertain.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 2 stars in the last 90 days

Explore Similar Projects

Starred by Yang Song (Professor at Caltech; Research Scientist at OpenAI), Jeremy Howard (Cofounder of fast.ai), and 4 more.

PiPPy by pytorch

0.1%
775
PyTorch tool for pipeline parallelism
created 3 years ago
updated 11 months ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems).

bytewax by bytewax

0.3%
2k
Python framework for stateful stream processing
created 3 years ago
updated 4 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeff Hammerbacher (Cofounder of Cloudera).

towhee by towhee-io

0.2%
3k
Framework for neural data processing pipelines
created 4 years ago
updated 9 months ago