Heartune
Generate exercise sets from textbooks
Top 87.7% on SourcePulse
DataFlow-EDU addresses the challenge of automatically generating high-quality, structured educational question banks and benchmarks from PDF textbooks. Aimed at educators and researchers, it provides an end-to-end, operator-based pipeline that turns raw teaching materials into datasets for training and evaluating large language models. Its main benefit is automating a labor-intensive process, which makes domain-specific educational content scalable to produce.
How It Works
The project uses an operator-and-pipeline architecture, inspired by DataFlow and PyTorch, for semi-automatic educational dataset generation. It ingests PDF documents and uses MinerU for multimodal document parsing and OCR. Subsequent operators handle content slicing, question generation, dynamic question-type balancing, and multi-stage cleaning (ambiguity and domain refinement). An LLM-as-a-Judge step performs a multi-dimensional quality review intended to keep hallucinations low and the distribution of question types balanced. The pipeline is modular and runs interactively from the command line, so users can monitor and intervene as it runs; a Vue.js-based WebUI adds management features.
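To make the operator-and-pipeline idea concrete, here is a minimal sketch under stated assumptions. Every name in it (`Operator`, `Pipeline`, `SliceText`, `GenerateQuestions`, `BalanceTypes`, `JudgeFilter`, `Question`) is hypothetical and not DataFlow-EDU's actual API. The LLM steps (question generation and judging) are replaced by deterministic stand-ins so the example runs with the standard library only. The balancing step simply caps every question type at the count of the rarest type; this is one possible strategy, not necessarily the project's "dynamic" balancing.

```python
# Hypothetical sketch of an operator-and-pipeline design; none of these names
# come from DataFlow-EDU. LLM calls are replaced by deterministic stand-ins.
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Question:
    text: str
    qtype: str  # hypothetical type labels, e.g. "single_choice"
    scores: Dict[str, float] = field(default_factory=dict)

class Operator:
    """Like torch.nn.Module: calling an operator runs its forward()."""

    def __call__(self, items):
        return self.forward(items)

    def forward(self, items):
        raise NotImplementedError

class SliceText(Operator):
    """Split parsed page text into chunks of at most max_words words."""

    def __init__(self, max_words: int):
        self.max_words = max_words

    def forward(self, pages: List[str]) -> List[str]:
        words = " ".join(pages).split()
        return [" ".join(words[i:i + self.max_words])
                for i in range(0, len(words), self.max_words)]

class GenerateQuestions(Operator):
    """Stand-in for an LLM call: one question per chunk, cycling through types."""

    def __init__(self, qtypes: List[str]):
        self.qtypes = qtypes

    def forward(self, chunks: List[str]) -> List[Question]:
        return [Question(f"Explain: {c[:40]}", self.qtypes[i % len(self.qtypes)])
                for i, c in enumerate(chunks)]

class BalanceTypes(Operator):
    """Keep at most as many questions per type as the rarest type has."""

    def forward(self, qs: List[Question]) -> List[Question]:
        if not qs:
            return qs
        cap = min(Counter(q.qtype for q in qs).values())
        seen: Counter = Counter()
        kept = []
        for q in qs:
            if seen[q.qtype] < cap:
                seen[q.qtype] += 1
                kept.append(q)
        return kept

class JudgeFilter(Operator):
    """Stand-in for LLM-as-a-Judge: score each dimension, keep if all pass."""

    def __init__(self, judges: Dict[str, Callable[[Question], float]], threshold: float):
        self.judges = judges
        self.threshold = threshold

    def forward(self, qs: List[Question]) -> List[Question]:
        kept = []
        for q in qs:
            q.scores = {name: judge(q) for name, judge in self.judges.items()}
            if all(s >= self.threshold for s in q.scores.values()):
                kept.append(q)
        return kept

class Pipeline(Operator):
    """Like torch.nn.Sequential: feed each operator's output to the next."""

    def __init__(self, *ops: Operator):
        self.ops = ops

    def forward(self, items):
        for op in self.ops:
            items = op(items)
        return items

# Example run: two chunks, one question of each type, both pass the judges.
pipeline = Pipeline(
    SliceText(max_words=5),
    GenerateQuestions(["single_choice", "short_answer"]),
    BalanceTypes(),
    JudgeFilter({"clarity": lambda q: 1.0,
                 "length": lambda q: min(len(q.text) / 20, 1.0)}, threshold=0.5),
)
result = pipeline(["Photosynthesis converts light energy into chemical energy in plants."])
print([(q.qtype, q.text) for q in result])
```

The design choice mirrored here is that each stage shares one calling convention, so stages can be reordered, swapped, or inspected individually between runs, which is what makes human monitoring and intervention practical.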
Quick Start & Requirements
Install with `pip install -e .` in the DataFlow directory.
Run the pipeline with `python -m dataflow_edu.edu_data_pipeline` from the project root.
… .llm_config.json), Node.js for the WebUI.
… slide-deck/dataflow-edu/.
WebUI default address: http://127.0.0.1:5173 (local).
Highlighted Details
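If you want to launch the pipeline from a script rather than by hand, a small wrapper like the one below could work. Only the module name `dataflow_edu.edu_data_pipeline` and the rule that it runs from the project root come from the notes above; `build_pipeline_command` and `run_pipeline` are hypothetical helpers, and this is a sketch rather than a supported entry point.

```python
# Hypothetical launcher: only the module name and "run from the project root"
# come from the Quick Start notes; the helper names are illustrative.
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple


def build_pipeline_command(project_root: str) -> Tuple[List[str], Path]:
    """Return the command and working directory for a pipeline run."""
    # Use the current interpreter so the editable install is picked up.
    cmd = [sys.executable, "-m", "dataflow_edu.edu_data_pipeline"]
    return cmd, Path(project_root).resolve()


def run_pipeline(project_root: str) -> int:
    """Run the pipeline with stdin/stdout inherited from this terminal,
    so the interactive prompts still reach the user."""
    cmd, cwd = build_pipeline_command(project_root)
    # check=False: hand back the exit code instead of raising on failure.
    return subprocess.run(cmd, cwd=cwd, check=False).returncode
```

For example, calling `run_pipeline(".")` from the project root starts the same interactive session as typing the command directly. Because no streams are redirected, the child process inherits the terminal.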
Maintenance & Community
The README does not provide specific details on maintainers, community channels (like Discord/Slack), sponsorships, or a public roadmap.
Licensing & Compatibility
The README does not explicitly state the project's license.
Limitations & Caveats
The system is semi-automatic and requires human monitoring and intervention. Development is ongoing: the README's TODO items include direct PDF parsing (instead of image conversion) and WebUI enhancements such as drag-and-drop controls and real-time progress previews.
2 weeks ago
Inactive