yobulkdev by yobulkdev

Open source data onboarding platform for businesses using CSV files

Created 2 years ago
902 stars

Top 40.2% on SourcePulse

Project Summary

YoBulk is an open-source, AI-driven data onboarding platform designed to streamline CSV data import for businesses. It offers a no-code solution for creating import buttons, smart column matching, custom validation rules, and a review interface, aiming to be a free alternative to commercial solutions like Flatfile.com.

How It Works

YoBulk is a full-stack Next.js application utilizing MongoDB for data storage. It processes CSV files, offering features like smart auto-matching between CSV columns and template columns, custom validation rules (including regex), and streaming capabilities for large files up to 1GB. Its AI integration, powered by OpenAI, provides auto-suggestions for error correction and aims to build a knowledge graph for data mapping decisions.
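The summary does not say how the smart auto-matching is implemented. As a rough illustration of the idea only, the sketch below pairs CSV headers with template columns by normalizing both names and comparing them by edit distance. The function names (normalizeHeader, editDistance, matchColumns) and the maxDistance threshold are hypothetical and are not taken from the YoBulk codebase.

```typescript
// Illustrative only: YoBulk's real matching logic is not described in this summary.

// Lowercase a header and drop everything except letters and digits,
// so "E-mail", "e_mail" and "EMAIL" all compare as "email".
function normalizeHeader(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Levenshtein edit distance, computed with a single rolling row.
function editDistance(a: string, b: string): number {
  const prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = prev[0]; // value of the cell up and to the left
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = prev[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      prev[j] = Math.min(above + 1, prev[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return prev[b.length];
}

// For each CSV header, pick the closest template column, or null when
// nothing is within maxDistance edits (maxDistance is a hypothetical knob).
function matchColumns(
  csvHeaders: string[],
  templateColumns: string[],
  maxDistance = 2
): Map<string, string | null> {
  const result = new Map<string, string | null>();
  for (const header of csvHeaders) {
    let best: string | null = null;
    let bestDistance = Infinity;
    for (const column of templateColumns) {
      const d = editDistance(normalizeHeader(header), normalizeHeader(column));
      if (d < bestDistance) {
        bestDistance = d;
        best = column;
      }
    }
    result.set(header, bestDistance <= maxDistance ? best : null);
  }
  return result;
}

const mapping = matchColumns(
  ["E-mail", "Frist Name", "zip"],
  ["email", "first_name", "phone"]
);
console.log(mapping);
```

This sketch does not enforce a one-to-one mapping, and an absolute distance threshold can produce false matches on very short header names, so a real implementation would likely need a relative threshold and a manual review step.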

Quick Start & Requirements

  • Docker Compose: git clone https://github.com/yobulkdev/yobulkdev.git && cd yobulkdev && docker-compose up -d (Requires OpenAI API key for AI features).
  • Docker Run: docker run --rm -it -p 5050:5050/tcp --env="OPENAI_SECRET_KEY=****" yobulk/yobulk (Requires local MongoDB instance).
  • Local Build: git clone https://github.com/yobulkdev/yobulkdev && cd yobulkdev && yarn install && yarn run dev (Requires local MongoDB instance and OpenAI API key in .env).
  • Prerequisites: MongoDB, Node.js (for local build), Docker.
  • Documentation: https://github.com/yobulkdev/yobulkdev

Highlighted Details

  • AI-driven features, including GPT-3 integration for auto-suggestion and error correction.
  • Scalable to import CSV files up to 1GB via streaming.
  • No-code template creation and smart column matching.
  • Supports custom validation rules, including regex.
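To make the combination of regex validation rules and streaming more concrete, here is a minimal sketch of per-column regex rules applied one row at a time, so a large file never has to be held in memory at once. It assumes rows arrive already parsed as objects keyed by column name; CSV parsing itself is out of scope. The types and names (Row, RegexRule, RowError, validateRows) are hypothetical and do not reflect YoBulk's actual API.

```typescript
// Illustrative only: these types and names are assumptions, not YoBulk's API.
type Row = Record<string, string>;

interface RegexRule {
  column: string;
  pattern: RegExp;
  message: string;
}

interface RowError {
  rowIndex: number;
  column: string;
  value: string;
  message: string;
}

// Lazily validate rows: each row is checked as it is pulled from the source,
// and errors are yielded immediately rather than collected up front.
function* validateRows(
  rows: Iterable<Row>,
  rules: RegexRule[]
): Generator<RowError> {
  let rowIndex = 0;
  for (const row of rows) {
    for (const rule of rules) {
      const value = row[rule.column] ?? ""; // a missing cell is treated as empty
      rule.pattern.lastIndex = 0; // guard against stateful /g or /y patterns
      if (!rule.pattern.test(value)) {
        yield { rowIndex, column: rule.column, value, message: rule.message };
      }
    }
    rowIndex++;
  }
}

const rules: RegexRule[] = [
  { column: "email", pattern: /^[^\s@]+@[^\s@]+\.[^\s@]+$/, message: "Invalid email" },
  { column: "zip", pattern: /^\d{5}$/, message: "ZIP must be 5 digits" },
];

const sampleRows: Row[] = [
  { email: "a@example.com", zip: "12345" },
  { email: "not-an-email", zip: "1234" },
];

for (const err of validateRows(sampleRows, rules)) {
  console.log(`row ${err.rowIndex}, ${err.column}: ${err.message} ("${err.value}")`);
}
```

Because validateRows accepts any iterable, the same function could be fed from a streaming CSV parser instead of an in-memory array.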

Licensing & Compatibility

  • AGPL 3.0 license. This is a strong copyleft license: derivative works must also be released under AGPL 3.0, and that obligation extends to modified versions that users interact with over a network.

Limitations & Caveats

The README describes the project as still in development, with features such as custom LLM models and data mapping knowledge graphs listed as "Coming Soon." It also states explicitly that YoBulk does not currently claim to outperform Flatfile.com in functionality or design.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days
