yobulkdev by yobulkdev

Open source data onboarding platform for businesses using CSV files

created 2 years ago
897 stars

Top 41.3% on sourcepulse

Project Summary

YoBulk is an open-source, AI-driven data onboarding platform designed to streamline CSV data import for businesses. It offers a no-code solution for creating import buttons, smart column matching, custom validation rules, and a review interface, aiming to be a free alternative to commercial solutions like Flatfile.com.

How It Works

YoBulk is a full-stack Next.js application utilizing MongoDB for data storage. It processes CSV files, offering features like smart auto-matching between CSV columns and template columns, custom validation rules (including regex), and streaming capabilities for large files up to 1GB. Its AI integration, powered by OpenAI, provides auto-suggestions for error correction and aims to build a knowledge graph for data mapping decisions.
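
To make the regex validation idea concrete, here is a minimal sketch of row-by-row validation that fits a streaming reader, since it only ever holds one line at a time. It is not YoBulk's actual code: the `ColumnRule` and `RowError` shapes, the `makeRowValidator` helper, and the sample rules are all hypothetical, and the CSV parsing is deliberately naive (plain comma split, no quoted fields).

```typescript
// Hypothetical rule shape for illustration; YoBulk's real template schema may differ.
interface ColumnRule {
  column: string;
  pattern: RegExp;
  message: string;
}

interface RowError {
  line: number; // 1-based line number in the file, counting the header as line 1
  column: string;
  value: string;
  message: string;
}

// Returns a function that is fed one CSV line at a time, so a caller reading
// a large file as a stream never has to hold the whole file in memory.
// Simplification: fields are split on commas; quoted fields are not handled.
function makeRowValidator(rules: ColumnRule[]): (line: string) => RowError[] {
  let header: string[] | null = null;
  let lineNo = 0;
  return (line: string): RowError[] => {
    lineNo += 1;
    const cells = line.split(",").map((c) => c.trim());
    if (header === null) {
      header = cells; // the first line is treated as the header row
      return [];
    }
    const columns: string[] = header;
    const found: RowError[] = [];
    for (const rule of rules) {
      const idx = columns.indexOf(rule.column);
      if (idx === -1) continue; // column absent from this file; ignored in this sketch
      const value = cells[idx] ?? "";
      if (!rule.pattern.test(value)) {
        found.push({ line: lineNo, column: rule.column, value, message: rule.message });
      }
    }
    return found;
  };
}

// Illustrative rules and data, not taken from the project.
const rules: ColumnRule[] = [
  { column: "email", pattern: /^[^@\s]+@[^@\s]+\.[^@\s]+$/, message: "invalid email" },
  { column: "age", pattern: /^\d{1,3}$/, message: "age must be 1 to 3 digits" },
];

const validate = makeRowValidator(rules);
const sample = ["name,email,age", "Ada,ada@example.com,36", "Bob,bob-at-example.com,x"];
const errors: RowError[] = [];
for (const line of sample) {
  errors.push(...validate(line));
}
```

In a real importer the lines would come from a file or upload stream rather than an array, and a proper CSV parser would replace the comma split.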

Quick Start & Requirements

  • Docker Compose: git clone https://github.com/yobulkdev/yobulkdev.git && cd yobulkdev && docker-compose up -d (Requires OpenAI API key for AI features).
  • Docker Run: docker run --rm -it -p 5050:5050/tcp --env="OPENAI_SECRET_KEY=****" yobulk/yobulk (Requires local MongoDB instance).
  • Local Build: git clone https://github.com/yobulkdev/yobulkdev && cd yobulkdev && yarn install && yarn run dev (Requires local MongoDB instance and OpenAI API key in .env).
  • Prerequisites: MongoDB, Node.js (for local build), Docker.
  • Documentation: https://github.com/yobulkdev/yobulkdev

Highlighted Details

  • AI-driven features, including GPT-3 integration for auto-suggestion and error correction.
  • Scalable to import CSV files up to 1GB via streaming.
  • No-code template creation and smart column matching.
  • Supports custom validation rules, including regex.
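
The summary does not say how the smart column matching works. As one plausible illustration, the sketch below normalizes header names and picks the template column with the smallest Levenshtein edit distance. The `normalize`, `editDistance`, and `matchColumns` functions and the `maxDistance` threshold are assumptions made for this example, not YoBulk's API or its actual algorithm.

```typescript
// Lowercase and drop everything except letters and digits,
// e.g. "E-mail" becomes "email" and "first_name" becomes "firstname".
function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Dynamic-programming Levenshtein edit distance, keeping one row at a time.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Map each CSV header to the closest template column, or null when no
// candidate is within maxDistance edits (an arbitrary illustrative threshold).
function matchColumns(
  csvHeaders: string[],
  templateColumns: string[],
  maxDistance: number = 2
): Record<string, string | null> {
  const result: Record<string, string | null> = {};
  for (const header of csvHeaders) {
    let best: string | null = null;
    let bestDist = maxDistance + 1;
    for (const target of templateColumns) {
      const d = editDistance(normalize(header), normalize(target));
      if (d < bestDist) {
        best = target;
        bestDist = d;
      }
    }
    result[header] = best;
  }
  return result;
}

// Illustrative headers and template columns, not taken from the project.
const mapping = matchColumns(
  ["First Name", "E-mail", "Phone No."],
  ["first_name", "email", "phone"]
);
```

Headers left as null would be the ones a user resolves by hand in a review step; a production matcher would likely also consider synonyms and sample values.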

Maintenance & Community

Licensing & Compatibility

  • AGPL 3.0 license. This is a strong copyleft license: derivative works, including modified versions offered to users over a network, must also be released under AGPL 3.0.

Limitations & Caveats

The project is still a work in progress, with features such as custom LLM models and data-mapping knowledge graphs listed as "Coming Soon." The README explicitly states that it does not currently claim to outperform Flatfile.com in functionality or design.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days

Explore Similar Projects

  • NeumAI by NeumTry (0%, 858): Data platform for retrieval-augmented generation (RAG). Created 1 year ago, updated 1 year ago. Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 1 more.
  • argilla by argilla-io (0.4%, 5k): Collaboration tool for building high-quality AI datasets. Created 4 years ago, updated 5 days ago. Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.
  • teable by teableio (0.3%, 19k): No-code Postgres alternative for database applications. Created 2 years ago, updated 15 hours ago. Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Travis Fischer (Founder of Agentic).