flipbook-app by imcuttle

Multimodal canvas for dynamic knowledge discovery

Created 3 weeks ago

313 stars

Top 86.0% on SourcePulse

Project Summary

This project turns static images into interactive, explorable knowledge visualizations called "flipbooks." It targets engineers, researchers, and power users who want to explore complex information through an AI-powered, click-to-explore interface. The main benefit is an encyclopedia-like experience: users drill down into a topic step by step, and annotated diagrams are generated on demand.

How It Works

Flipbook Canvas employs a pluggable multimodal pipeline that orchestrates LLM-based planning, image generation, web search, and OCR. Users initiate exploration by long-pressing an image region. The system infers the subject, optionally augments understanding with web search results, and then generates a new, annotated child diagram. This process creates an infinite, shareable tree of interconnected knowledge canvases, with each node featuring detailed captions and OCR'd text labels.
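The flow described above can be sketched as a single expand step over a tree of nodes. This is a minimal illustration, not the project's actual code: the type names (Region, CanvasNode, Providers), the expandNode function, and the stub providers are all hypothetical and are not taken from the flipbook-app source.

```typescript
// Hypothetical types and names for illustration only.
interface Region { x: number; y: number; width: number; height: number }

interface CanvasNode {
  id: string;
  parentId: string | null;
  subject: string;
  caption: string;
  imageUrl: string;
  labels: string[];        // text labels recovered by OCR
  children: CanvasNode[];  // child canvases generated from this one
}

// Each pipeline stage is a pluggable function, so providers can be swapped.
interface Providers {
  inferSubject(parent: CanvasNode, region: Region): Promise<string>;
  shouldSearch(subject: string): Promise<boolean>; // the LLM "gate" for web search
  search(subject: string): Promise<string[]>;
  generateImage(
    subject: string,
    context: string[],
  ): Promise<{ imageUrl: string; caption: string }>;
  ocr(imageUrl: string): Promise<string[]>;
}

// One exploration step: region -> subject -> optional search -> image -> OCR -> child node.
async function expandNode(
  parent: CanvasNode,
  region: Region,
  p: Providers,
): Promise<CanvasNode> {
  const subject = await p.inferSubject(parent, region);
  const context = (await p.shouldSearch(subject)) ? await p.search(subject) : [];
  const { imageUrl, caption } = await p.generateImage(subject, context);
  const labels = await p.ocr(imageUrl);
  const child: CanvasNode = {
    id: `${parent.id}/${parent.children.length}`,
    parentId: parent.id,
    subject,
    caption,
    imageUrl,
    labels,
    children: [],
  };
  parent.children.push(child);
  return child;
}

// Stub providers that return placeholders instead of calling real models.
const stubProviders: Providers = {
  inferSubject: async (parent, region) =>
    `${parent.subject} detail at (${region.x}, ${region.y})`,
  shouldSearch: async (subject) => subject.length > 0,
  search: async (subject) => [`placeholder search result for ${subject}`],
  generateImage: async (subject, context) => ({
    imageUrl: `stub://${encodeURIComponent(subject)}.svg`,
    caption: `${subject} (${context.length} sources)`,
  }),
  ocr: async () => ["label A", "label B"],
};
```

Keeping every stage behind one interface is one plausible way to get the pluggability the summary describes: a real LLM, image model, or search backend would replace the matching stub without touching expandNode.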

Quick Start & Requirements

Primary install/run commands: npm install followed by npm run dev for development. Enable the reference CLI provider with ENABLE_CODEBUDDY=1 npm run dev:server.
Prerequisites: Node.js and npm. Full functionality requires API keys for LLM and image generation providers (e.g., OpenAI, Gemini, Seedream). OCR uses Apple Vision (local), and TTS uses Microsoft Edge neural voices.
Resource footprint: Node generation takes approximately 70-95 seconds with the reference provider, producing 2752x1536 PNG images (~6MB).
Links: Live examples are available at https://imcuttle.github.io/flipbook-app.

Highlighted Details

Click-to-explore: Interactive image regions dynamically generate new, contextually relevant child diagrams.
Live-streaming Generation: Node generation progress is streamed via SSE, and nodes are persisted and linkable immediately, allowing real-time collaboration and replay.
Selectable In-Image Text: Generated image labels are overlaid with selectable HTML text via OCR, enabling easy copying of information.
Web-Search Augmented: An LLM gate determines the utility of web search for enriching node context before generation.
Pluggable Multimodal Pipeline: Designed for extensibility, allowing integration of custom LLM, image generation, and web search models.
Voice Narration: Integrated text-to-speech using free Microsoft Edge neural voices, bundled into exports for offline playback.
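To make the SSE streaming item above concrete, here is a minimal sketch of how a progress update could be serialized in the Server-Sent Events wire format. The event name, the GenerationProgress fields, and the formatSseEvent helper are assumptions for illustration; the actual event schema used by flipbook-app is not documented here.

```typescript
// Hypothetical progress payload; field names are assumptions.
interface GenerationProgress {
  nodeId: string;
  stage: string;   // e.g. "plan", "search", "image", "ocr"
  percent: number;
}

// Serialize one event using the standard SSE fields (id, event, data).
// A blank line terminates the event.
function formatSseEvent(
  event: string,
  data: GenerationProgress,
  id?: string,
): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event}`);
  lines.push(`data: ${JSON.stringify(data)}`);
  return lines.join("\n") + "\n\n";
}

const frame = formatSseEvent(
  "progress",
  { nodeId: "n1", stage: "image", percent: 40 },
  "7",
);
console.log(frame);
```

On the client side, a browser EventSource would receive this as a "progress" event whose data field can be parsed with JSON.parse. The id field lets a reconnecting client resume from the last event it saw, which fits the replay behavior mentioned above.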

Maintenance & Community

The project is hosted on GitHub at https://github.com/imcuttle/flipbook-app. Specific details regarding active contributors, community channels (like Discord/Slack), or a public roadmap are not detailed in the provided README.

Licensing & Compatibility

The license type and any associated compatibility notes for commercial use or closed-source linking are not specified in the provided README content.

Limitations & Caveats

Several multimodal providers are listed as stubs and need either a user-supplied implementation or API key configuration to work. The default setup runs in a limited "stub mode" that produces SVG placeholders; core AI-driven generation and search must be enabled explicitly (e.g., ENABLE_CODEBUDDY=1). Because the app relies on external LLM and image APIs, its cost and availability depend on third-party services.
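The stub-mode behavior can be pictured with a small sketch like the one below. Only the ENABLE_CODEBUDDY variable name and the 2752x1536 image size come from the summary above; the function names, the mode labels, and the placeholder markup are hypothetical, and the real project may resolve providers differently.

```typescript
// Hypothetical mode labels; only the env variable name comes from the summary.
type ProviderMode = "stub" | "codebuddy";

// Pass the environment in explicitly (e.g. process.env in Node) to keep this testable.
function resolveProviderMode(
  env: Record<string, string | undefined>,
): ProviderMode {
  return env["ENABLE_CODEBUDDY"] === "1" ? "codebuddy" : "stub";
}

// Escape characters that are unsafe inside SVG/XML text content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a placeholder image for stub mode instead of calling an image model.
function placeholderSvg(subject: string, width = 2752, height = 1536): string {
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<rect width="100%" height="100%" fill="#eee"/>` +
    `<text x="50%" y="50%" text-anchor="middle">${escapeXml(subject)}</text>` +
    `</svg>`
  );
}

console.log(resolveProviderMode({}));
console.log(resolveProviderMode({ ENABLE_CODEBUDDY: "1" }));
```

Defaulting to the stub keeps a fresh checkout runnable without any API keys, at the cost of showing placeholders until a real provider is switched on.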
