QMedia by QmiAI

Open-source AI content search engine for content creators

created 1 year ago
565 stars

Top 57.8% on sourcepulse

View on GitHub
Project Summary

QMedia is an open-source AI content search engine tailored for content creators, enabling efficient extraction and querying of information from text, images, and short videos. It facilitates local deployment of a web app, RAG server, and LLM server, offering multimodal RAG capabilities for private data Q&A.

How It Works

QMedia employs a modular architecture with three core services: mm_server for multimodal model inference (LLMs, image/video analysis, embeddings), mmrag_server for content extraction, embedding, storage, and RAG-based Q&A, and qmedia_web as a Next.js-based frontend. This separation allows flexible deployment and integration, with Python/LlamaIndex powering the backend services and TypeScript/Next.js for the frontend.
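For a concrete picture of this separation, the sketch below shows how a client might query the RAG service over HTTP. The port and the /query route are assumptions for illustration only; the real endpoints are defined by the mmrag_server application.

```python
import requests

# Hypothetical endpoint and port -- the actual routes are defined by the
# mmrag_server app and are not documented in this summary.
MMRAG_SERVER = "http://localhost:8001"

def rag_query(question: str) -> dict:
    """Send a question to the RAG service and return its JSON answer payload."""
    resp = requests.post(f"{MMRAG_SERVER}/query", json={"query": question}, timeout=60)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(rag_query("What topics do my saved short videos cover?"))
```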

Quick Start & Requirements

  • Install/Run: Requires separate setup for mm_server, mmrag_server, and qmedia_web. Each service is started from its own directory, e.g., by running its Python entry point (main.py) for the backend services or pnpm dev for the web frontend; see the sketch after this list.
  • Prerequisites: Python environment with specific dependencies (e.g., qllm, qmedia conda environments), Node.js/pnpm for the web frontend. Local Ollama models (e.g., llama3:8b-instruct) are supported.
  • Resources: Local deployment of models may require significant GPU resources.
  • Docs: Installation Instructions, Usage
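As referenced above, a minimal launcher sketch for the three services. It assumes each service runs from its own directory with the commands listed (python main.py for the backends, pnpm dev for the frontend) and leaves conda and Ollama environment setup to the invoking shell.

```python
import subprocess

# Minimal local-dev launcher, assuming the repo layout described above:
# mm_server and mmrag_server expose a main.py entry point, and qmedia_web
# is started with pnpm. Environment activation (conda envs, Ollama) is
# left to the shell this script runs in.
SERVICES = [
    ("mm_server",    ["python", "main.py"]),
    ("mmrag_server", ["python", "main.py"]),
    ("qmedia_web",   ["pnpm", "dev"]),
]

procs = [subprocess.Popen(cmd, cwd=cwd) for cwd, cmd in SERVICES]

try:
    for p in procs:
        p.wait()
except KeyboardInterrupt:
    for p in procs:
        p.terminate()
```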

Highlighted Details

  • Supports multimodal RAG for text, image, and short video content.
  • Features "Content Cards" for displaying extracted information, inspired by XHS (Xiaohongshu).
  • Offers fully local deployment of models, including CLIP for image embedding, BGE for text embedding, Faster Whisper for video transcription, and LLaVA for visual understanding.
  • Allows independent use of the model services via their APIs; a hedged example follows this list.
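The sketch below illustrates calling the model service on its own for embeddings. The /embeddings/* routes, port, and response shape are placeholders, not the documented mm_server API.

```python
import requests

# Placeholder routes for the standalone model service; the real mm_server
# API paths and response schema may differ.
MM_SERVER = "http://localhost:8000"

def embed_text(text: str) -> list[float]:
    """Request a text embedding (e.g., BGE) from the model service."""
    resp = requests.post(f"{MM_SERVER}/embeddings/text", json={"text": text}, timeout=30)
    resp.raise_for_status()
    return resp.json()["embedding"]

def embed_image(path: str) -> list[float]:
    """Request an image embedding (e.g., CLIP) for a local file."""
    with open(path, "rb") as f:
        resp = requests.post(f"{MM_SERVER}/embeddings/image", files={"file": f}, timeout=60)
    resp.raise_for_status()
    return resp.json()["embedding"]
```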

Maintenance & Community

  • Active development indicated by changelog and issue tracking.
  • Community channels include Discord and WeChat groups.
  • Twitter handle: @Lafe8088.

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

The project is split into multiple services that must be set up and managed separately. While local deployment is emphasized, specific hardware requirements for running the multimodal models locally are not documented. Planned features such as viral content breakdown and similarity search are not yet available.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
Star History
20 stars in the last 90 days

Explore Similar Projects

Starred by Addy Osmani (Engineering Leader on Google Chrome), Victor Taelin (Author of Bend, Kind, HVM), and 1 more.

chatbox by chatboxai — Desktop client app for AI models/LLMs
Top 0.3% · 36k stars · created 2 years ago · updated 6 days ago