second-brain by henrydaum

Desktop RAG app with multimodal AI and hybrid search

Created 2 months ago
360 stars

Top 77.8% on SourcePulse

Summary

Second Brain is a desktop personal knowledge base application using Retrieval-Augmented Generation (RAG) and multimodal AI. It provides a private, hybrid lexical/semantic search engine for local text files and images, enabling intelligent interaction with user data without cloud reliance.

How It Works

The system performs hybrid retrieval by combining semantic search over vector embeddings with BM25 keyword matching. It processes local files, generating embeddings for text chunks and images and storing them in ChromaDB; image captions make images reachable by lexical search as well. Retrieval runs the combined search, reranks the results with MMR (Maximal Marginal Relevance), and optionally applies AI filtering. The frontend offers a chat-like UI in which users can attach search results to follow-up queries, so that each search builds on files already retrieved, a novel interaction.
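
The hybrid retrieval step described above can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: the function names (bm25_scores, hybrid_rank), the min-max normalization, and the alpha weighting are all assumptions chosen for illustration, and the real application reads its vectors from ChromaDB rather than from in-memory lists.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    Assumes `docs` is a non-empty list of token lists (illustrative only).
    """
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so lexical and semantic scores are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Fuse semantic and lexical scores; alpha is the (assumed) semantic weight."""
    lexical = min_max(bm25_scores(query_tokens, docs_tokens))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * s + (1 - alpha) * l for s, l in zip(semantic, lexical)]
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return order, fused
```

Calling hybrid_rank with a tokenized query, a query embedding, and parallel lists of document tokens and document embeddings returns document indices ordered by fused score. Weighted score fusion is only one option; rank-based schemes such as reciprocal rank fusion are a common alternative, and the source does not say which the project uses.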

Quick Start & Requirements

Installation is manual: download the core project files (.py, .json, .csv). Prerequisites are Python 3.9+, either LM Studio (vision models recommended) or an OpenAI API key, and dependencies installed via pip (e.g., chromadb, sentence-transformers). Both GPU and CPU are supported; the default models use about 2 GB of VRAM or RAM. Google Drive syncing needs a credentials.json file. Initial model downloads and syncing of large directories can be time-consuming.
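
Because there is no installer, it can help to check the prerequisites before the first run. The following is a hypothetical helper, not part of the project: the function name missing_prerequisites is invented here, and the module list covers only the two packages named above (imported as chromadb and sentence_transformers), so the project's actual dependency list is likely longer.

```python
import importlib.util
import sys

# Import names for the two pip packages mentioned above; the full
# dependency list of the project is not given here (assumption).
REQUIRED_MODULES = ["chromadb", "sentence_transformers"]


def missing_prerequisites(modules=REQUIRED_MODULES, min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems
```

Running missing_prerequisites() and printing the result gives a quick list of what still needs to be installed with pip.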

Highlighted Details

  • Multimodal RAG: Embeds and searches text/images, supporting AI vision.
  • Hybrid Search: Integrates semantic and lexical search, reranked for relevance/diversity.
  • Local-First Privacy: All data processing and search operations are local.
  • Interactive Search: Attach search results to follow-up queries so that each search builds on files already retrieved.
  • Optional AI Integration: Connects to local (LM Studio) or cloud (OpenAI) LLMs for enhanced queries and insights.
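
The relevance/diversity reranking mentioned under Hybrid Search refers to MMR (Maximal Marginal Relevance). A minimal sketch of the standard greedy MMR procedure follows, assuming cosine similarity over plain Python lists. The function name mmr and the lam trade-off parameter are illustrative assumptions, not the project's actual interface.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query_vec, cand_vecs, k, lam=0.7):
    """Greedily pick up to k candidate indices by Maximal Marginal Relevance.

    lam=1.0 ranks purely by relevance to the query; lower values penalize
    candidates that are similar to ones already selected.
    """
    relevance = [cosine(query_vec, v) for v in cand_vecs]
    selected = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to any already-selected item.
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a low lam, a near-duplicate of an already selected result is skipped in favor of a less similar one, which is the diversity effect the bullet describes.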

Maintenance & Community

Maintained by henrydaum. No specific community channels, contributor lists, or sponsorship details are provided.

Licensing & Compatibility

The repository is open source. However, the specific license type is not explicitly stated, hindering assessment for commercial use or closed-source linking.

Limitations & Caveats

An official installer is pending. Google Drive authentication can be unstable. Changing embedding models requires re-indexing. Initial syncing is time-intensive, and AI filtering may affect performance. The lack of a clear license is a significant adoption blocker.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 4
Star History: 308 stars in the last 30 days
