xiaoyaosearch  by dtsola

AI-powered local file search for multimodal queries

Created 2 months ago
294 stars

Top 90.2% on SourcePulse

Project Summary

XiaoyaoSearch is a cross-platform desktop application for finding local files through AI-powered search driven by natural language, voice, or image input. It targets knowledge workers, content creators, and developers, and aims to turn local file discovery into a conversational experience.

How It Works

XiaoyaoSearch employs a hybrid architecture combining Faiss for vector search and Whoosh for full-text indexing. It integrates advanced AI models, including BGE-M3 for embeddings, FasterWhisper for speech recognition, CN-CLIP for image understanding, and Ollama for language processing. User inputs are semantically analyzed by these models, allowing for deep content and filename searches across diverse file types, offering a more intuitive and powerful retrieval mechanism than traditional file explorers.
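The summary does not say how results from the vector index and the full-text index are merged. One common way to combine two ranked lists is reciprocal rank fusion (RRF); the sketch below illustrates that technique only and is not taken from the project's code. The function name, the constant k=60, and the sample file names are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    documents ranked highly by several lists end up on top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical hit lists, standing in for a vector search (e.g. Faiss)
# and a keyword search (e.g. Whoosh) over the same files.
vector_hits = ["notes.md", "report.pdf", "talk.mp4"]
keyword_hits = ["report.pdf", "budget.txt", "notes.md"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Files that appear in both lists (here report.pdf and notes.md) rise above files found by only one of the two searches, which is the behavior a hybrid search generally wants.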

Quick Start & Requirements

  • Operating Systems: Windows, macOS, Linux
  • Primary Dependencies: Python 3.10.11+, Node.js 21.x+, ffmpeg, Ollama.
  • Hardware: 8GB+ RAM recommended. Optional CUDA 12.1+ for GPU acceleration.
  • Installation: Clone the repository (git clone), install backend dependencies (pip install -r requirements.txt), install frontend dependencies (npm install).
  • Configuration: Requires setting up a .env file for paths and API settings.
  • Models: Manual download and placement of specific embedding, speech, and vision models are necessary. Default Ollama model is qwen2.5:1.5b.
  • Running: Backend: python main.py or uvicorn main:app --host 127.0.0.1 --port 8000 --reload. Frontend: npm run dev.
  • Resources: Links to product roadmap, API documentation, and UI prototypes are available within the docs/ directory.
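Before installing, it can help to confirm that the listed dependencies are present. The following is a minimal sketch, assuming the tools are exposed on PATH under the executable names node, npm, ffmpeg, and ollama; it is not part of the project. It checks the Python version against 3.10.11 but does not verify the Node.js or CUDA versions.

```python
import shutil
import sys

# Executable names are assumptions; adjust them if your system differs.
REQUIRED_TOOLS = ("node", "npm", "ffmpeg", "ollama")
MIN_PYTHON = (3, 10, 11)


def check_prerequisites(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON):
    """Return a dict mapping each prerequisite name to True if it is available."""
    report = {"python": sys.version_info[:3] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```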

Highlighted Details

  • Multi-modal Input: Accepts voice recordings, text queries, and image uploads for search.
  • Deep Content Search: Indexes content and filenames for video (mp4, avi), audio (mp3, wav), and documents (txt, markdown, office, pdf).
  • AI Model Integration: Utilizes BGE-M3, FasterWhisper, CN-CLIP, and Ollama for robust AI capabilities.
  • Hybrid Search: Combines vector and full-text search for high-performance retrieval.
  • Privacy-Focused: Operates entirely locally with no data uploaded to the cloud; includes a privacy mode.
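To make the supported file types concrete, here is a small sketch of how an indexer might route a file to a processing path by extension. The mapping and the classify helper are hypothetical, not the project's implementation. The summary names "markdown" and "office" without listing extensions, so .md, .docx, .xlsx, and .pptx are assumptions, as is the "filename-only" fallback for other files.

```python
from pathlib import Path

# Hypothetical mapping built from the file types named in the summary.
MODALITY_BY_EXTENSION = {
    ".mp4": "video",
    ".avi": "video",
    ".mp3": "audio",
    ".wav": "audio",
    ".txt": "document",
    ".pdf": "document",
    # Assumed extensions: the summary only says "markdown" and "office".
    ".md": "document",
    ".docx": "document",
    ".xlsx": "document",
    ".pptx": "document",
}


def classify(path):
    """Return the assumed modality for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    # Files with other extensions fall back to filename-only indexing
    # (an assumption made for this sketch).
    return MODALITY_BY_EXTENSION.get(suffix, "filename-only")


print(classify("Lecture.MP4"))
print(classify("archive.zip"))
```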

Maintenance & Community

The project is developed by dtsola, an IT Solutions Architect. Community interaction takes place mainly on WeChat. A product roadmap describes the project's planned direction.

Licensing & Compatibility

The software is free for non-commercial use; modification and distribution are allowed as long as copyright notices are retained. Commercial use requires explicit authorization under the "小遥搜索软件授权协议" (XiaoyaoSearch Software License Agreement). Whether it may be linked with closed-source commercial applications is not specified; this is likely restricted without a commercial license.

Limitations & Caveats

Commercial use is strictly prohibited without obtaining a separate license. The setup process involves manual downloading and configuration of AI models and potentially complex dependency management (e.g., CUDA), which may pose a barrier for less technical users.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 0
  • Star history: 273 stars in the last 30 days

Explore Similar Projects

  • clip-retrieval by rom1504: CLIP retrieval system for semantic search. 3k, 0.1%. Created 4 years ago, updated 5 months ago. Starred by John Resig (Author of jQuery; Chief Software Architect at Khan Academy), Chenlin Meng (Cofounder of Pika), and 9 more.
  • lancedb by lancedb: Embedded retrieval engine for multimodal AI. 9k, 0.9%. Created 2 years ago, updated 4 days ago. Starred by Chang She (Cofounder of LanceDB), Carol Willing (Core Contributor to CPython, Jupyter), and 11 more.
  • RAG-Anything by HKUDS: All-in-one multimodal RAG system. 12k, 1.1%. Created 7 months ago, updated 2 weeks ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").