SurfSense by MODSetter

Open-source tool for personal knowledge base research

Created 1 year ago
7,850 stars

Top 6.6% on SourcePulse

Project Summary

SurfSense is an open-source, self-hostable AI research agent designed as an alternative to tools like NotebookLM and Perplexity. It allows users to connect to personal knowledge bases and to external data sources such as search engines and productivity tools, and it supports local LLMs for enhanced privacy and customization.

How It Works

SurfSense uses Retrieval-Augmented Generation (RAG) with a two-tiered hierarchical index and hybrid search, which combines semantic and full-text search results using Reciprocal Rank Fusion (RRF). It supports more than 150 LLMs and 6,000 embedding models, and runs on a FastAPI backend with PostgreSQL and pgvector for vector storage and retrieval. The architecture is designed for extensibility, with a RAG-as-a-Service API backend.
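
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in Python. It is not taken from the SurfSense codebase: the function name, the document IDs, and the constant k=60 (a commonly used default for RRF) are assumptions chosen for illustration. Each document receives a score of 1 / (k + rank) from every ranked list it appears in, and the scores are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Hypothetical helper for illustration only. Each document scores
    sum(1 / (k + rank)) over the lists it appears in, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up result lists standing in for the two search back ends.
semantic_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from a vector similarity query
full_text_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. from a keyword query

fused = reciprocal_rank_fusion([semantic_hits, full_text_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that rank well in both lists (here doc_b and doc_a) rise to the top, while documents found by only one method still appear lower in the fused ranking. Because RRF works on ranks rather than raw scores, the semantic and full-text scores do not need to be on the same scale.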

Quick Start & Requirements

  • Installation: Docker (less customization) or Manual Installation (recommended for control). Detailed OS-specific guides are provided.
  • Prerequisites: pgvector setup, Google OAuth configuration, Unstructured.io API key, and other required API keys.
  • Resources: Requires PostgreSQL with pgvector. Hardware requirements are not specified, but LLM inference and database operations will need adequate resources.

Highlighted Details

  • Connects to a wide array of external sources: Search Engines (Tavily, LinkUp), Slack, Linear, Notion, YouTube, GitHub.
  • Supports uploading files with 27 different extensions, including documents and images.
  • Supports local LLMs via Ollama for improved privacy.
  • Features a cross-browser extension for saving webpages, including authenticated content.

Maintenance & Community

SurfSense is under active development, with a stated goal of becoming production-ready. Users are encouraged to contribute via Discord and help shape its future.

Licensing & Compatibility

The provided README does not explicitly state a license. Verify the licensing terms before commercial use or closed-source linking.

Limitations & Caveats

The project is not yet production-ready, and the podcast feature is temporarily deprecated. Local LLM support is implemented, but compatibility details for specific local models are not fully documented.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 24
  • Issues (30d): 26
  • Star History: 1,461 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Taranjeet Singh (Cofounder of Mem0), and 8 more.

Perplexica by ItzCrazyKns

AI-powered search engine alternative

  • Top 5.7% on SourcePulse
  • 25k stars
  • Created 1 year ago
  • Updated 1 day ago