SurfSense by MODSetter

Open-source tool for personal knowledge base research

created 1 year ago
6,197 stars

Top 8.5% on sourcepulse

Project Summary

SurfSense is an open-source, self-hostable AI research agent designed as an alternative to tools like NotebookLM and Perplexity. It lets users query personal knowledge bases, connect external data sources such as search engines and productivity tools, and run local LLMs for greater privacy and customization.

How It Works

SurfSense uses Retrieval-Augmented Generation (RAG) with a two-tiered hierarchical index and hybrid search, which combines semantic and full-text results through Reciprocal Rank Fusion (RRF). It supports over 150 LLMs and 6,000 embedding models. The backend is built on FastAPI, with PostgreSQL and pgvector for vector storage and retrieval, and is exposed as a RAG-as-a-Service API designed for extensibility.
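
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in Python. It is not taken from the SurfSense codebase: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a value commonly used with RRF) are assumptions chosen for illustration. Each document receives a score of 1 / (k + rank) from every ranked list it appears in, and the scores are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (assumed value; not confirmed for SurfSense).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a semantic search and a full-text search.
semantic_hits = ["a", "b", "c"]
fulltext_hits = ["b", "d", "a"]

fused = reciprocal_rank_fusion([semantic_hits, fulltext_hits])
print(fused)  # ['b', 'a', 'd', 'c']
```

Documents that rank well in both lists ("b" and "a") rise to the top, while documents found by only one search method are kept but placed lower. Because RRF uses only ranks, the two search methods do not need comparable score scales.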

Quick Start & Requirements

  • Installation: Docker (less customization) or Manual Installation (recommended for control). Detailed OS-specific guides are provided.
  • Prerequisites: pgvector setup, Google OAuth configuration, Unstructured.io API key, and other required API keys.
  • Resources: Requires PostgreSQL with pgvector. Hardware requirements are not specified; expect to need enough resources for LLM inference and database operations.
  • Links: Video

Highlighted Details

  • Connects to a wide array of external sources: Search Engines (Tavily, LinkUp), Slack, Linear, Notion, YouTube, GitHub.
  • Supports uploading files with 27 different extensions, including documents and images.
  • Supports local LLMs via Ollama for privacy.
  • Features a cross-browser extension for saving webpages, including authenticated content.

Maintenance & Community

SurfSense is under active development, with a stated goal of becoming production-ready. Users are encouraged to contribute via Discord to shape its future.

Licensing & Compatibility

The provided README does not explicitly state a license. Verify the license terms before commercial use or closed-source linking.

Limitations & Caveats

The project is not yet production-ready, and the podcast feature is temporarily deprecated. Local LLM support is implemented, but compatibility with specific local models is not documented in detail.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 50
  • Issues (30d): 25
  • Star history: 4,064 stars in the last 90 days
