knowledge  by raphaelsty

Bookmarks search engine with knowledge graph

created 2 years ago
674 stars

Top 51.2% on sourcepulse

Project Summary

This open-source project is a personal bookmark search engine that automatically extracts and indexes content from GitHub, Hacker News, and Twitter, along with Zotero documents. It organizes saved content into a navigable knowledge graph, which makes large personal collections easier to search and explore.

How It Works

A GitHub Actions workflow runs twice daily to fetch starred repositories, liked tweets, upvoted Hacker News posts, and Zotero records. Extracted data is stored in JSON files (database.json for records, triples.json for the knowledge graph) and a search index (retriever.pkl). The application is deployed on Fly.io, and an updated version is pushed automatically after each data extraction. OpenAI's API can optionally be used to re-rank search results.
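The knowledge graph stored in triples.json can be pictured with a minimal Python sketch. The schema here is an assumption, not taken from the README: each triple is treated as a dict with "head" and "tail" keys, and the helper names (load_json, build_graph, neighbors) are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_json(path):
    """Read a JSON file and return the parsed object."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def build_graph(triples):
    """Build an undirected adjacency map from a list of triples.

    Assumption: each triple is a dict with "head" and "tail" keys.
    """
    graph = defaultdict(set)
    for triple in triples:
        head, tail = triple["head"], triple["tail"]
        graph[head].add(tail)
        graph[tail].add(head)
    return graph


def neighbors(graph, node, depth=1):
    """Return every node reachable from `node` within `depth` hops."""
    seen = {node}
    frontier = {node}
    for _ in range(depth):
        # Expand one hop and drop nodes that were already visited.
        frontier = {n for f in frontier for n in graph.get(f, ())} - seen
        seen |= frontier
    return seen - {node}


# Illustrative data; real data would come from load_json("triples.json").
sample_triples = [
    {"head": "search", "tail": "information-retrieval"},
    {"head": "information-retrieval", "tail": "bm25"},
    {"head": "search", "tail": "knowledge-graph"},
]
graph = build_graph(sample_triples)
print(sorted(neighbors(graph, "search", depth=1)))
```

Raising depth widens the neighborhood, which is roughly what navigating outward from a tag in a graph view amounts to.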

Quick Start & Requirements

  • Install/Run: To deploy on Fly.io, run flyctl auth login, then fly deploy. For local development with Docker, run export OPENAI_API_KEY="..." and then make launch.
  • Prerequisites: flyctl client, Twitter API token, Zotero API key and library ID, GitHub and Twitter user handles for the data sources, and (optionally) an OpenAI API key.
  • Resources: Fly.io hosting costs under $8/month with limits. A VM with 2 GB of memory and a single shared CPU is recommended.
  • Docs: Fly.io Documentation, Zotero API, OpenAI API.

Highlighted Details

  • Automated twice-daily data extraction from GitHub, Twitter, Hacker News, and Zotero.
  • Generates a knowledge graph for navigating tagged and extracted content.
  • Optional AI-powered re-ranking of search results via OpenAI.
  • Deployment and hosting managed through Fly.io for continuous updates.
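
One way to picture the optional re-ranking step is as a hook applied after first-stage retrieval. The Python sketch below is illustrative only: the token-overlap scoring stands in for the project's actual index (retriever.pkl), the "title" and "summary" record fields are assumed, and rerank is a hypothetical callable where an OpenAI-backed re-ranker could be plugged in.

```python
def search(query, records, rerank=None, top_k=5):
    """Rank records by shared query tokens, then optionally re-rank.

    `rerank` is a hypothetical callable: (query, candidates) -> candidates.
    """
    query_tokens = set(query.lower().split())
    scored = []
    for record in records:
        # Assumed fields: each record has a "title" and a "summary".
        text = f"{record['title']} {record['summary']}".lower()
        overlap = len(query_tokens & set(text.split()))
        if overlap:
            scored.append((overlap, record))
    # Stable sort: records with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = [record for _, record in scored[:top_k]]
    # Without a re-ranker, the first-stage order is returned unchanged.
    return rerank(query, candidates) if rerank else candidates


# Illustrative records; real ones would be read from database.json.
records = [
    {"title": "Neural search", "summary": "dense retrieval"},
    {"title": "Graph search engine", "summary": "search with a knowledge graph"},
    {"title": "Cooking", "summary": "pasta"},
]
print([r["title"] for r in search("graph search", records)])
```

Keeping the re-ranker as an optional argument means search still works when no API key is configured.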

Maintenance & Community

The project is maintained by Raphaël Sourty. No specific community channels or roadmap links are provided in the README.

Licensing & Compatibility

  • License: GNU General Public License (GPL).
  • Compatibility: The GPL is a copyleft license, so derivative works that are distributed must generally be released under the same license. Commercial use or integration into closed-source projects requires careful review of these obligations.

Limitations & Caveats

The README indicates that the GitHub Pages URLs need manual updating after API deployment. The project is inspired by Semanlink, suggesting potential overlap in functionality or design.

Health Check

  • Last commit: 10 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 57 stars in the last 90 days
