knowledge by raphaelsty

Bookmarks search engine with knowledge graph

Created 3 years ago

726 stars

Top 47.4% on SourcePulse

Project Summary

This project provides an open-source personal bookmark search engine that automatically extracts and indexes content from social media platforms like GitHub, HackerNews, and Twitter, along with Zotero documents. It aims to create a navigable knowledge graph of your saved content, enhancing searchability and discovery for users managing extensive digital collections.

How It Works

The system uses a GitHub Actions workflow that runs twice daily to fetch starred GitHub repositories, liked tweets, upvoted HackerNews posts, and Zotero records. Extracted data is stored in JSON files (database.json for records, triples.json for the knowledge graph) alongside a pickled search index (retriever.pkl). The application is deployed on Fly.io, and an updated version is pushed automatically after each extraction run. OpenAI's API can optionally be used to re-rank search results.

Quick Start & Requirements

Install/Run: Deploy via Fly.io using flyctl auth login and fly deploy. Local development with Docker: export OPENAI_API_KEY="..." then make launch.
Prerequisites: flyctl client, OpenAI API key (optional), Twitter API token, Zotero API key and library ID. GitHub and Twitter user handles are also required to identify whose data to extract.
Resources: Fly.io hosting costs under $8/month, subject to usage limits. A VM with 2GB of memory and a single shared CPU is recommended.
Docs: Fly.io Documentation, Zotero API, OpenAI API.

Highlighted Details

Automated twice-daily data extraction from GitHub, Twitter, HackerNews, and Zotero.
Generates a knowledge graph for navigating tagged and extracted content.
Optional AI-powered re-ranking of search results via OpenAI.
Deployment and hosting managed through Fly.io for continuous updates.
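Navigating the knowledge graph amounts to walking outward from a document or tag to related nodes. The sketch below assumes, hypothetically, that triples.json holds (head, relation, tail) triples. The actual format is not given in this summary. It collects every node within a fixed number of hops using breadth-first search.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Build an undirected adjacency list from hypothetical (head, relation, tail) triples."""
    graph = defaultdict(set)
    for head, _relation, tail in triples:
        graph[head].add(tail)
        graph[tail].add(head)
    return graph


def neighborhood(graph, start, max_hops=2):
    """Return {node: hop_distance} for nodes reachable from `start` within `max_hops` edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # Do not expand beyond the hop limit.
        for neighbor in sorted(graph[node]):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # Exclude the starting node itself.
    return distances


# Illustrative triples only; real node and relation names will differ.
triples = [
    ("graph-lib", "tagged", "knowledge graph"),
    ("knowledge graph", "related", "semantic web"),
    ("semantic web", "related", "rdf"),
    ("search-engine", "tagged", "retrieval"),
]
graph = build_graph(triples)
print(neighborhood(graph, "graph-lib"))
```

Breadth-first search visits nodes in order of hop distance, so capping `max_hops` gives a natural "related content" radius around a bookmark. "rdf" (three hops away) and the disconnected "search-engine" component are excluded.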

Maintenance & Community

The project is maintained by Raphaël Sourty. No specific community channels or roadmap links are provided in the README.

Licensing & Compatibility

License: GNU General Public License (GPL).
Compatibility: GPL is a copyleft license, so derivative works, including software that links to GPL code, may need to be released under the same license. Commercial use or integration into closed-source projects therefore requires careful review of the license obligations.

Limitations & Caveats

The README notes that the GitHub Pages URLs must be updated manually after the API is deployed. The project is inspired by Semanlink, so the two may overlap in functionality or design.

Explore Similar Projects

scrape-it-now by clemlesne

leettools by leettools-dev

FLARE by jzbjyb

smaug by alexknowshtml

web-explorer by langchain-ai

paperlib by Future-Scholars

bmm by Y80

tavily-python by tavily-ai

huntly by lcomplete

karakeep by karakeep-app

SurfSense by MODSetter

wiseflow by TeamWiseFlow