yams  by trvon

Content-addressable storage for LLMs and applications

Created 3 months ago
352 stars

Top 79.1% on SourcePulse

1 Expert Loves This Project
Project Summary

YAMS is a content-addressable storage system for LLMs and applications that combines deduplication with full-text and semantic search. It targets developers and researchers who need persistent, versioned, and searchable data storage with integrity checking built in.

How It Works

YAMS addresses content by its SHA-256 hash, which keeps stored data immutable and lets integrity be checked by rehashing. Block-level deduplication uses Rabin fingerprinting. Search comes in two forms: full-text search through SQLite FTS5 and semantic search through vector embeddings. A write-ahead log handles crash recovery, and the architecture is thread-safe, with reported throughput exceeding 100MB/s.
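
The combination of SHA-256 addressing and block-level deduplication can be illustrated with a minimal Python sketch. It is not YAMS's implementation: the `ChunkStore` class, the `chunk_boundaries` helper, and all parameters are hypothetical, and the chunker uses a simple polynomial rolling hash as a stand-in for true Rabin fingerprinting.

```python
import hashlib


def chunk_boundaries(data: bytes, window: int = 16, mask: int = 0x3F,
                     min_size: int = 32, max_size: int = 1024):
    """Yield content-defined chunks of `data`.

    Simplified stand-in for Rabin fingerprinting (assumption): a polynomial
    rolling hash over the last `window` bytes decides where chunks end.
    """
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    start, h = 0, 0
    for i, byte in enumerate(data):
        if i - start >= window:
            # Slide the window: drop the contribution of the oldest byte.
            h = (h - data[i - window] * top) % mod
        h = (h * base + byte) % mod
        size = i - start + 1
        # Cut when the hash hits the mask pattern, or force a cut at max_size.
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


class ChunkStore:
    """Hypothetical in-memory content-addressed store with chunk dedup."""

    def __init__(self):
        self.blocks = {}     # SHA-256 hex digest -> chunk bytes
        self.manifests = {}  # SHA-256 of whole content -> ordered chunk digests

    def put(self, data: bytes) -> str:
        digests = []
        for chunk in chunk_boundaries(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # identical chunks stored once
            digests.append(digest)
        key = hashlib.sha256(data).hexdigest()
        self.manifests[key] = digests
        return key

    def get(self, key: str) -> bytes:
        data = b"".join(self.blocks[d] for d in self.manifests[key])
        # Rehash on read: the address doubles as an integrity check.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("integrity check failed")
        return data
```

Storing the same bytes twice returns the same key and adds no new blocks. Because chunk boundaries depend on the content rather than on fixed offsets, two inputs that share long runs of bytes will usually share chunks as well.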

Quick Start & Requirements

  • Installation: Docker (docker run --rm -it ghcr.io/trvon/yams:latest --version) or build from source using Conan (recommended).
  • Prerequisites: C++20 compiler (GCC 11+, Clang 14+), CMake 3.20+, Python 3.8+ (for Conan). macOS: brew install openssl@3 protobuf sqlite3 ncurses. Linux: apt install libssl-dev libsqlite3-dev protobuf-compiler libncurses-dev.
  • Setup: Initialize storage with yams init --non-interactive.
  • Docs: LLM Integration Guide, CLI Usage Examples

Highlighted Details

  • Content-addressed storage with SHA-256 hashing.
  • Block-level deduplication using Rabin fingerprinting.
  • Combined full-text (SQLite FTS5) and semantic search.
  • Write-ahead logging for crash recovery.
  • High performance: 100MB/s+ throughput.
  • Optional PDF text extraction support.
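
As a rough illustration of the full-text side of the search feature, the sketch below indexes documents in an SQLite FTS5 table keyed by content hash. The table layout and the `build_index` and `search` helpers are assumptions made for this example, not YAMS's schema or API, and the code requires a Python `sqlite3` build with FTS5 enabled.

```python
import sqlite3


def build_index(docs):
    """Create an in-memory FTS5 index from a {content_hash: text} mapping."""
    conn = sqlite3.connect(":memory:")
    # `hash` is stored but not tokenized; only `body` is searchable.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(hash UNINDEXED, body)")
    conn.executemany("INSERT INTO docs (hash, body) VALUES (?, ?)", docs.items())
    return conn


def search(conn, query, limit=5):
    """Return content hashes of matching documents, best match first."""
    rows = conn.execute(
        "SELECT hash FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

A query returns content hashes, which can then be used to fetch the underlying data from the store. Semantic search would add a second lookup path over vector embeddings, which this sketch does not cover.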

Maintenance & Community

The project is actively maintained by trvon. Community channels are not explicitly mentioned in the README.

Licensing & Compatibility

Licensed under Apache-2.0, which permits commercial use and linking with closed-source projects.

Limitations & Caveats

Traditional CMake builds (without Conan) are noted to have dependency resolution issues; Conan builds are recommended. PDF extraction may fail if PDFium download is blocked by firewalls. Retrieval by name is listed as "coming soon."

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 3
  • Star history: 5 stars in the last 30 days
