yams by trvon

Content-addressable storage for LLMs and applications

Created 1 month ago
340 stars

Top 81.1% on SourcePulse

Project Summary

YAMS is a content-addressable storage system for LLMs and applications, offering deduplication along with full-text and semantic search. It targets developers and researchers who need persistent, versioned, easily searchable storage with built-in integrity checking and fast retrieval.

How It Works

YAMS addresses content by its SHA-256 hash, which ensures data integrity and immutability. Block-level deduplication uses Rabin fingerprinting to split content into chunks, so identical chunks are stored only once. Search is available both as full text, via SQLite FTS5, and semantically, via vector embeddings. A write-ahead log provides crash recovery, and the architecture is thread-safe; the project reports throughput above 100 MB/s.
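
Content addressing and chunk-level deduplication can be illustrated with a short Python sketch. It is not YAMS's implementation (YAMS is written in C++) and rests on several assumptions: a Rabin-Karp-style polynomial rolling hash modulo a prime stands in for true Rabin fingerprints (which use polynomial arithmetic over GF(2)), and the window size, boundary mask, and minimum/maximum chunk sizes are hypothetical values chosen for illustration. The names chunk_boundaries and ChunkStore are likewise hypothetical.

```python
from __future__ import annotations

import hashlib

# Hypothetical parameters. A Rabin-Karp-style polynomial hash modulo a prime
# stands in here for true Rabin fingerprints over GF(2).
WINDOW = 48            # bytes covered by the rolling hash
BASE = 257             # polynomial base
MOD = (1 << 61) - 1    # Mersenne prime modulus
MASK = (1 << 12) - 1   # cut when the low 12 bits are zero (~1 in 4096 positions)
MIN_CHUNK = 1024       # never cut before this many bytes
MAX_CHUNK = 16 * 1024  # always cut at this many bytes


def chunk_boundaries(data: bytes) -> list[tuple[int, int]]:
    """Split data into content-defined chunks; return (start, end) offsets."""
    pow_out = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window
    bounds: list[tuple[int, int]] = []
    start, h = 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD
        if i - start >= WINDOW:  # window overflowed: drop the oldest byte
            h = (h - data[i - WINDOW] * pow_out) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            bounds.append((start, i + 1))
            start, h = i + 1, 0  # restart the hash for the next chunk
    if start < len(data):
        bounds.append((start, len(data)))  # trailing partial chunk
    return bounds


class ChunkStore:
    """In-memory content-addressed store mapping SHA-256 hex digests to chunks."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> list[str]:
        """Store data and return its manifest: the ordered list of chunk digests."""
        manifest = []
        for start, end in chunk_boundaries(data):
            chunk = data[start:end]
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # duplicate chunks are kept once
            manifest.append(digest)
        return manifest

    def get(self, manifest: list[str]) -> bytes:
        """Reassemble content from its manifest."""
        return b"".join(self.blocks[d] for d in manifest)
```

A boundary is declared wherever the low bits of the window hash are zero, so boundaries depend on local content rather than file offsets. Inserting a few bytes near the start of a file therefore usually changes only the first chunk or two; later chunks keep the same SHA-256 digests and are stored once.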

Quick Start & Requirements

  • Installation: Docker (docker run --rm -it ghcr.io/trvon/yams:latest --version) or build from source using Conan (recommended).
  • Prerequisites: C++20 compiler (GCC 11+, Clang 14+), CMake 3.20+, Python 3.8+ (for Conan). macOS: brew install openssl@3 protobuf sqlite3 ncurses. Linux: apt install libssl-dev libsqlite3-dev protobuf-compiler libncurses-dev.
  • Setup: Initialize storage with yams init --non-interactive.
  • Docs: LLM Integration Guide, CLI Usage Examples

Highlighted Details

  • Content-addressed storage with SHA-256 hashing.
  • Block-level deduplication using Rabin fingerprinting.
  • Combined full-text (SQLite FTS5) and semantic search.
  • Write-ahead logging for crash recovery.
  • High performance: reported throughput of 100 MB/s+.
  • Optional PDF text extraction support.
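
The summary does not say how YAMS merges full-text and semantic results. One common approach is reciprocal rank fusion (RRF), sketched below in Python on the assumption that the interpreter's bundled SQLite was compiled with FTS5 (most builds are). The table name docs, the hybrid_search and cosine helpers, and the brute-force vector scan are hypothetical; a real system would obtain embeddings from a model and likely use a vector index.

```python
from __future__ import annotations

import math
import sqlite3

# Hypothetical sketch: RRF is an assumed fusion method, not confirmed for YAMS.
RRF_K = 60  # damping constant commonly used with reciprocal rank fusion


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(conn: sqlite3.Connection, query: str, query_vec: list[float],
                  vectors: dict[int, list[float]], limit: int = 5) -> list[int]:
    """Fuse an FTS5 keyword ranking and a cosine-similarity ranking with RRF."""
    # Keyword side: FTS5's bm25() returns lower (more negative) values for better
    # matches, so ascending order puts the best hit first.
    rows = conn.execute(
        "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)", (query,)
    ).fetchall()
    keyword_rank = [rowid for (rowid,) in rows]

    # Semantic side: brute-force cosine similarity over every stored embedding.
    semantic_rank = sorted(vectors, key=lambda d: cosine(query_vec, vectors[d]), reverse=True)

    # Reciprocal rank fusion: each list adds 1 / (k + rank), with rank starting at 1.
    scores: dict[int, float] = {}
    for ranking in (keyword_rank, semantic_rank):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (RRF_K + rank)
    return sorted(scores, key=scores.get, reverse=True)[:limit]
```

RRF needs only each document's position in each list, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales.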

Maintenance & Community

The project is actively maintained by trvon. Community channels are not explicitly mentioned in the README.

Licensing & Compatibility

Licensed under Apache-2.0, which permits commercial use and linking with closed-source projects.

Limitations & Caveats

Traditional CMake builds (without Conan) are noted to have dependency resolution issues; Conan builds are recommended. PDF extraction may fail if PDFium download is blocked by firewalls. Retrieval by name is listed as "coming soon."

Health Check

  • Last Commit: 18 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 3
  • Star History: 342 stars in the last 30 days
