Glyph by thu-coai

Scaling context windows via visual-text compression

Created 3 months ago
553 stars

Top 58.0% on SourcePulse

Project Summary

Glyph scales the context window of Large Language Models (LLMs) by rendering long text sequences as images, which are then processed by Vision-Language Models (VLMs). It targets researchers and practitioners who work with long documents or conversations, and it reduces computational and memory costs while preserving semantic information.

How It Works

Glyph reframes long-context modeling as a multimodal problem. Instead of directly processing lengthy text inputs, it renders text into compact images. These images are then fed into VLMs, leveraging their inherent ability to process visual information. This paradigm shift allows for substantial input-token compression, leading to considerable savings in computational resources and inference time compared to conventional text-only LLMs operating on extended contexts.
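As a rough illustration of where the compression comes from, the sketch below paginates text and estimates how many visual tokens one rendered page would cost. Every name and number in it (`PageLayout`, `patch_px`, the A4 page size, the characters-per-line and lines-per-page limits) is a hypothetical assumption chosen for illustration; it is not Glyph's renderer and does not model any particular VLM's image tokenizer.

```python
import math
import textwrap
from dataclasses import dataclass


@dataclass(frozen=True)
class PageLayout:
    # All defaults are illustrative assumptions, not Glyph's settings.
    width_in: float = 8.27     # A4 width in inches
    height_in: float = 11.69   # A4 height in inches
    dpi: int = 72              # rendering resolution
    chars_per_line: int = 100  # depends on font and margins in practice
    lines_per_page: int = 60
    patch_px: int = 28         # hypothetical pixel side of one visual token


def paginate(text: str, layout: PageLayout) -> list[list[str]]:
    """Wrap text into lines, then group the lines into pages."""
    lines: list[str] = []
    for paragraph in text.splitlines():
        # textwrap.wrap returns [] for an empty string; keep blank lines.
        lines.extend(textwrap.wrap(paragraph, layout.chars_per_line) or [""])
    n = layout.lines_per_page
    return [lines[i:i + n] for i in range(0, len(lines), n)]


def visual_tokens_per_page(layout: PageLayout) -> int:
    """Estimate visual tokens as the number of patches covering one page."""
    width_px = round(layout.width_in * layout.dpi)
    height_px = round(layout.height_in * layout.dpi)
    return math.ceil(width_px / layout.patch_px) * math.ceil(height_px / layout.patch_px)


def compression_ratio(text_tokens: int, pages: int, layout: PageLayout) -> float:
    """Text tokens divided by the visual tokens needed for the same content."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return text_tokens / (pages * visual_tokens_per_page(layout))
```

Under these assumptions, lowering `dpi` or packing more characters onto each page raises the ratio, at the cost of smaller and harder-to-read glyphs.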

Quick Start & Requirements

Highlighted Details

  • Achieves competitive performance on benchmarks like LongBench and MRCR.
  • Compresses input tokens by 3-4x at DPI=72 and speeds up inference relative to text backbones.
  • Supports vLLM acceleration for enhanced throughput and response speed.
  • Customizable rendering configurations (DPI, newline markup) allow tuning for compression and performance.
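The rendering options mentioned above could be grouped into a small configuration object, sketched here. `RenderConfig`, its field names, its defaults, and the `<NL>` marker are all hypothetical and do not reflect Glyph's actual configuration schema. The sketch also assumes that "newline markup" means replacing line breaks with a visible inline marker so text can be packed more densely; the source does not define the term.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RenderConfig:
    # Hypothetical fields and defaults, for illustration only.
    dpi: int = 72
    font_size: int = 9
    line_spacing: float = 1.0
    newline_markup: Optional[str] = " <NL> "  # None keeps real line breaks


def apply_newline_markup(text: str, cfg: RenderConfig) -> str:
    """Replace each line break with the configured inline marker, if any."""
    if cfg.newline_markup is None:
        return text
    return cfg.newline_markup.join(text.split("\n"))


def dpi_sweep(base: RenderConfig, dpis: list[int]) -> list[RenderConfig]:
    """Build one config per DPI value to compare compression against accuracy."""
    return [replace(base, dpi=d) for d in dpis]
```

A sweep such as `dpi_sweep(RenderConfig(), [72, 96, 120])` gives one configuration per resolution, which can then be evaluated on a held-out task to pick a compression and accuracy trade-off.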

Maintenance & Community

The project is the official repository for the Glyph paper, with the fine-tuned model publicly available on Hugging Face. No specific community channels (e.g., Discord, Slack) or details on ongoing maintenance, sponsorships, or partnerships are provided in the README.

Licensing & Compatibility

The README does not explicitly state the software license. This absence is a significant factor for potential adopters, as it leaves commercial use, distribution, and derivative works undefined.

Limitations & Caveats

Glyph's performance is sensitive to rendering parameters (resolution, font, spacing), which may limit generalization to unseen rendering styles. OCR-related challenges persist, particularly with fine-grained or rare alphanumeric strings in ultra-long inputs. How well the model generalizes beyond long-context understanding requires further study. The current text rendering implementation, which uses reportlab, could be made faster.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 30 days
