thu-coai/Glyph: Scaling context windows via visual-text compression
Glyph addresses the challenge of scaling context windows in Large Language Models (LLMs) by transforming long textual sequences into images, which are then processed by Vision-Language Models (VLMs). This approach targets researchers and practitioners dealing with extensive documents or conversations, offering significant reductions in computational and memory costs while preserving semantic information, thereby enabling more efficient long-context processing.
How It Works
Glyph reframes long-context modeling as a multimodal problem. Instead of directly processing lengthy text inputs, it renders text into compact images. These images are then fed into VLMs, leveraging their inherent ability to process visual information. This paradigm shift allows for substantial input-token compression, leading to considerable savings in computational resources and inference time compared to conventional text-only LLMs operating on extended contexts.
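To make the pipeline concrete, the following is a minimal sketch of the render-then-read idea, assuming reportlab (the rendering backend mentioned under Limitations & Caveats) and pdf2image, which wraps the poppler-utils dependency from the quick start. The font, size, and DPI values here are illustrative assumptions rather than Glyph's actual rendering configuration, and long lines are not wrapped.

```python
# Sketch: render long text onto PDF pages, then rasterize each page to an image
# that a VLM can read. Rendering parameters are illustrative, not Glyph's own.
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from pdf2image import convert_from_path  # requires poppler-utils


def render_text_to_images(text: str, pdf_path: str = "context.pdf", dpi: int = 120):
    """Render text onto PDF pages and return one PIL image per page."""
    _, page_height = A4
    c = canvas.Canvas(pdf_path, pagesize=A4)
    t = c.beginText(40, page_height - 40)
    t.setFont("Helvetica", 8)          # a small font packs more text per page
    for line in text.splitlines():
        if t.getY() < 40:              # page is full: flush it and start a new one
            c.drawText(t)
            c.showPage()
            t = c.beginText(40, page_height - 40)
            t.setFont("Helvetica", 8)
        t.textLine(line)
    c.drawText(t)
    c.save()
    return convert_from_path(pdf_path, dpi=dpi)


# "long_document.txt" is a placeholder path for any long textual context.
pages = render_text_to_images(open("long_document.txt").read())
print(f"{len(pages)} page image(s) ready for the VLM")
```

Because the VLM then consumes a handful of page images rather than the full token sequence, the input-token count drops sharply, which is where the reported compute and memory savings come from.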
Quick Start & Requirements
Install poppler-utils (apt-get install poppler-utils) and transformers==4.57.1 (pip install transformers==4.57.1). Optional dependencies vllm==0.10.2 and sglang==0.5.2 provide accelerated inference. The core model is based on GLM-4.1V-9B-Base, and GPU acceleration is recommended for VLM inference. A demo script (demo/run_demo_compared.sh) allows side-by-side comparison of Glyph with a baseline text model.
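As a hedged illustration of the quick start, the sketch below feeds one rendered page to the model. It assumes the released checkpoint loads through the generic transformers image-text-to-text interface (as GLM-4.1V-based models do); the Hugging Face repo id and the image filename are placeholders, not values taken from the README.

```python
# Sketch: load the checkpoint and ask a question about one rendered page image.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "<glyph-checkpoint-id>"  # placeholder: use the Hugging Face id from the README
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder filename: a page image saved from the rendering sketch above.
page = Image.open("rendered_page_0.png")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": page},
        {"type": "text", "text": "Summarize the document shown on this page."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Running on a GPU, as recommended above, keeps generation latency reasonable; the optional vllm or sglang backends listed earlier can be used for accelerated serving.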
Maintenance & Community
The project is the official repository for the Glyph paper, with the fine-tuned model publicly available on Hugging Face. No specific community channels (e.g., Discord, Slack) or details on ongoing maintenance, sponsorships, or partnerships are provided in the README.
Licensing & Compatibility
The README does not explicitly state the software license. This absence is a significant factor for potential adopters, as it leaves commercial use, distribution, and derivative works undefined.
Limitations & Caveats
Glyph's performance is sensitive to rendering parameters (resolution, font, spacing), potentially limiting generalization to unseen rendering styles. OCR-related challenges persist, particularly with fine-grained or rare alphanumeric strings in ultra-long inputs. The model's generalization capabilities beyond long-context understanding require further study, and the current reportlab-based text rendering implementation leaves room for performance optimization.