gguf-parser-go by gpustack

Analyze GGUF models and estimate inference resources

Created 1 year ago
257 stars

Top 98.2% on SourcePulse

Project Summary

This project provides a Go-based utility for inspecting GGUF model files without requiring full downloads. It offers precise estimations of memory usage and maximum tokens per second (TPS), enabling users to quickly evaluate and plan for model deployment. The tool is designed for ML engineers and researchers working with large language models in the GGUF format.

How It Works

gguf-parser-go uses chunked reads to parse metadata from remote GGUF files, eliminating the need to download entire models and making the initial assessment fast. Written in Go, it benefits from the language's performance and built-in concurrency. The tool can also estimate maximum tokens per second (TPS) from supplied device metrics (CPU/GPU FLOPS and memory bandwidth), giving a predictive performance benchmark. Finally, it categorizes GGUF files by intended use, such as embedding, reranking, or general model inference.

Quick Start & Requirements

  • Installation: Install from releases.
  • Prerequisites: None explicitly mentioned beyond the Go tool itself.
  • Links: No official quick-start or demo links are provided in the README.

Highlighted Details

  • Remote File Parsing: Enables analysis of GGUF files directly from URLs (Hugging Face, ModelScope, Ollama) without downloading, saving bandwidth and time.
  • Accurate Resource Estimation: Provides memory usage predictions with an approximate deviation of only 100 MiB from actual requirements, aiding in precise resource allocation.
  • Performance Prediction: Estimates maximum tokens per second (TPS) by integrating device hardware metrics, allowing for rapid model selection based on performance benchmarks.
  • Model Type Identification: Screens GGUF files to identify their purpose (e.g., embedding, reranking, LoRA, audio projectors, diffusion models), offering clarity on model roles.
  • High Performance: Implemented in Go, ensuring fast and efficient parsing and estimation.
  • Multi-GPU & Multi-Host Support: Capable of estimating memory distribution and requirements across multiple GPUs on a single host or across distributed systems.
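The README does not spell out the exact TPS formula. A common first-order ("roofline") approximation, sketched below as an assumption rather than gguf-parser-go's implementation, treats autoregressive decoding as memory-bandwidth-bound (every generated token streams all the weights) while compute caps it at roughly one token per 2 × parameter-count FLOPs:

```go
package main

import "fmt"

// estimateTPS is an illustrative roofline bound on decode speed.
// paramsB: parameter count in billions; bytesPerParam: weight bytes per
// parameter after quantization; bandwidthGBs: memory bandwidth in GB/s;
// tflops: peak compute in TFLOPS.
func estimateTPS(paramsB, bytesPerParam, bandwidthGBs, tflops float64) float64 {
	modelGB := paramsB * bytesPerParam // weight bytes streamed per token, in GB

	memBound := bandwidthGBs / modelGB                  // tokens/s if bandwidth-limited
	computeBound := tflops * 1e12 / (2 * paramsB * 1e9) // ~2 FLOPs per param per token

	if memBound < computeBound {
		return memBound
	}
	return computeBound
}

func main() {
	// Hypothetical device and model: 8B params at ~Q4 (0.56 bytes/param),
	// 1000 GB/s memory bandwidth, 60 TFLOPS peak compute.
	fmt.Printf("%.1f tok/s\n", estimateTPS(8, 0.56, 1000, 60))
}
```

Under these made-up numbers the estimate is bandwidth-limited (~223 tok/s), which matches the intuition that quantization speeds up generation mainly by shrinking the bytes streamed per token.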

Maintenance & Community

The provided README does not contain specific details regarding notable contributors, sponsorships, or community channels (e.g., Discord, Slack).

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT License is highly permissive, allowing commercial use, modification, and distribution, so the tool can be integrated into proprietary or open-source projects without significant restrictions.

Limitations & Caveats

  • Prediction Accuracy: Memory usage estimations may deviate by approximately 100 MiB from actual requirements.
  • Metadata Conventions: Parsing accuracy can suffer for GGUF files whose `general.file_type` metadata does not strictly follow the standard conventions.
  • Feature Support: Certain model types or features, such as Reranking, are explicitly marked as "Unsupported" in the tool's output examples.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

0%
485
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago
Updated 11 months ago
Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

xTuring by stochasticai

0.0%
3k
SDK for fine-tuning and customizing open-source LLMs
Created 3 years ago
Updated 1 month ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google

0.4%
6k
PyTorch implementation for Google's Gemma models
Created 2 years ago
Updated 10 months ago