gguf-parser-go by gpustack

Analyze GGUF models and estimate inference resources

Created 1 year ago
257 stars

Top 98.2% on SourcePulse

Project Summary

This project provides a Go-based utility for inspecting GGUF model files without requiring full downloads. It offers precise estimations of memory usage and maximum tokens per second (TPS), enabling users to quickly evaluate and plan for model deployment. The tool is designed for ML engineers and researchers working with large language models in the GGUF format.

How It Works

gguf-parser-go uses chunked reads to parse metadata from remote GGUF files, eliminating the need to download entire models and making the initial assessment fast. Written in Go, it benefits from the language's performance and built-in concurrency. The tool can also estimate maximum tokens per second (TPS) from supplied device metrics (CPU/GPU FLOPS and memory bandwidth), giving a predictive performance benchmark. Finally, it categorizes GGUF files by intended use, such as embedding, reranking, or general model inference.

Quick Start & Requirements

  • Installation: Install from releases.
  • Prerequisites: None explicitly mentioned beyond the Go tool itself.
  • Links: No official quick-start or demo links are provided in the README.

Highlighted Details

  • Remote File Parsing: Enables analysis of GGUF files directly from URLs (Hugging Face, ModelScope, Ollama) without downloading, saving bandwidth and time.
  • Accurate Resource Estimation: Provides memory usage predictions with an approximate deviation of only 100 MiB from actual requirements, aiding in precise resource allocation.
  • Performance Prediction: Estimates maximum tokens per second (TPS) by integrating device hardware metrics, allowing for rapid model selection based on performance benchmarks.
  • Model Type Identification: Screens GGUF files to identify their purpose (e.g., embedding, reranking, LoRA, audio projectors, diffusion models), offering clarity on model roles.
  • High Performance: Implemented in Go, ensuring fast and efficient parsing and estimation.
  • Multi-GPU & Multi-Host Support: Capable of estimating memory distribution and requirements across multiple GPUs on a single host or across distributed systems.
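The README does not spell out the exact TPS formula. A common first-order ("roofline") approximation, sketched below as an assumption rather than gguf-parser-go's implementation, treats autoregressive decoding as memory-bandwidth-bound (every generated token streams all the weights) while compute caps it at roughly one token per 2 × parameter-count FLOPs:

```go
package main

import "fmt"

// estimateTPS is an illustrative roofline bound on decode speed.
// paramsB: parameter count in billions; bytesPerParam: weight bytes per
// parameter after quantization; bandwidthGBs: memory bandwidth in GB/s;
// tflops: peak compute in TFLOPS.
func estimateTPS(paramsB, bytesPerParam, bandwidthGBs, tflops float64) float64 {
	modelGB := paramsB * bytesPerParam // weight bytes streamed per token, in GB

	memBound := bandwidthGBs / modelGB                  // tokens/s if bandwidth-limited
	computeBound := tflops * 1e12 / (2 * paramsB * 1e9) // ~2 FLOPs per param per token

	if memBound < computeBound {
		return memBound
	}
	return computeBound
}

func main() {
	// Hypothetical device and model: 8B params at ~Q4 (0.56 bytes/param),
	// 1000 GB/s memory bandwidth, 60 TFLOPS peak compute.
	fmt.Printf("%.1f tok/s\n", estimateTPS(8, 0.56, 1000, 60))
}
```

Under these made-up numbers the estimate is bandwidth-limited (~223 tok/s), which matches the intuition that quantization speeds up generation mainly by shrinking the bytes streamed per token.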

Maintenance & Community

The provided README does not contain specific details regarding notable contributors, sponsorships, or community channels (e.g., Discord, Slack).

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT License is highly permissive, allowing commercial use, modification, and distribution, so the tool can be integrated into proprietary or open-source projects without significant restrictions.

Limitations & Caveats

  • Prediction Accuracy: Memory usage estimations may deviate by approximately 100 MiB from actual requirements.
  • Metadata Conventions: Parsing accuracy can suffer for GGUF files whose `general.file_type` metadata does not strictly follow the standard conventions.
  • Feature Support: Certain model types or features, such as Reranking, are explicitly marked as "Unsupported" in the tool's output examples.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

0%
485
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago
Updated 11 months ago
Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

xTuring by stochasticai

0.0%
3k
SDK for fine-tuning and customizing open-source LLMs
Created 3 years ago
Updated 1 month ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google

0.4%
6k
PyTorch implementation for Google's Gemma models
Created 2 years ago
Updated 10 months ago