HuggingFaceModelDownloader by bodaay

CLI tool for downloading Hugging Face models/datasets

Created 2 years ago
741 stars

Top 46.8% on SourcePulse

Project Summary

This Go utility provides a fast, multithreaded alternative to Git LFS for downloading models and datasets from Hugging Face. It targets developers and researchers who need efficient access to large AI models, and offers SHA256 checksum verification, download resumption, and file filtering. The filtering is particularly useful for repositories that ship many quantized variants, such as GGML models.

How It Works

The tool uses Go's concurrency primitives to download large LFS files over multiple connections at once, which speeds up transfers compared to a single-connection download. After each download it verifies the file's SHA256 checksum to ensure data integrity. Interrupted downloads are resumed by skipping files that have already been downloaded. Users can also filter LFS files by pattern, which helps when only specific quantized variants of a model (e.g., GGML q4_0, q5_0) are needed.
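
The multi-connection download and checksum step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the tool's Go implementation: the function names (split_ranges, fetch_range, download, verify_sha256) are hypothetical, and the "remote file" is an in-memory byte string standing in for HTTP requests that would carry a Range header.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, connections):
    """Split [0, size) into contiguous, inclusive (start, end) byte ranges."""
    if size <= 0:
        return []
    # Never use more connections than there are bytes to fetch.
    connections = max(1, min(connections, size))
    chunk, extra = divmod(size, connections)
    ranges, start = [], 0
    for i in range(connections):
        # The first `extra` ranges absorb the remainder, one byte each.
        length = chunk + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def fetch_range(source, byte_range):
    """Stand-in for an HTTP GET with 'Range: bytes=start-end' (assumption)."""
    start, end = byte_range
    return source[start:end + 1]


def download(source, connections=5):
    """Fetch all ranges concurrently and reassemble them in order."""
    ranges = split_ranges(len(source), connections)
    with ThreadPoolExecutor(max_workers=connections) as pool:
        # pool.map yields results in input order, so parts join correctly.
        parts = list(pool.map(lambda r: fetch_range(source, r), ranges))
    return b"".join(parts)


def verify_sha256(data, expected_hex):
    """Compare the SHA256 digest of the data with the expected hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex


if __name__ == "__main__":
    payload = bytes(range(256)) * 40  # fake "remote" file
    data = download(payload, connections=5)
    print(verify_sha256(data, hashlib.sha256(payload).hexdigest()))
```

The default of 5 connections mirrors the default mentioned in the highlights; how the real tool sizes its chunks is not stated in this summary.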

Quick Start & Requirements

  • Install: bash <(curl -sSL https://g.bodaay.io/hfd) (installs to current directory) or bash <(curl -sSL https://g.bodaay.io/hfd) -i (installs to OS bin folder).
  • Prerequisites: None mentioned beyond a compatible OS (Linux, macOS, or Windows via WSL2).
  • Usage: hfdownloader -m <model_name> or hfdownloader -d <dataset_name>.
  • Docs: https://g.bodaay.io/hfd

Highlighted Details

  • Multithreaded LFS downloads with configurable concurrent connections (default 5).
  • SHA256 checksum verification for downloaded files.
  • Filter LFS files by name, useful for downloading specific model quantizations (e.g., GGML variants).
  • Supports resuming interrupted downloads and skipping existing files.
  • Accepts a Hugging Face access token via environment variable (HF_TOKEN), .env file, or command-line flag.
  • Configuration file support (~/.config/hfdownloader.json) for default settings.
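
The filtering and skip-existing behavior listed above can be illustrated with a small sketch. It is written in Python for brevity and rests on stated assumptions: filters are treated as case-insensitive substrings of the file name, and a file counts as "already downloaded" when its local size matches the remote size. The README summary does not specify the tool's actual matching or resume rules, and the function names and file names below are hypothetical.

```python
def filter_files(filenames, filters):
    """Keep names containing any filter term; no filters keeps everything.

    Assumption: case-insensitive substring matching.
    """
    terms = [term.lower() for term in filters if term]
    if not terms:
        return list(filenames)
    return [name for name in filenames if any(t in name.lower() for t in terms)]


def pending_downloads(remote_sizes, local_sizes):
    """Return remote files that are missing locally or differ in size.

    Assumption: a size match is enough to treat a file as already downloaded.
    """
    return [
        name
        for name, size in remote_sizes.items()
        if local_sizes.get(name) != size
    ]


if __name__ == "__main__":
    # Hypothetical LFS file names for a quantized model repository.
    lfs_files = ["model.q4_0.bin", "model.q5_0.bin", "model.q8_0.bin"]
    print(filter_files(lfs_files, ["q4_0", "q5_0"]))
    print(pending_downloads({"model.q4_0.bin": 100}, {"model.q4_0.bin": 40}))
```

A size check is cheap but cannot detect corruption, which is why a checksum pass after download remains useful alongside it.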

Maintenance & Community

The project is maintained by bodaay. No specific community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. This requires clarification for commercial use or integration into closed-source projects.

Limitations & Caveats

The missing license may pose a risk for commercial adoption. The README covers Linux, macOS, and WSL2; native Windows support is not documented.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 27 stars in the last 30 days
