DALI by NVIDIA

GPU-accelerated library for data pre-processing in deep learning

Created 7 years ago
5,509 stars

Top 9.2% on SourcePulse

Project Summary

NVIDIA DALI is a GPU-accelerated library designed to eliminate data loading and pre-processing bottlenecks in deep learning workflows. It offers a collection of optimized building blocks for image, video, and audio data, enabling users to create portable, high-throughput data pipelines that can be seamlessly integrated with popular frameworks like TensorFlow, PyTorch, and PaddlePaddle.

How It Works

DALI addresses CPU-bound data processing by offloading operations to the GPU. It utilizes a custom execution engine optimized for throughput, featuring prefetching, parallel execution, and batch processing. This GPU-centric approach, combined with a flexible, functional Python API, allows for the creation of complex, multi-stage data augmentation and transformation pipelines that run efficiently, directly feeding data to the GPU for training or inference.
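
As an illustration of that API, the sketch below builds a small image pipeline: read files, decode JPEGs on the GPU, resize, and normalize. The directory path, batch size, and image dimensions are placeholders, not defaults.

    # Minimal DALI pipeline sketch; paths and sizes below are illustrative.
    from nvidia.dali import pipeline_def
    import nvidia.dali.fn as fn
    import nvidia.dali.types as types

    @pipeline_def(batch_size=32, num_threads=4, device_id=0)
    def image_pipeline(data_dir):
        jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True, name="Reader")
        # "mixed" decodes on the GPU where possible; downstream ops then stay on the GPU.
        images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
        images = fn.resize(images, resize_x=224, resize_y=224)
        images = fn.crop_mirror_normalize(
            images,
            dtype=types.FLOAT,
            output_layout="CHW",
            mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
            std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        )
        return images, labels

    pipe = image_pipeline(data_dir="/path/to/images")
    pipe.build()
    images, labels = pipe.run()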

Quick Start & Requirements

  • Install with pip install nvidia-dali-cuda120, or fetch the latest release from NVIDIA's package index with pip install --extra-index-url https://pypi.nvidia.com --upgrade nvidia-dali-cuda120.
  • Requires an NVIDIA driver that supports the target CUDA version (e.g., CUDA 12.x) and the corresponding CUDA Toolkit.
  • Pre-installed in NGC containers for TensorFlow, PyTorch, and PaddlePaddle; a minimal PyTorch integration sketch follows this list.
  • Official Documentation: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
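
As a minimal framework-integration sketch (reusing the image_pipeline defined above; the output names are illustrative):

    # Drive a PyTorch training loop from the DALI pipeline via the DALI PyTorch plugin.
    from nvidia.dali.plugin.pytorch import DALIGenericIterator

    pipe = image_pipeline(data_dir="/path/to/images")
    # reader_name lets the iterator infer the dataset size from the file reader.
    loader = DALIGenericIterator(pipe, output_map=["data", "label"], reader_name="Reader")

    for batch in loader:
        images = batch[0]["data"]   # GPU tensor, shape (batch, 3, 224, 224)
        labels = batch[0]["label"]
        # ... training step ...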

Highlighted Details

  • Supports numerous data formats, including LMDB, TFRecord, COCO, JPEG, WAV, FLAC, and video codecs (H.264, VP9, HEVC); a GPU video-reader sketch follows this list.
  • Accelerates common workloads like ResNet-50, SSD, and ASR models (Jasper, RNN-T).
  • Enables direct data transfer via GPUDirect Storage and integrates with NVIDIA Triton Inference Server.
  • Offers custom operator extensibility for user-specific needs.
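
As an example of the media support, a hedged sketch of a GPU-decoded video pipeline (file list and sequence length are placeholders):

    # Read fixed-length frame sequences from video files, decoded on the GPU.
    from nvidia.dali import pipeline_def
    import nvidia.dali.fn as fn

    @pipeline_def(batch_size=4, num_threads=2, device_id=0)
    def video_pipeline(file_list):
        frames = fn.readers.video(
            device="gpu",              # video decoding runs on the GPU
            filenames=file_list,
            sequence_length=16,        # frames per returned sequence
            random_shuffle=True,
        )
        return frames

    pipe = video_pipeline(["/path/to/clip0.mp4", "/path/to/clip1.mp4"])
    pipe.build()
    (frames,) = pipe.run()   # one batch of 16-frame sequences, resident on the GPU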

Licensing & Compatibility

  • Licensed under Apache 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

  • Requires NVIDIA GPU hardware and specific CUDA toolkit versions for installation and operation; building from source requires following the compilation guide.

Health Check

  • Last commit: 16 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 32
  • Issues (30d): 5
  • Star history: 31 stars in the last 30 days

Explore Similar Projects

  • oslo by tunib-ai: Framework for large-scale transformer optimization. 309 stars; created 3 years ago, last updated 3 years ago. Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.
  • LitServe by Lightning-AI: AI inference pipeline framework. 4k stars; created 1 year ago, last updated 1 day ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.
  • fastllm by ztxz16: High-performance C++ LLM inference library. 4k stars; created 2 years ago, last updated 1 week ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Ying Sheng (Coauthor of SGLang).
  • datasets by huggingface: Access and process large AI datasets efficiently. 21k stars; created 5 years ago, last updated 1 day ago. Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 26 more.