atomic_queue  by max0x7ba

C++14 lock-free queue minimizing latency between push/pop operations

Created 7 years ago
1,854 stars

Top 22.9% on SourcePulse

Project Summary

This C++14 library provides highly optimized, lock-free, multiple-producer-multiple-consumer (MPMC) queues based on circular buffers. It targets developers building ultra-low-latency systems and aims to minimize the time between pushing and popping an element through a minimalist design that uses few atomic operations and is explicitly cache-friendly.

How It Works

The queues use fixed-size circular buffers and std::atomic operations. Key design principles include minimizing atomic instructions, avoiding false sharing, using a linear buffer array, and avoiding heap allocations after construction. Together with value semantics (elements are copied or moved on push/pop), these choices are meant to reduce cache misses and pipeline stalls. Power-of-2 buffer sizes make index mapping cheap (a bit mask instead of a modulo) and help reduce cache-line contention.
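The index-mapping and false-sharing points above can be sketched as follows. This is a minimal illustration, not the library's code: the names kCacheLineSize, is_power_of_2, slot_index, and Counters are hypothetical, and the 64-byte cache line is an assumption that holds on most x86-64 CPUs.

```cpp
#include <atomic>
#include <cstddef>

// Assumption: 64-byte cache lines (typical on x86-64). Hypothetical name.
constexpr std::size_t kCacheLineSize = 64;

// True when n has exactly one bit set.
constexpr bool is_power_of_2(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Map an ever-increasing counter to a slot in a buffer of Size elements.
// With a power-of-2 Size the modulo reduces to a single bit mask.
template<std::size_t Size>
constexpr std::size_t slot_index(std::size_t counter) {
    static_assert(is_power_of_2(Size), "Size must be a power of 2");
    return counter & (Size - 1);
}

// Keep the producer and consumer counters on separate cache lines so that
// a write to one does not invalidate the line holding the other.
struct Counters {
    alignas(kCacheLineSize) std::atomic<std::size_t> head{0};
    alignas(kCacheLineSize) std::atomic<std::size_t> tail{0};
};
```

With Size equal to 8, for example, a counter value of 13 maps to slot 5, and the counter can keep growing without ever being reset.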

Quick Start & Requirements

  • Install: Clone the repository and add atomic_queue/include to your build system's include paths. Alternatively, use vcpkg install atomic-queue or conan install atomic-queue.
  • Prerequisites: C++14 compliant compiler. Benchmarks require Intel TBB.
  • Links: GitHub, Benchmarks

Highlighted Details

  • Offers AtomicQueue (for atomic types) and AtomicQueue2 (for non-atomic types), each with an Optimist variant that busy-waits when the queue is empty or full.
  • Supports single-producer-single-consumer (SPSC) modes for further optimization.
  • Provides an optional totally ordered mode, which has zero cost on Intel x86.
  • Benchmarked against std::mutex, boost::lockfree, moodycamel::ConcurrentQueue, and others, claiming superior latency and throughput.
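To illustrate why a single-producer-single-consumer mode leaves room for further optimization, here is a minimal SPSC ring buffer sketch. It is not the library's implementation: SpscRing and its members are hypothetical names, the 64-byte alignment is an assumed cache-line size, and T is assumed to be default-constructible. Because each counter has exactly one writer, plain loads and stores with acquire/release ordering are enough and no read-modify-write instruction is needed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical SPSC ring buffer: one producer thread calls try_push,
// one consumer thread calls try_pop.
template<class T, std::size_t Size>
class SpscRing {
    static_assert(Size != 0 && (Size & (Size - 1)) == 0,
                  "Size must be a power of 2");

    T buffer_[Size]{};
    alignas(64) std::atomic<std::size_t> head_{0}; // written only by the producer
    alignas(64) std::atomic<std::size_t> tail_{0}; // written only by the consumer

public:
    // Returns false when the ring is full.
    bool try_push(T value) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t used = head - tail_.load(std::memory_order_acquire);
        assert(used <= Size); // invariant: never more than Size elements
        if (used == Size)
            return false;
        buffer_[head & (Size - 1)] = std::move(value);
        head_.store(head + 1, std::memory_order_release); // publish the element
        return true;
    }

    // Returns false when the ring is empty.
    bool try_pop(T& out) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;
        out = std::move(buffer_[tail & (Size - 1)]);
        tail_.store(tail + 1, std::memory_order_release); // free the slot
        return true;
    }
};
```

An MPMC queue cannot take this shortcut, since several threads compete for the same counter and must claim slots with an atomic read-modify-write.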

Maintenance & Community

  • The project is maintained by Maxim Egorushkin.
  • Contributions are welcome.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Generally compatible with C++14 platforms that support std::atomic. Reported to work on Windows, although CI runs only on Linux.

Limitations & Caveats

Queue size must be fixed at compile time or at construction time. Optimal performance depends on specific OS behaviors and thread scheduling (e.g., SCHED_FIFO), so achieving the lowest latency requires careful system configuration. The NIL value is reserved and must not be pushed; debug builds include asserts that check for it.
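The NIL restriction is easier to see in a sketch. The following single-slot example is hypothetical (AtomicSlot is not part of the library) and assumes that a reserved value marks an empty slot, so a pushed NIL would be indistinguishable from "nothing stored". The default NIL of T{} is likewise an assumption made for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical one-element mailbox in which the value NIL means "empty".
template<class T, T NIL = T{}>
class AtomicSlot {
    std::atomic<T> slot_{NIL};

public:
    // Stores value if the slot is empty; returns false if it is occupied.
    bool try_push(T value) {
        assert(value != NIL); // debug-only check: NIL must never be pushed
        T expected = NIL;
        return slot_.compare_exchange_strong(expected, value,
                                             std::memory_order_release,
                                             std::memory_order_relaxed);
    }

    // Takes the stored value and marks the slot empty; returns false if
    // the slot held NIL, that is, nothing was stored.
    bool try_pop(T& out) {
        T value = slot_.exchange(NIL, std::memory_order_acquire);
        if (value == NIL)
            return false;
        out = value;
        return true;
    }
};
```

Storing the element and its "occupied" state in one atomic word avoids a separate flag per slot, at the cost of giving up one representable value.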

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days
