t1 by chipsalliance

RISC-V Vector implementation inspired by the Cray X1 vector machine

Created 3 years ago
288 stars

Top 91.2% on SourcePulse

Project Summary

T1 (Torrent-1) is a RISC-V vector processor implementation inspired by the Cray X1, targeting researchers and hardware designers. It offers a lane-based microarchitecture with extensive chaining and configurable SRAM-based Vector Register Files (VRFs), supporting standard RISC-V vector extensions and large VLEN/DLEN configurations up to 64K.

How It Works

T1 implements a lane-based microarchitecture built around intensive chaining between Vector Function Units (VFUs) and Load Store Units (LSUs). Its banked SRAM VRFs are configurable across several port arrangements, and its VFUs can be pipelined or asynchronous. The LSU supports instruction-level out-of-order execution and a configurable number of outstanding memory instructions to hide memory latency. The design aims to balance throughput, area, and frequency: users tune performance by adjusting VRF memory types, VFU pipeline stages, and LSU configurations.
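To illustrate why chaining matters, here is a toy timing model for two dependent vector instructions. It is a textbook-style sketch, not taken from the T1 source: the function names, the single-issue pipeline assumption (one element group per cycle), and the latency values are all hypothetical.

```python
def unchained_cycles(groups: int, producer_latency: int, consumer_latency: int) -> int:
    """Cycles when the consumer waits for the producer's entire vector.

    Each unit is assumed to accept one element group per cycle and to
    deliver a result `latency` cycles after accepting it.
    """
    producer_done = producer_latency + groups - 1
    return producer_done + consumer_latency + groups - 1


def chained_cycles(groups: int, producer_latency: int, consumer_latency: int) -> int:
    """Cycles when the consumer starts on each group as soon as it is produced."""
    return producer_latency + consumer_latency + groups - 1


if __name__ == "__main__":
    # Hypothetical workload: 16 element groups, two 4-cycle units.
    groups, lat_a, lat_b = 16, 4, 4
    print("unchained:", unchained_cycles(groups, lat_a, lat_b))  # 38
    print("chained:  ", chained_cycles(groups, lat_a, lat_b))    # 23
```

In this model the chained case overlaps the two instructions almost completely, so the saving grows with vector length. A real design also has to account for VRF bank conflicts and memory latency, which the model ignores.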

Quick Start & Requirements

  • Installation: Nix is the primary build system. Docker images are available via docker pull ghcr.io/chipsalliance/t1-<config>:latest, where <config> is the name of a T1 configuration.
  • Prerequisites: the Nix package manager; QEMU/KVM may also be needed to build the Docker images.
  • Resources: Building and emulation can be resource-intensive.
  • Documentation: Configuration options and build commands are detailed in the README.

Highlighted Details

  • Supports standard RISC-V vector extensions (Zve32f, Zve32x) and configurable VLEN/DLEN up to 64K.
  • Features lane-based execution with support for masked element skipping and direct-connected lane interconnections.
  • The LSU supports instruction-level out-of-order execution and a configurable number of outstanding memory instructions.
  • Design Space Exploration (DSE) principles allow tuning for efficiency or performance by adjusting VRF memory, VFU pipeline stages, and LSU configurations.
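The VLEN and DLEN parameters mentioned above can be related with some simple arithmetic. The sketch below assumes the common reading of these terms (VLEN is the bits per vector register, DLEN is the datapath bits processed per cycle). The VectorConfig class, its field names, and the lane count are hypothetical and are not part of T1's configuration interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorConfig:
    """Hypothetical container for the sizing parameters discussed above."""

    vlen: int   # bits per vector register
    dlen: int   # datapath bits processed per cycle (assumed meaning)
    lanes: int  # number of lanes sharing the datapath (illustrative)

    def __post_init__(self) -> None:
        if min(self.vlen, self.dlen, self.lanes) <= 0:
            raise ValueError("vlen, dlen and lanes must be positive")
        if self.vlen % self.dlen != 0:
            raise ValueError("VLEN must be a multiple of DLEN")
        if self.dlen % self.lanes != 0:
            raise ValueError("DLEN must divide evenly across lanes")

    def elements_per_register(self, sew: int) -> int:
        """Number of SEW-bit elements held by one vector register."""
        return self.vlen // sew

    def beats_per_register(self) -> int:
        """Datapath passes needed to touch every bit of one register."""
        return self.vlen // self.dlen

    def bits_per_lane(self) -> int:
        """Datapath width owned by each lane."""
        return self.dlen // self.lanes


if __name__ == "__main__":
    cfg = VectorConfig(vlen=4096, dlen=512, lanes=16)  # illustrative values
    print(cfg.elements_per_register(sew=32))  # 128
    print(cfg.beats_per_register())           # 8
    print(cfg.bits_per_lane())                # 32
```

Raising VLEN relative to DLEN makes each instruction occupy the datapath for more beats, which is one of the trade-offs a design space exploration would weigh against area and frequency.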

Maintenance & Community

  • The project is maintained by the CHIPS Alliance.
  • Development is driven by Nix Flakes. Test cases cover various categories including assembly, MLIR, and PyTorch.

Licensing & Compatibility

  • License: Apache-2.0 License.
  • Compatibility: Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

  • The forked Rocket Core is not officially supported and can be replaced.
  • The LSU imposes specific bus-ordering requirements, and its high-bandwidth ports do not support an MMU, so it may not be compatible with every RISC-V scalar core.
  • No coherence support is provided for high-performance caches.

Health Check

  • Last commit: 23 hours ago
  • Responsiveness: 1 week
  • Pull requests (30d): 10
  • Issues (30d): 0

Star History

  • 4 stars in the last 30 days
