A General-purpose Parallel and Heterogeneous Task Programming System
Sample code for my CUDA programming book
Thin, unified, C++-flavored wrappers for the CUDA APIs
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Mo...
A simple GPU hash table implemented in CUDA using lock-free techniques
This is an archive of materials produced for an introductory class on CU...
An implementation of HIP that works on CPUs, across OSes.
CUDA kernel author's tools
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-...
CUDA Guide