hopper

Here is 1 public repository matching this topic...

waynehacking8 / blackwell-tensorcore-kernels

Hand-written CUDA Tensor Core GEMM kernels on Blackwell (sm_120) and Hopper (sm_90) — raw mma.sync reaching 106% of the cuBLAS-TC kernel on sm_120, CUTLASS 3.x wgmma at 85.5% of nvjet on H100, and an FP16→FP8→MXFP4 precision ladder. Every number reproducible from committed bench data.

Topics: performance-engineering, gpu, cuda, cublas, hopper, cutlass, gemm, tensor-cores, blackwell

Updated Jun 4, 2026
Cuda
