CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

Here are 1,874 public repositories matching this topic.

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Updated Jun 29, 2026
Python

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

reinforcement-learning cuda inference transformer moe attention llama glm minimax wan diffusion vlm blackwell llm qwen deepseek gpt-oss qwen-image

Updated Jun 29, 2026
Python

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

cuda pytorch moe blackwell llm-serving

Updated Jun 29, 2026
Python

cupy / cupy

NumPy & SciPy for GPU

python gpu numpy cuda cublas scipy tensor cudnn rocm cupy cusolver nccl curand cusparse nvrtc cutensor nvtx cusparselt

Updated Jun 29, 2026
Python

numba / numba

NumPy aware dynamic Python compiler using LLVM

python compiler numpy llvm parallel cuda numba

Updated Jun 26, 2026
Python

LMCache / LMCache

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

fast amd cuda inference pytorch speed rocm kv-cache llm vllm

Updated Jun 29, 2026
Python

XuehaiPan / nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

console monitoring gpu grafana cuda prometheus nvidia prometheus-exporter curses nvml top command-line-tool htop grafana-dashboard nvidia-smi monitoring-tool process-monitoring gpu-monitoring resource-monitor

Updated Jun 22, 2026
Python

NVIDIA / warp

A Python framework for GPU-accelerated simulation, robotics, and machine learning.

python gpu cuda nvidia gpu-acceleration differentiable-programming nvidia-warp

Updated Jun 29, 2026
Python

chainer / chainer

A flexible framework of neural networks for deep learning

python machine-learning deep-learning neural-network chainer gpu numpy cuda neural-networks cudnn cupy

Updated Aug 28, 2023
Python

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

gpu cuda jit pytorch nvidia moe attention llm-inference large-large-models distributed-inference

Updated Jun 29, 2026
Python

gpustack / gpustack

A GPU cluster manager for high-performance AI model serving (vLLM, SGLang) and on-demand SSH-accessible GPU instances.

cuda inference openai llama maas rocm ascend llm llm-serving vllm genai llm-inference qwen deepseek sglang distributed-inference high-performance-inference mindie

Updated Jun 29, 2026
Python

rapidsai / cuml

cuML - RAPIDS Machine Learning Library

machine-learning gpu machine-learning-algorithms cuda nvidia

Updated Jun 29, 2026
Python

NVIDIAGameWorks / kaolin

A PyTorch Library for Accelerating 3D Deep Learning Research

cuda pytorch artificial-intelligence neural-networks camera-api physics-simulation rasterization interactive-visualizations 3d-deep-learning differentiable-rendering differentiable-lighting gaussian-splatting nvidia-warp

Updated Jun 18, 2026
Python

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

python machine-learning deep-learning gpu cuda pytorch jax fp8 fp4

Updated Jun 29, 2026
Python

roflcoopter / viseron

Self-hosted, local-only NVR and AI computer vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.

Updated Jun 25, 2026
Python

Jittor / jittor

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

python deep-learning gpu cuda jittor

Updated Jun 29, 2026
Python

pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

machine-learning deep-learning cuda pytorch nvidia jetson tensorrt libtorch

Updated Jun 29, 2026
Python

NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors

Updated Mar 5, 2024
Python

containers / ramalama

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

ai containers cuda intel hip hacktoberfest inference-server podman llm llamacpp vllm

Updated Jun 29, 2026
Python

pytorch / ao

PyTorch native quantization and sparsity for training and inference

training sparsity cuda inference pytorch transformer llama quantization mx brrr dtypes float8

Updated Jun 29, 2026
Python

Created by NVIDIA

Released June 23, 2007

Followers: 314
Website: github.com/topics/cuda

Related topics

nvcc