Fix c_pointer mem leak by Difers · Pull Request #3352 · NVIDIA/cutlass

Difers · 2026-06-26T06:31:50Z

Problem

__c_pointers__() on CUTLASS DSL scalar types (Int32, Float32, Float64, Float16, BFloat16, TFloat32, etc.) creates fresh ctypes objects on every
kernel invocation. In ML training loops with frequent kernel launches, these short-lived allocations interleave with long-lived framework objects.
The result is heap fragmentation and monotonic growth of process RSS, which eventually leads to OOM.

Observed in production: ~0.3 GB/step RSS growth in large model training, leading to OOM within hundreds of steps. MoE architectures (many expert
kernels per step) are particularly affected.

Root Cause

Each call to __c_pointers__() creates 3 ctypes objects (c_int/c_float, pointer(), cast()) that are discarded immediately after use. In a training
loop with multiple kernel calls per step, this produces hundreds to thousands of short-lived objects per step, which fragment the heap. tracemalloc
cannot detect this because the ctypes objects are freed before snapshots are taken; the damage is heap fragmentation, not live allocations.
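
The allocation pattern described above can be sketched as follows. This is a minimal illustration, not the CUTLASS source: the class name UncachedInt32 is hypothetical, and the exact sequence of ctypes calls is assumed from the description (a C scalar, a pointer() to it, and a cast()).

```python
import ctypes


class UncachedInt32:
    """Hypothetical stand-in for a DSL scalar type (not the CUTLASS class)."""

    def __init__(self, value):
        self.value = value

    def __c_pointers__(self):
        # Three ctypes objects are created on every call.
        storage = ctypes.c_int32(self.value)                 # 1: the C scalar
        typed_ptr = ctypes.pointer(storage)                  # 2: pointer to it
        void_ptr = ctypes.cast(typed_ptr, ctypes.c_void_p)   # 3: untyped pointer
        return [void_ptr]


# Two launches with the same scalar value still allocate separate objects.
first = UncachedInt32(7).__c_pointers__()
second = UncachedInt32(7).__c_pointers__()
fresh_objects_each_call = first[0] is not second[0]

# The pointer is valid while the returned object is alive.
read_back = ctypes.cast(first[0], ctypes.POINTER(ctypes.c_int32)).contents.value
```

Once the caller drops the returned list, all three objects become garbage, so a later tracemalloc snapshot has nothing left to attribute to them.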

Fix

Add a per-type _cptr_cache dictionary that caches __c_pointers__() results keyed by scalar value. On subsequent calls with the same value, return
the cached result directly instead of allocating new ctypes objects.
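
A minimal sketch of the caching approach, under the same assumptions as the description: only the _cptr_cache name and the idea of keying by scalar value come from this PR; the class CachedInt32 and the method body are hypothetical and may differ from the actual patch.

```python
import ctypes


class CachedInt32:
    """Hypothetical scalar type with a per-type pointer cache."""

    # Maps scalar value -> list of c_void_p built for that value.
    _cptr_cache = {}

    def __init__(self, value):
        self.value = value

    def __c_pointers__(self):
        cache = type(self)._cptr_cache
        cached = cache.get(self.value)
        if cached is None:
            # First time this value is seen: allocate once and keep it.
            storage = ctypes.c_int32(self.value)
            void_ptr = ctypes.cast(ctypes.pointer(storage), ctypes.c_void_p)
            cached = [void_ptr]
            cache[self.value] = cached
        return cached


first = CachedInt32(7).__c_pointers__()
second = CachedInt32(7).__c_pointers__()   # served from the cache
other = CachedInt32(8).__c_pointers__()    # new value, new entry
cache_size = len(CachedInt32._cptr_cache)
```

In this sketch the cache holds one entry per distinct value and is never evicted, so it suits scalars drawn from a small set of values. Callers also share the returned list, so they should treat it as read-only.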

Related issue #3351

Fix mem c_pointer leak

75e14d5

Difers mentioned this pull request Jun 26, 2026

[BUG] __c_pointers__() causes monotonic RSS growth due to uncached ctypes allocations in kernel launch hot path #3351

Open

Difers changed the title ~~Fix mem c_pointer leak~~ Fix c_pointer mem c_pointer leak Jun 29, 2026

Difers changed the title ~~Fix c_pointer mem c_pointer leak~~ Fix c_pointer mem leak Jun 29, 2026

Difers wants to merge 1 commit into NVIDIA:main from Difers:fix_mem_leak
