The GPU Memory Profiler provides simple APIs to profile memory usage in your deep learning workflows. This guide shows you how to get started with basic profiling in both PyTorch and TensorFlow.
PyTorch profiling
Basic setup
Import the profiler and create an instance:
from gpumemprof import GPUMemoryProfiler
profiler = GPUMemoryProfiler(track_tensors=True)
Profile function calls
Use profile_function() to measure memory usage of any callable:
import torch
def allocate_tensor(size_mb, device):
    elements = int(size_mb * 1024 * 1024 / 4)
    rows = max(1, elements // 1024)
    tensor = torch.randn(rows, 1024, device=device)
    return tensor.mean().item()
device = torch.device("cuda")
# Profile multiple allocations
for idx in range(3):
    size_mb = 32 * (idx + 1)
    def allocate(sz=size_mb, dev=device):
        return allocate_tensor(sz, dev)
    allocate.__name__ = f"tensor_alloc_{size_mb}mb"
    profiler.profile_function(allocate)
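The loop above passes `size_mb` and `device` into `allocate` as default arguments instead of closing over the loop variable. The following is a minimal sketch, independent of the profiler, of why that is the safer habit and of the sizing arithmetic inside `allocate_tensor`. It assumes float32 elements (4 bytes each, the default dtype of `torch.randn`); the helper name `tensor_shape_for_mb` is hypothetical and not part of the library.

```python
def tensor_shape_for_mb(size_mb, cols=1024, bytes_per_element=4):
    """Return (rows, cols) for a tensor occupying roughly size_mb MiB.

    Hypothetical helper mirroring the arithmetic in allocate_tensor;
    bytes_per_element=4 assumes float32.
    """
    elements = int(size_mb * 1024 * 1024 / bytes_per_element)
    rows = max(1, elements // cols)  # never request zero rows
    return rows, cols


# Late binding: each lambda looks up `size_mb` only when it is called,
# so all of them see the value left behind by the final iteration.
late = []
for size_mb in (32, 64, 96):
    late.append(lambda: size_mb)

# Early binding: a default argument is evaluated once, at definition time,
# so each lambda keeps the value from its own iteration.
early = []
for size_mb in (32, 64, 96):
    early.append(lambda sz=size_mb: sz)

late_values = [f() for f in late]    # [96, 96, 96]
early_values = [f() for f in early]  # [32, 64, 96]
shape_32mb = tensor_shape_for_mb(32)  # (8192, 1024)
```

If a callable runs immediately inside the same iteration, both forms behave the same. The default-argument form stays correct even when the callable is stored and invoked after the loop has moved on.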
Profile training loops
Wrap training epochs with profile_context() to track memory during training:
import torch.nn as nn
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
for epoch in range(2):
    with profiler.profile_context(f"epoch_{epoch+1}"):
        # Your training step here
        inputs = torch.randn(32, 784, device="cuda")
        targets = torch.randint(0, 10, (32,), device="cuda")
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
See pytorch_demo.py:53-62
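To make the context-manager pattern concrete, here is a minimal sketch of a named profiling block. It is a stand-in that measures the Python heap with the standard-library `tracemalloc` module, not GPU memory, and it is not how `profile_context()` is implemented. The names `track_peak` and `records` are hypothetical. On a CUDA device, PyTorch exposes comparable counters through `torch.cuda.reset_peak_memory_stats()` and `torch.cuda.max_memory_allocated()`.

```python
import tracemalloc
from contextlib import contextmanager

records = {}  # hypothetical store: block name -> peak bytes above baseline


@contextmanager
def track_peak(name):
    """Record the peak Python heap growth observed while the block runs."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    tracemalloc.reset_peak()  # requires Python 3.9+
    baseline, _ = tracemalloc.get_traced_memory()
    try:
        yield
    finally:
        # Runs even if the block raises, so every block gets a record.
        _, peak = tracemalloc.get_traced_memory()
        records[name] = max(0, peak - baseline)
        if started_here:
            tracemalloc.stop()


for epoch in range(2):
    with track_peak(f"epoch_{epoch + 1}"):
        buffer = bytearray(1_000_000)  # stands in for a training step
        del buffer
```

Each entry in `records` holds the high-water mark for its block rather than the memory still held at exit, which is why the freed `bytearray` still shows up in the measurement.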
Get profiling results
Retrieve a summary of all profiled operations:
summary = profiler.get_summary()
print(f"Total operations profiled: {len(summary['results'])}")
print(f"Peak memory: {summary['peak_memory_mb']:.2f} MB")
print(f"Average memory: {summary['average_memory_mb']:.2f} MB")
TensorFlow profiling
Basic setup
Import the TensorFlow-specific profiler:
import tensorflow as tf
from tfmemprof import TFMemoryProfiler
profiler = TFMemoryProfiler(enable_tensor_tracking=True)
Profile with decorator
Use profiler.profile_function as a decorator:
@profiler.profile_function
def allocate_batch():
    inputs = tf.random.normal((128, 784))
    targets = tf.random.uniform((128,), maxval=10, dtype=tf.int32)
    return float(inputs.numpy().mean())
allocate_batch()
See tensorflow_demo.py:31-38
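The decorator form can be sketched in plain Python to show the mechanics: the wrapper measures around the call, passes the return value through, and keeps the function's name for reporting. This is an illustrative stand-in built on the standard-library `tracemalloc` module, not the library's implementation; `profile_calls` and `call_log` are hypothetical names.

```python
import functools
import tracemalloc

call_log = []  # hypothetical store of (function name, peak bytes above baseline)


def profile_calls(func):
    """Wrap func so every call records its peak Python heap growth."""

    @functools.wraps(func)  # keep func.__name__ so reports stay readable
    def wrapper(*args, **kwargs):
        started_here = not tracemalloc.is_tracing()
        if started_here:
            tracemalloc.start()
        tracemalloc.reset_peak()  # requires Python 3.9+
        baseline, _ = tracemalloc.get_traced_memory()
        try:
            return func(*args, **kwargs)  # return value passes through unchanged
        finally:
            _, peak = tracemalloc.get_traced_memory()
            call_log.append((func.__name__, max(0, peak - baseline)))
            if started_here:
                tracemalloc.stop()

    return wrapper


@profile_calls
def allocate_batch():
    values = [float(i) for i in range(10_000)]
    return sum(values) / len(values)


mean_value = allocate_batch()
```

Because the wrapper returns whatever the wrapped function returns, decorating a function does not change how callers use it.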
Profile training steps
Wrap training iterations with profile_context():
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10)
])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
for epoch in range(2):
    with profiler.profile_context(f"tf_epoch_{epoch+1}"):
        inputs = tf.random.normal((32, 784))
        targets = tf.random.uniform((32,), maxval=10, dtype=tf.int32)
        with tf.GradientTape() as tape:
            predictions = model(inputs, training=True)
            loss = loss_fn(targets, predictions)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
See tensorflow_demo.py:41-45
Get profiling results
Retrieve profiling results:
results = profiler.get_results()
print(f"Duration: {results.duration:.3f}s")
print(f"Peak memory: {results.peak_memory_mb:.2f} MB")
print(f"Average memory: {results.average_memory_mb:.2f} MB")
print(f"Snapshots captured: {len(results.snapshots)}")
See tensorflow_demo.py:57-62
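The fields printed above (`duration`, `peak_memory_mb`, `average_memory_mb`, `snapshots`) can plausibly all be derived from a series of timestamped memory readings. The sketch below shows one way such a result object could be assembled. It is an assumption about the general shape, not the library's actual class; `Snapshot` and `Results` are hypothetical, and the sample values are made-up placeholders rather than measured data.

```python
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class Snapshot:
    timestamp: float   # seconds since profiling started
    memory_mb: float   # memory in use at that moment


@dataclass
class Results:
    snapshots: list = field(default_factory=list)

    @property
    def duration(self):
        """Seconds between the first and last snapshot."""
        if len(self.snapshots) < 2:
            return 0.0
        return self.snapshots[-1].timestamp - self.snapshots[0].timestamp

    @property
    def peak_memory_mb(self):
        return max((s.memory_mb for s in self.snapshots), default=0.0)

    @property
    def average_memory_mb(self):
        if not self.snapshots:
            return 0.0
        return fmean([s.memory_mb for s in self.snapshots])


# Placeholder readings for illustration only.
results = Results([
    Snapshot(0.0, 100.0),
    Snapshot(0.5, 300.0),
    Snapshot(1.0, 200.0),
])
```

With readings like these, the average depends on how often snapshots are taken, while the peak can miss short spikes that fall between two samples.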
Next steps