PyTorch Profiler Integration

This guide explains how to use pydftracer with PyTorch’s built-in profiler to capture profiling events and log them to DFTracer trace files.

Overview

The PyTorch Profiler integration allows you to:

  • Capture PyTorch profiler events (CPU, CUDA, memory usage)

  • Log profiling data to DFTracer trace files with category PP (PyTorch Profiler)

  • Combine PyTorch profiling with I/O tracing for comprehensive analysis

Setup

Enable DFTracer before using the PyTorch Profiler integration:

export DFTRACER_ENABLE=1

Basic Usage

Use the trace_handler function as the on_trace_ready callback for PyTorch’s profiler:

import torch
from torch.profiler import profile, schedule, ProfilerActivity
from dftracer.python import dftracer
from dftracer.python.torch import trace_handler

# Initialize DFTracer logger
df_logger = dftracer.initialize_log("profiler_trace.pfw", None, -1)

# Define profiler schedule
profiler_schedule = schedule(
    wait=1,
    warmup=1,
    active=2,
    repeat=1,
)

# Run profiler with trace_handler
with profile(
    activities=[ProfilerActivity.CPU],
    schedule=profiler_schedule,
    on_trace_ready=trace_handler,
    profile_memory=True,
    with_stack=True,
) as p:
    for step in range(4):
        # Your training code here
        model(input_data)
        p.step()

df_logger.finalize()

Training Loop Example

Complete example with a training loop:

import torch
import torch.nn as nn
from torch.profiler import profile, schedule, ProfilerActivity, record_function
from dftracer.python import dftracer
from dftracer.python import dft_fn as Profile
from dftracer.python.torch import trace_handler

# Initialize logger
df_logger = dftracer.initialize_log("training.pfw", None, -1)

# Model setup
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Create DFTracer profile for additional tracing
df_test = Profile("training")

@df_test.log
def training_step(inputs, labels):
    optimizer.zero_grad()

    with record_function("forward"):
        outputs = model(inputs)

    with record_function("loss"):
        loss = criterion(outputs, labels)

    loss.backward()
    optimizer.step()
    return loss.item()

# Profiler schedule
profiler_schedule = schedule(wait=1, warmup=1, active=2, repeat=1)

with profile(
    activities=[ProfilerActivity.CPU],
    schedule=profiler_schedule,
    on_trace_ready=trace_handler,
    profile_memory=True,
) as p:
    for step in range(4):
        inputs = torch.randn(8, 10)
        labels = torch.randint(0, 2, (8,))
        loss = training_step(inputs, labels)
        p.step()

df_logger.finalize()

What Gets Traced

The trace_handler captures the following from PyTorch profiler events:

  • Event name: Operation or kernel name

  • Timing: Start time and duration (microseconds)

  • Device type: CPU or CUDA device

  • Memory usage: CPU and device memory

  • Input shapes: Size of input tensors

  • CPU/Device utilization: Percentage metrics

All events are logged with category PP (PyTorch Profiler) in the trace file.

Example Output

Trace entries from PyTorch Profiler look like:

{"name":"aten::linear","cat":"PP","ts":1234567890,"dur":150,"args":{"device":0,"cpu_memory":1024,"device_memory_usage":0}}

Combining with I/O Tracing

To capture both PyTorch profiler events and I/O operations:

export DFTRACER_ENABLE=1
export DFTRACER_DATA_DIR=all
from dftracer.python import dftracer, ai
from dftracer.python.torch import trace_handler
from torch.profiler import profile, ProfilerActivity

df_logger = dftracer.initialize_log("combined.pfw", None, -1)

# Use AI decorators for I/O tracing
@ai.data.item
def load_data(idx):
    # Your data loading code
    return data

# Use PyTorch profiler for compute tracing
with profile(
    activities=[ProfilerActivity.CPU],
    on_trace_ready=trace_handler,
) as p:
    for step in range(steps):
        data = load_data(step)
        output = model(data)
        p.step()

df_logger.finalize()

API Reference

dftracer.python.torch.trace_handler(profiler_result)

Callback function for PyTorch profiler’s on_trace_ready parameter.

Parameters:

profiler_result (torch.profiler.profiler.ProfilerResult) – PyTorch profiler result object containing events

The handler iterates through all profiler events and logs them to DFTracer with the following information:

  • name: Event key/name

  • cat: Always "PP" (PyTorch Profiler)

  • start_time: Event start time in microseconds

  • duration: Event duration in microseconds

  • int_args: device, cpu_memory, is_remote, device_memory_usage, input_size

  • float_args: total_cpu_percent, total_device_percent