Command-Line Tools
DFTracer Utils provides several command-line utilities for working with DFTracer trace files and compressed archives.
dftracer_reader
Description: DFTracer utility for reading and indexing compressed files (GZIP, TAR.GZ)
Usage:
dftracer_reader [OPTIONS] file
Arguments:
file - Compressed file to process (GZIP, TAR.GZ) [required]
Options:
-i, --index <path> - Index file to use (default: auto-generated in temp directory)
-s, --start <bytes> - Start position in bytes (default: -1)
-e, --end <bytes> - End position in bytes (default: -1)
-c, --checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
-f, --force-rebuild - Force rebuild of index even if it exists
--check - Check if index is valid
--read-buffer-size <bytes> - Size of the read buffer in bytes (default: 1MB)
--mode <mode> - Set the reading mode: bytes, line_bytes, or lines (default: bytes)
--index-dir <path> - Directory to store index files (default: system temp directory)
Example:
# Read bytes 100-200 from a compressed file
dftracer_reader --start 100 --end 200 trace.pfw.gz
# Read in line mode
dftracer_reader --mode lines --start 1 --end 100 trace.pfw.gz
# Build index with custom checkpoint size
dftracer_reader --checkpoint-size 20971520 trace.pfw.gz
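The --checkpoint-size values are raw byte counts: the default 33554432 is 32 x 1024 x 1024, and the 20971520 used above is 20 x 1024 x 1024. The following minimal Python sketch shows that arithmetic. The helper names mib_to_bytes and checkpoint_size_arg are hypothetical and not part of DFTracer Utils; they only build an argument fragment and do not invoke any tool.

```python
def mib_to_bytes(mib: int) -> int:
    """Convert binary megabytes (1 MB = 1024 * 1024 bytes here) to bytes."""
    if mib <= 0:
        raise ValueError("checkpoint size must be positive")
    return mib * 1024 * 1024


def checkpoint_size_arg(mib: int) -> list[str]:
    """Build the argv fragment for --checkpoint-size from a size in MB."""
    return ["--checkpoint-size", str(mib_to_bytes(mib))]


if __name__ == "__main__":
    print(mib_to_bytes(32))         # the documented default, in bytes
    print(checkpoint_size_arg(20))  # the value used in the example above
```

The resulting list can be appended to any command line that accepts --checkpoint-size, for example when driving the tools from a script.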
dftracer_info
Description: Display metadata and index information for DFTracer compressed files
Usage:
dftracer_info [OPTIONS]
Options:
--files <files...> - Compressed files to inspect (GZIP, TAR.GZ)
-d, --directory <path> - Directory containing files to inspect
--query <type> - Query type: summary (aggregate all files, default) or detailed (per-file output)
-v, --verbose - Show detailed information including index details
-f, --force-rebuild - Force rebuild index files
-c, --checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
--index-dir <path> - Directory to store index files (default: system temp directory)
--executor-threads <count> - Number of worker threads for parallel processing (default: number of CPU cores)
Example:
# Show info for files in a directory
dftracer_info -d ./logs
# Show info for specific files with verbose output
dftracer_info --files trace1.pfw.gz trace2.pfw.gz -v
# Per-file detailed output
dftracer_info -d ./traces --query detailed
# Analyze with 4 threads
dftracer_info --executor-threads 4 -d ./traces
dftracer_merge
Description: Merge DFTracer .pfw or .pfw.gz files into a single JSON array file using pipeline processing
Usage:
dftracer_merge [OPTIONS]
Options:
-d, --directory <path> - Directory containing .pfw or .pfw.gz files (default: .)
-o, --output <path> - Output file path (should have .pfw extension) (default: combined.pfw)
-f, --force - Override existing output file and force index recreation
-c, --compress - Compress output file with gzip
-v, --verbose - Enable verbose mode
-g, --gzip-only - Process only .pfw.gz files
--checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
--executor-threads <count> - Number of worker threads for parallel processing (default: number of CPU cores)
--index-dir <path> - Directory to store index files (default: system temp directory)
Example:
# Merge all .pfw/.pfw.gz files in current directory
dftracer_merge -o merged.pfw
# Merge files from specific directory with compression
dftracer_merge -d ./logs -o output.pfw -c
# Merge with parallel processing and verbose output
dftracer_merge -d ./traces -o combined.pfw --executor-threads 8 -v
dftracer_split
Description: Split DFTracer traces into equal-sized chunks using pipeline processing
Usage:
dftracer_split [OPTIONS]
Options:
-n, --app-name <name> - Application name for output files (default: app)
-d, --directory <path> - Input directory containing .pfw or .pfw.gz files (default: .)
-o, --output <dir> - Output directory for split files (default: ./split)
-s, --chunk-size <MB> - Chunk size in MB (default: 4)
-f, --force - Override existing files and force index recreation
-c, --compress - Compress output files with gzip (default: true)
-v, --verbose - Enable verbose mode
--checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
--executor-threads <count> - Number of worker threads for parallel processing (default: number of CPU cores)
--index-dir <path> - Directory to store index files (default: system temp directory)
--verify - Verify output chunks match input by comparing event IDs
Example:
# Split files into 4MB chunks
dftracer_split -d ./logs -o ./split_output
# Split with 10MB chunks and custom app name
dftracer_split -d ./traces -s 10 -n myapp -o ./chunks
# Split without compression and verify output
dftracer_split -d ./data -c false --verify -o ./output
dftracer_event_count
Description: Count valid events in DFTracer .pfw or .pfw.gz files using pipeline processing
Usage:
dftracer_event_count [OPTIONS]
Options:
-d, --directory <path> - Directory containing .pfw or .pfw.gz files (default: .)
-f, --force - Force index recreation
-c, --checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
--executor-threads <count> - Number of worker threads for parallel processing (default: number of CPU cores)
--index-dir <path> - Directory to store index files (default: system temp directory)
Example:
# Count events in current directory
dftracer_event_count
# Count events in specific directory with 8 threads
dftracer_event_count -d ./traces --executor-threads 8
# Force index rebuild
dftracer_event_count -d ./logs -f
dftracer_pgzip
Description: Parallel gzip compression for DFTracer .pfw files
Usage:
dftracer_pgzip [OPTIONS]
Options:
-d, --directory <path> - Directory containing .pfw files (default: .)
-v, --verbose - Enable verbose output
--executor-threads <count> - Number of worker threads for parallel processing (default: number of CPU cores)
Example:
# Compress all .pfw files in current directory
dftracer_pgzip
# Compress files in specific directory with verbose output
dftracer_pgzip -d ./logs -v
# Compress with 16 threads
dftracer_pgzip -d ./traces --executor-threads 16
dftracer_server
Description: HTTP server for querying and streaming DFTracer trace data via REST API
Usage:
dftracer_server [OPTIONS] --directory <path>
Options:
-b, --bind <address> - Bind address (default: 0.0.0.0)
-p, --port <number> - Listen port (default: 8080)
-d, --directory <path> - Directory containing trace files [required]
--index-dir <path> - Directory for bloom/checkpoint index files (default: same as --directory)
--executor-threads <count> - Number of worker threads (default: number of CPU cores)
Example:
# Start server on default port 8080
dftracer_server -d ./traces
# Start server on custom port with specific bind address
dftracer_server -b 127.0.0.1 -p 9000 -d ./traces
# Start with custom index directory and thread count
dftracer_server -d ./traces --index-dir /var/cache/dftracer_indexes --executor-threads 8
dftracer_stats
Description: Compute event statistics with bloom filter acceleration and detailed distribution analysis
Usage:
dftracer_stats [OPTIONS]
Options:
-d, --directory <path> - Directory containing .pfw or .pfw.gz files (default: .)
--files <files...> - Explicit list of trace files
--index-dir <path> - Directory to store index files (default: system temp directory)
--report <type> - Report type: summary, categories, names, pid_tids, time_range, duration, top-names, top-categories, detailed (default: summary)
--top-n <count> - Top N entries to show in detailed report (0=all, default: 10)
--top-n-pid-tid <count> - Top N PID:TID pairs to show (default: 10)
--query <query> - Query DSL filter (e.g., 'cat == "POSIX" and dur > 1000')
--group-by <dims...> - Group-by dimensions: name, cat, pid, tid, fhash, hhash, pid_tid (default: name for detailed)
--json - Output in JSON format
--no-auto-index - Disable automatic bloom index building
--checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
--executor-threads <count> - Number of worker threads (default: number of CPU cores)
Example:
# Summary statistics
dftracer_stats -d ./traces
# Top operations and categories
dftracer_stats -d ./traces --report categories
# Detailed duration distribution per operation
dftracer_stats -d ./traces --report detailed --group-by name --top-n 20
# Filter to POSIX operations only
dftracer_stats -d ./traces --report duration --query 'cat == "POSIX"'
dftracer_view
Description: Extract filtered subsets of trace data using query-based filtering with chunk pruning
Usage:
dftracer_view [OPTIONS]
Options:
--files <files...> - Trace files to process (.pfw, .pfw.gz)
-d, --directory <path> - Directory containing trace files
--preset <name> - Predefined view: io, compute, dlio
--recipe <path> - Custom view JSON file path
--save-recipe <path> - Save the constructed view to a JSON file
--query <query> - Query DSL filter (e.g., 'cat == "POSIX" and dur > 1000')
--time-range <min,max> - Timestamp filter in microseconds (e.g., 1000000,2000000)
--min-duration <us> - Minimum event duration in microseconds
--max-duration <us> - Maximum event duration in microseconds
-o, --output <path> - Output file path (default: stdout)
--stream - Stream matching events to stdout as NDJSON
--no-metadata - Exclude metadata events (ph=M) from output
--index-dir <path> - Directory where .idx index files are stored
--no-auto-index - Disable automatic bloom index building for files missing .idx
--checkpoint-size <bytes> - Checkpoint size for auto-indexing in bytes (default: 33554432 B / 32 MB)
--executor-threads <count> - Number of worker threads (default: number of CPU cores)
Example:
# Extract I/O operations
dftracer_view --preset io -d ./traces -o io_events.pfw
# Custom query: POSIX read/write operations
dftracer_view -d ./traces --query 'cat == "POSIX" and name in ["read", "write"]' -o posix_rw.pfw
# Time-filtered view with output streaming
dftracer_view -d ./traces --time-range 1000000,5000000 --stream
dftracer_index
Description: Build per-chunk bloom filter indices for efficient chunk-skipping queries
Usage:
dftracer_index [OPTIONS]
Options:
-d, --directory <path> - Input directory containing .pfw or .pfw.gz files (default: .)
--dimensions <dims> - Comma-separated extra dimensions to index from args (e.g., args.level,args.mode)
-f, --force - Force index recreation even if already built
--checkpoint-size <bytes> - Checkpoint size for gzip indexing in bytes (default: 33554432 B / 32 MB)
--executor-threads <count> - Number of worker threads for parallel processing (default: number of CPU cores)
--index-dir <path> - Directory to store index files (default: same as data files)
--expected-entries <count> - Expected entries per chunk for bloom filter sizing (default: 1024)
--false-positive-rate <rate> - Bloom filter false positive rate (default: 0.01)
--read-batch-size <MB> - Batch read size in MB for stream processing (default: 4)
--manifest - Also build manifest tables in .idx (per-checkpoint event line routing)
--rebuild-summaries - Rebuild ROOT_* aggregated summaries after ingest. Off by default; ROOT_* CFs are only consumed by summary tools such as dftracer_info. Bloom-filter chunk-skipping queries do not require them.
This binary also accepts the Shared CLI Flags (Pipeline, Watchdog, Indexing).
Example:
# Build bloom indices for all traces
dftracer_index -d ./traces
# Build with custom dimensions and force rebuild
dftracer_index -d ./traces --dimensions "args.level,args.io.size" --force
# Build manifest indices for reorganization
dftracer_index -d ./traces --manifest
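To give a feel for how --expected-entries and --false-positive-rate interact, the sketch below applies the standard textbook Bloom filter sizing formulas. Whether dftracer_index uses exactly these formulas is an assumption; the function name bloom_parameters is hypothetical and the code is only meant to show the order of magnitude of a per-chunk filter.

```python
import math


def bloom_parameters(expected_entries: int, false_positive_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) from the textbook optimal-sizing formulas.

    bits   = -n * ln(p) / (ln 2)^2
    hashes = (bits / n) * ln 2
    """
    if expected_entries <= 0:
        raise ValueError("expected_entries must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in (0, 1)")
    bits = math.ceil(
        -expected_entries * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    hashes = max(1, round(bits / expected_entries * math.log(2)))
    return bits, hashes


if __name__ == "__main__":
    # Documented defaults: 1024 expected entries, 1% false positive rate.
    bits, hashes = bloom_parameters(1024, 0.01)
    print(f"{bits} bits (~{bits / 8 / 1024:.2f} KiB), {hashes} hash functions")
```

Under these formulas a 1% false positive rate costs roughly 9.6 bits per expected entry, so lowering the rate or raising the expected entry count grows each chunk's filter accordingly.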
dftracer_aggregator
Description: Aggregate DFTracer events into time-series counters using streaming coroutine pipeline
The aggregator can emit three logical row types:
regular event rows from non-counter trace events
profile-counter rows from ph="C" events whose category is not sys
system-counter rows from ph="C" events whose category is sys
With --format arrow, these are distinguished by the batch_type column.
The Arrow output always includes the base columns batch_type, cat,
name, pid, tid, hhash, fhash, time_bucket, count,
dur_total, dur_min, dur_max, dur_mean, dur_std,
size_total, size_min, size_max, size_mean, size_std,
ts, and te. Each field listed in --metric-fields adds
<field>_total, <field>_min, <field>_max, <field>_mean, and
<field>_std.
Usage:
dftracer_aggregator [OPTIONS]
Options:
-d, --directory <path> - Input directory containing .pfw or .pfw.gz files (default: .)
-o, --output <path> - Output file path for aggregated counters (default: aggregated_output.json)
-t, --time-interval <ms> - Time interval in milliseconds for bucketing (default: 5000)
-g, --group-keys <keys> - Comma-separated extra group keys from args (e.g., epoch,step,level)
-m, --metric-fields <fields> - Comma-separated custom metric fields from args (e.g., iter_count,num_events)
--query <query> - Query DSL filter (e.g., 'cat == "POSIX" and dur > 1000')
-f, --force - Force index recreation
--checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
--executor-threads <count> - Number of executor threads for parallel processing (default: number of CPU cores)
--index-dir <path> - Directory to store index files (default: system temp directory)
--compress - Compress output using gzip
--compression-level <0-9> - Gzip compression level (default: 6)
--boundary-events <config> - Boundary event configuration: event_name:value_field:output_name
--no-track-process-parents - Disable tracking of process parent relationships from fork/spawn
--chunk-size <MB> - Target chunk size in MB for parallel processing (default: 4)
--read-batch-size <MB> - Batch read size in MB for stream processing (default: 4)
--event-format <fmt> - Perfetto event format: counter, async, regular (default: counter)
--compute-percentiles - Enable percentile/quantile computation using DDSketch
--percentiles <vals> - Comma-separated percentiles to compute (e.g., 0.25,0.5,0.75,0.90)
--relative-accuracy <rate> - Relative accuracy for DDSketch percentile estimation (default: 0.01)
--format <fmt> - Output format: json (default, Perfetto trace) or arrow (.arrows IPC file). Arrow format requires DFTRACER_UTILS_ENABLE_ARROW_IPC=ON at build time.
Example:
# Basic aggregation with 1-second (1000ms) buckets
dftracer_aggregator -d ./traces -o agg.json -t 1000
# Aggregation with percentiles and compression
dftracer_aggregator -d ./traces -o agg.json --compute-percentiles --compress
# Query-filtered aggregation with custom metrics from args
dftracer_aggregator -d ./traces --query 'cat == "POSIX"' \
-m "iter_count,epoch"
# Output as Arrow IPC file (readable by pyarrow, polars, DuckDB)
dftracer_aggregator -d ./traces -o agg.arrows --format arrow
# Stream profile/system counters as Perfetto counter events
dftracer_aggregator -d ./traces --event-format counter
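To make the bucketing concrete, here is a minimal Python sketch of how events could be grouped by time interval and reduced to the count and dur_* columns named earlier. It is not the aggregator's implementation. It assumes that ts and dur are in microseconds, groups only by cat and name, treats time_bucket as an integer bucket index, and uses the population standard deviation for dur_std; all of these are illustrative assumptions.

```python
import statistics
from collections import defaultdict


def aggregate(events, interval_ms=5000):
    """Group events by (cat, name, time_bucket) and summarize durations."""
    interval_us = interval_ms * 1000  # assumption: ts and dur are microseconds
    groups = defaultdict(list)
    for ev in events:
        bucket = ev["ts"] // interval_us  # assumption: bucket is an index
        groups[(ev["cat"], ev["name"], bucket)].append(ev["dur"])

    rows = []
    for (cat, name, bucket), durs in sorted(groups.items()):
        rows.append({
            "cat": cat,
            "name": name,
            "time_bucket": bucket,
            "count": len(durs),
            "dur_total": sum(durs),
            "dur_min": min(durs),
            "dur_max": max(durs),
            "dur_mean": statistics.fmean(durs),
            "dur_std": statistics.pstdev(durs),  # assumption: population std
        })
    return rows


if __name__ == "__main__":
    # Hypothetical events, not taken from a real trace.
    sample = [
        {"cat": "POSIX", "name": "read", "ts": 1_000_000, "dur": 100},
        {"cat": "POSIX", "name": "read", "ts": 4_000_000, "dur": 300},
        {"cat": "POSIX", "name": "read", "ts": 6_000_000, "dur": 200},
    ]
    for row in aggregate(sample):
        print(row)
```

With the default 5000 ms interval the first two sample events share a bucket and the third falls into the next one; a smaller -t value produces more, finer-grained rows.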
Reading Arrow IPC output:
# pyarrow
import pyarrow.ipc as ipc
reader = ipc.open_file("agg.arrows")
table = reader.read_all()
df = table.to_pandas()
# polars
import polars as pl
df = pl.read_ipc("agg.arrows")
# DuckDB
import duckdb
result = duckdb.sql("SELECT * FROM 'agg.arrows'")
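The --compute-percentiles and --relative-accuracy options refer to DDSketch, a quantile sketch built on logarithmically sized buckets. The following minimal, positive-values-only version is written from the published algorithm, not from the DFTracer sources; the class name TinyDDSketch is hypothetical. It illustrates why a relative accuracy of 0.01 bounds the error of each reported percentile to about 1% of the true value.

```python
import math
from collections import Counter


class TinyDDSketch:
    """Positive-values-only DDSketch with unbounded buckets (illustrative)."""

    def __init__(self, relative_accuracy: float = 0.01) -> None:
        if not 0.0 < relative_accuracy < 1.0:
            raise ValueError("relative_accuracy must be in (0, 1)")
        self.gamma = (1.0 + relative_accuracy) / (1.0 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()
        self.count = 0

    def add(self, value: float) -> None:
        if value <= 0:
            raise ValueError("this sketch only handles positive values")
        # Bucket i covers the interval (gamma**(i-1), gamma**i].
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def quantile(self, q: float) -> float:
        if self.count == 0:
            raise ValueError("sketch is empty")
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be in [0, 1]")
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                break
        # This estimate is within the relative accuracy of any value in bucket i.
        return 2.0 * self.gamma ** index / (self.gamma + 1.0)


if __name__ == "__main__":
    sketch = TinyDDSketch(relative_accuracy=0.01)
    for duration_us in range(1, 1001):  # hypothetical durations
        sketch.add(duration_us)
    print(f"estimated median: {sketch.quantile(0.5):.1f}")
    print(f"estimated p99:    {sketch.quantile(0.99):.1f}")
```

Because only bucket counts are stored, memory use depends on the range of values and not on the number of events, which is what makes the approach practical for streaming aggregation.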
dftracer_gen_dlio_config
Description: Generate a DLIO YAML configuration directly from a directory
of raw DFTracer traces. The tool indexes the inputs, aggregates them into the
internal AGGREGATION column family (DDSketch forced on), fits per-component
distributions, refines max_bound against an internal barrier simulator, and
emits a DLIO train.computation_time + reader.preprocess_time block. The
user does not need to run dftracer_aggregator separately.
Required input event names: cat=dataloader with name=fetch.block /
fetch.iter, and cat=data with name=preprocess / item. The tool
exits non-zero with an explanatory message if no DLIO events are present.
Usage:
dftracer_gen_dlio_config [OPTIONS] -o <config.yaml>
Options:
-d, --directory <path> - Input directory containing .pfw or .pfw.gz traces (default: .)
-o, --output <path> - Output path for the DLIO YAML config [required]
--max-bound-percentile <pct> - Initial max_bound percentile, 0-100 (default: 95)
--simulation-iterations <n> - Max simulator iterations for percentile refinement (default: 5)
--target-e2e-error <frac> - Target relative E2E error to declare convergence (default: 0.05)
--target-cdf-similarity <frac> - Target fetch_block CDF similarity (default: 0.90)
--patience <n> - Early-stop after this many iterations without improvement (default: 10)
--epsilon <step> - Base step size for percentile adjustment (default: 1.0)
--momentum <m> - Momentum factor in [0, 1) (default: 0.9)
--min-percentile <pct> - Floor on max_bound percentile during optimization (default: 50)
--num-workers <n> - DataLoader worker count for the simulator (default: 8)
--prefetch-factor <n> - DataLoader prefetch factor (default: 2)
--seed <n> - Base seed for simulator and sampler (default: 42)
--max-samples-per-entry <n> - Cap on synthesized samples per aggregation entry; 0 disables (default: 100)
-t, --time-interval <ms> - Aggregation time interval in ms (default: 5000)
--index-dir <path> - Directory for the shared index store (default: system temp dir)
--checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
--executor-threads <count> - Number of executor threads for parallel processing
-f, --force - Force index recreation
Distribution pool: Each component is fit as the lowest-BIC choice among {Normal, Lognormal, Gamma, Exponential, Weibull, Gaussian Mixture (K=2), Gaussian Mixture (K=3)}. Mixture candidates are only considered when the sample count is at least 20.
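The lowest-BIC selection can be illustrated with a reduced pool. The sketch below compares only two of the listed families, Normal and Exponential, using closed-form maximum-likelihood fits and BIC = k ln(n) - 2 ln(L). It is a simplified illustration and not the tool's fitting code; the function names and the sample values are hypothetical.

```python
import math


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def normal_loglik(xs) -> float:
    """Log-likelihood of xs under a Normal fitted by maximum likelihood."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # MLE variance, must be > 0
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def exponential_loglik(xs) -> float:
    """Log-likelihood of xs under an Exponential fitted by maximum likelihood."""
    n = len(xs)
    total = sum(xs)
    rate = n / total  # MLE rate
    return n * math.log(rate) - rate * total


def pick_family(xs):
    """Return (best_family, {family: bic}) over the reduced two-family pool."""
    n = len(xs)
    scores = {
        "normal": bic(normal_loglik(xs), 2, n),            # mean, stdev
        "exponential": bic(exponential_loglik(xs), 1, n),  # rate
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical per-step compute times in seconds, tightly clustered.
    samples = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
    best, scores = pick_family(samples)
    print(best, scores)
```

The ln(n) penalty per parameter is what keeps a richer family from winning on fit quality alone, which matters most for the multi-parameter mixture candidates.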
Example:
# Generate config from a directory of raw traces
dftracer_gen_dlio_config -d ./traces -o dlio_config.yaml
# Refine harder against the simulator with a tighter convergence target
dftracer_gen_dlio_config -d ./traces -o dlio_config.yaml \
--simulation-iterations 20 --target-e2e-error 0.02 --patience 5
# Reuse a shared index directory across runs to skip re-indexing
dftracer_gen_dlio_config -d ./traces -o dlio_config.yaml \
--index-dir /var/cache/dftracer/idx
Output schema:
train:
  computation_time:
    type: <normal|lognormal|gamma|exponential|weibull|mixture>
    # single distribution: per-family params (mean/stdev, mu/sigma,
    # shape/scale, rate)
    # mixture: n_components + components: [{weight, params: {type, ...}}]
    max_bound: <seconds>
reader:
  preprocess_time:
    # same structure
Comparing against an external generator: scripts/compare_dlio_yamls.py
diffs two DLIO YAMLs with a tolerance check on parameters and a two-sample
Kolmogorov-Smirnov check on samples drawn from each fit. Run it via uv run
scripts/compare_dlio_yamls.py --python <a.yaml> --cpp <b.yaml> (the inline
PEP-723 metadata installs pyyaml and numpy automatically). When both YAMLs
select the same model family and the KS statistic is small, they would
produce statistically indistinguishable DLIO sample streams.
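For readers unfamiliar with the two-sample Kolmogorov-Smirnov statistic, the sketch below computes it in plain Python as the largest gap between two empirical CDFs. It is an independent illustration and does not reproduce scripts/compare_dlio_yamls.py; the function name ks_statistic and the sample values are hypothetical.

```python
def ks_statistic(a, b) -> float:
    """Two-sample KS statistic: the maximum gap between the empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both cursors past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


if __name__ == "__main__":
    # Hypothetical samples drawn from two fitted distributions.
    fit_a = [0.9, 1.0, 1.1, 1.2, 1.3]
    fit_b = [0.95, 1.05, 1.1, 1.25, 1.3]
    print(f"KS statistic: {ks_statistic(fit_a, fit_b):.2f}")
```

A value near 0 means the two sample sets follow nearly the same distribution, while a value near 1 means they barely overlap.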
dftracer_organize
Description: Reorganize traces by routing events to query-based groups with provenance tracking
Usage:
dftracer_organize [OPTIONS] --output <dir> --groups <groups...>
Options:
--files <files...> - Input trace files (.pfw, .pfw.gz)
-d, --directory <path> - Directory containing trace files
-o, --output <dir> - Output directory [required]
--groups <groups...> - Query groups: 'io:cat == "POSIX"' 'compute:cat == "APP"' [required]
--chunk-size <MB> - Target chunk size in MB for output files (default: 256)
--checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
--index-dir <path> - Directory for sidecar files
-f, --force - Force rebuild of indices
--no-compress - Write plain .pfw instead of .pfw.gz
--executor-threads <count> - Worker threads (default: number of CPU cores)
Example:
# Separate I/O and compute operations
dftracer_organize -d ./traces -o ./organized \
--groups 'io:cat == "POSIX"' 'compute:cat == "APP"'
# Create multiple semantic views
dftracer_organize -d ./traces -o ./views \
--groups 'read:name == "read"' 'write:name == "write"' 'other:'
# Keep uncompressed output
dftracer_organize -d ./traces -o ./plain --groups "all:" --no-compress
dftracer_reconstruct
Description: Reconstruct original traces from reorganized files using provenance tracking in .pidx sidecars
Usage:
dftracer_reconstruct [OPTIONS] --directory <dir> --output <dir>
Options:
-d, --directory <path> - Directory containing reorganized files [required]
-o, --output <dir> - Output directory [required]
--index-dir <path> - Directory for sidecar files
--checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
--no-compress - Write plain .pfw instead of .pfw.gz
--executor-threads <count> - Worker threads (default: number of CPU cores)
Example:
# Reconstruct from reorganized directory
dftracer_reconstruct -d ./organized -o ./reconstructed
# Reconstruct without compression
dftracer_reconstruct -d ./views -o ./reconstructed --no-compress
dftracer_replay
Description: Replay I/O operations from DFTracer trace files with timing and filtering support
Usage:
dftracer_replay [OPTIONS] <inputs...>
Options:
inputs - Trace files (.pfw, .pfw.gz) or directories containing trace files [required]
--no-timing - Ignore original timing and execute as fast as possible
--dry-run - Parse and analyze traces without executing operations
--dftracer-mode - Use DFTracer sleep-based replay (sleep for operation duration instead of doing actual I/O)
--no-sleep - When used with --dftracer-mode, disable sleep calls for maximum speed
--verbose - Enable verbose output and detailed statistics
-r, --recursive - Recursively search directories for trace files
--use-call-tree - Build and use call tree structure for hierarchical replay
--hierarchical-replay - Replay operations respecting parent-child call hierarchy (requires --use-call-tree)
--respect-call-hierarchy - Replay child nodes immediately after parent (requires --use-call-tree and --hierarchical-replay)
--filter-pid <pids> - Only replay events from specific PID(s) (comma-separated)
--exclude-pid <pids> - Exclude events from specific PID(s) (comma-separated)
--filter-tid <tids> - Only replay events from specific TID(s) (comma-separated)
--exclude-tid <tids> - Exclude events from specific TID(s) (comma-separated)
--filter-function <funcs> - Only replay specific function(s) (comma-separated, e.g., read,write,open)
--exclude-function <funcs> - Exclude specific function(s) (comma-separated)
--filter-category <cats> - Only replay specific category/categories (comma-separated, e.g., POSIX,storage)
--exclude-category <cats> - Exclude specific category/categories (comma-separated)
--start-timestamp <us> - Only replay events after this timestamp (microseconds)
--end-timestamp <us> - Only replay events before this timestamp (microseconds)
--min-size <bytes> - Only replay operations with size >= this value (bytes)
--max-size <bytes> - Only replay operations with size <= this value (bytes)
--sample-rate <rate> - Sample rate for replay (0.0-1.0, 1.0=all events, 0.1=10%)
--sample-seed <seed> - Random seed for sampling (for reproducibility)
--max-events <count> - Maximum number of events to replay (0=unlimited)
Example:
# Replay with original timing
dftracer_replay ./traces/rank_0.pfw.gz
# Dry-run analysis of trace file
dftracer_replay ./traces/rank_0.pfw.gz --dry-run --verbose
# Replay only POSIX read operations
dftracer_replay ./traces -r --filter-category POSIX --filter-function read
For detailed usage, see Replay.
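The interaction of --sample-rate, --sample-seed, and --max-events can be pictured with a small Python sketch. It assumes independent per-event (Bernoulli) sampling driven by a seeded generator, which may differ from how dftracer_replay actually samples; the function name sample_events is hypothetical.

```python
import random


def sample_events(events, sample_rate=1.0, seed=None, max_events=0):
    """Keep each event with probability sample_rate; stop after max_events.

    max_events == 0 means unlimited, mirroring the documented option.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be in [0.0, 1.0]")
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    kept = []
    for event in events:
        if rng.random() < sample_rate:
            kept.append(event)
            if max_events and len(kept) >= max_events:
                break
    return kept


if __name__ == "__main__":
    events = list(range(1000))  # stand-ins for parsed trace events
    first = sample_events(events, sample_rate=0.1, seed=42)
    second = sample_events(events, sample_rate=0.1, seed=42)
    print(len(first), first == second)
```

Fixing the seed is what allows two replays of the same trace to exercise the same subset of operations, which is useful when comparing runs.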
dftracer_tar
Description: Index and analyze TAR.GZ archives containing DFTracer trace data
Usage:
dftracer_tar [OPTIONS] <file>
Options:
file - TAR.GZ file to process [required]
-i, --index <path> - Index file to use (auto-generated if not specified)
-c, --checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
-f, --force-rebuild - Force rebuild index
--list-files - List all files in the TAR archive
--info - Show archive information
--build-only - Only build the index, don’t perform other operations
Example:
# Show archive information
dftracer_tar trace_archive.tar.gz --info
# List files in archive
dftracer_tar trace_archive.tar.gz --list-files
# Build index for fast access
dftracer_tar trace_archive.tar.gz --build-only
dftracer_gen_fake_trace
Description: Generate realistic synthetic DFTracer traces for testing bloom filter indexing
Usage:
dftracer_gen_fake_trace [OPTIONS] --output-dir <dir>
Options:
-o, --output-dir <dir> - Output directory for trace files [required]
-p, --num-processes <count> - Number of ranks (default: 8)
-H, --num-hosts <count> - Number of hosts (default: 4)
-e, --num-epochs <count> - Training epochs (default: 500)
-s, --steps-per-epoch <count> - Steps per epoch (default: 1000)
--checkpoint-every <n> - Checkpoint every N epochs (default: 5)
--validation-every <n> - Validate every N epochs (default: 2)
--num-train-files <count> - Training data shards (default: 8)
--num-val-files <count> - Validation data shards (default: 2)
--step-duration-ms <ms> - Base step duration in milliseconds (default: 100)
--seed <seed> - Random seed for duration jitter (default: 42)
--verify - After generation, build bloom indices and run queries to verify chunk-skipping works
--checkpoint-size <bytes> - Gzip checkpoint size in bytes for indexing (default: 2 MB)
Example:
# Generate synthetic traces for 4 ranks
dftracer_gen_fake_trace -o ./traces -p 4
# Generate with verification of bloom filters
dftracer_gen_fake_trace -o ./traces -p 8 -H 2 --verify
# Generate with custom training parameters
dftracer_gen_fake_trace -o ./traces -e 100 -s 500 --checkpoint-every 10
dftracer_call_tree
Description: Build and analyze call trees from DFTracer trace files for hierarchical structure analysis
Usage:
dftracer_call_tree [OPTIONS] <inputs...>
Options:
inputs - Trace files (.pfw, .pfw.gz) or directories containing trace files [required]
-r, --recursive - Recursively search directories for trace files
--pattern <pattern> - File pattern for trace files (default: *.pfw.gz)
-o, --output <path> - Output file path for serialized call tree (auto-generated from input if not specified)
--json - Also save call tree in JSON (Chrome Tracing) format
--text <path> - Export call tree to text file
--max-depth <n> - Maximum depth for tree printing (0=unlimited, default: 0)
--analyze - Perform detailed analysis (call patterns, timing, critical path)
-v, --verbose - Enable verbose output
--stats-only - Only print statistics, skip tree traversal
--no-save - Don’t save output files, only print analysis
Example:
# Build call tree from directory
dftracer_call_tree ./traces --analyze
# Export to JSON and text formats
dftracer_call_tree ./traces --json --text tree.txt
# Analyze with detailed statistics
dftracer_call_tree ./traces --analyze --verbose --max-depth 5
dftracer_comparator
Description: Compare DFTracer trace metrics between a baseline and a variant run. Produces a hierarchical tree table showing per-category and per-operation deltas with Cohen’s d significance classification.
Usage:
dftracer_comparator [OPTIONS]
Options:
--baseline <path> - Baseline trace file or directory [required unless --config]
--variant <path> - Variant trace file or directory [required unless --config]
--config <path> - JSON config file for hierarchical comparison (replaces --baseline/--variant)
--query <query> - Query DSL filter (default: 'cat == "POSIX" OR cat == "STDIO"')
--group-by <keys> - Comma-separated group keys (default: cat,name)
--format <fmt> - Output format: table (default) or json
-t, --time-interval <ms> - Time interval in milliseconds for bucketing (default: 5000)
--threshold <pct> - Hide changes below this percentage (default: 0.0)
--no-color - Disable ANSI color output
--executor-threads <count> - Number of parallel threads (default: auto)
--index-dir <path> - Directory for index sidecar files (default: system temp)
--force - Force index rebuild
--checkpoint-size <bytes> - Checkpoint size for indexing in bytes (default: 33554432 B / 32 MB)
Example:
# Quick comparison of two trace files
dftracer_comparator --baseline run_v1.pfw.gz --variant run_v2.pfw.gz
# Compare directories with 1-second buckets
dftracer_comparator --baseline ./traces_v1 --variant ./traces_v2 -t 1000
# JSON output for programmatic consumption
dftracer_comparator --baseline run_v1.pfw.gz --variant run_v2.pfw.gz --format json
# Filter to specific operations
dftracer_comparator --baseline a.pfw.gz --variant b.pfw.gz \
--query 'cat == "POSIX" AND name == "write"'
# Hierarchical comparison via JSON config
dftracer_comparator --config compare.json
Output columns:
Baseline / Variant - Metric values for each side
Delta - Absolute difference (variant - baseline)
Pct - Percentage change
Sig - Cohen’s d significance: NEGLIGIBLE, SMALL, MEDIUM, LARGE
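As a rough guide to the Sig column, the sketch below computes Cohen's d with a pooled standard deviation and maps it to the four labels using Cohen's conventional cutoffs of 0.2, 0.5, and 0.8. The exact thresholds and variance estimator used by dftracer_comparator are not stated here, so treat both as assumptions; the function names and sample values are hypothetical.

```python
import math
import statistics


def cohens_d(baseline, variant) -> float:
    """Effect size of variant relative to baseline (pooled standard deviation)."""
    n1, n2 = len(baseline), len(variant)
    if n1 < 2 or n2 < 2:
        raise ValueError("each side needs at least two samples")
    v1 = statistics.variance(baseline)
    v2 = statistics.variance(variant)
    pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    diff = statistics.fmean(variant) - statistics.fmean(baseline)
    if pooled == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / pooled


def classify(d: float) -> str:
    """Map |d| to a label using Cohen's conventional cutoffs (an assumption)."""
    magnitude = abs(d)
    if magnitude < 0.2:
        return "NEGLIGIBLE"
    if magnitude < 0.5:
        return "SMALL"
    if magnitude < 0.8:
        return "MEDIUM"
    return "LARGE"


if __name__ == "__main__":
    # Hypothetical per-bucket write durations for two runs.
    baseline = [10, 11, 9, 10, 10]
    variant = [15, 16, 14, 15, 15]
    d = cohens_d(baseline, variant)
    print(f"d = {d:.2f} -> {classify(d)}")
```

Unlike the Pct column, this measure scales the difference by the spread of the data, so a large percentage change on a very noisy metric can still be labeled NEGLIGIBLE.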
JSON config format:
{
  "baseline": "./traces_v1",
  "variant": "./traces_v2",
  "defaults": {
    "time_interval_ms": 5000,
    "threshold_pct": 1.0,
    "percentiles": [0.5, 0.95, 0.99]
  },
  "nodes": [
    {
      "name": "POSIX I/O",
      "query": "cat == \"POSIX\"",
      "children": [
        {"name": "reads", "query": "name == \"read\""},
        {"name": "writes", "query": "name == \"write\""}
      ]
    }
  ]
}
dftracer_aggregator_mpi
Description: MPI driver for the distributed-SST aggregator. Each rank
produces per-rank aggregation SSTs; rank 0 bulk-ingests and the ranks jointly
write the final gzip JSON output. Requires the build to be configured with
DFTRACER_UTILS_ENABLE_MPI=ON.
The pipeline is structured as a five-task DAG executed inside the standard
Pipeline runtime:
scan -> phase_a -> phase_b -> phase_c -> merge
scan - Cooperative gzip-member pre-scan, Allgatherv of the member map, and deterministic Longest-Processing-Time (LPT) assignment of work units to ranks.
phase_a - Each rank runs the distributed-SST indexer + aggregation visitor on its slice and writes SSTs (and tracker.bin) to its rank staging directory. SSTs are optionally moved to a shared-FS staging root for the coordinator.
phase_b - Rank 0 Gatherv of artifact lists and a single IndexDatabase::bulk_ingest + tracker merge.
phase_c - Each rank writes a shard-prefixed Perfetto gzip JSON slice using PerfettoTraceWriterUtility.
merge - Parallel pwrite on Lustre-striped output or serial concatenation otherwise.
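The Longest-Processing-Time assignment used in the scan task is a classic greedy heuristic: sort work units by decreasing cost and always give the next unit to the least-loaded rank. The Python sketch below shows the idea. It assumes each work unit has a known cost (for example, the compressed size of a gzip member) and breaks ties by name so that every rank would compute the same plan; the function name lpt_assign and the unit names are hypothetical, and the real cost model is not described here.

```python
import heapq


def lpt_assign(work_units: dict[str, int], num_ranks: int) -> list[list[str]]:
    """Assign work units to ranks with the greedy LPT heuristic.

    work_units maps a unit name to its estimated cost.
    Returns one list of unit names per rank.
    """
    if num_ranks <= 0:
        raise ValueError("num_ranks must be positive")
    # Min-heap of (current_load, rank); ties resolve to the lower rank.
    heap = [(0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)
    assignment: list[list[str]] = [[] for _ in range(num_ranks)]
    # Largest cost first; sort by name on ties to keep the plan deterministic.
    ordered = sorted(work_units.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, cost in ordered:
        load, rank = heapq.heappop(heap)
        assignment[rank].append(name)
        heapq.heappush(heap, (load + cost, rank))
    return assignment


if __name__ == "__main__":
    # Hypothetical gzip members with costs in MB.
    units = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
    for rank, names in enumerate(lpt_assign(units, 2)):
        print(rank, names, sum(units[n] for n in names))
```

Because the plan depends only on the shared member map and a fixed ordering, every rank can derive its own slice locally without a further round of communication.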
Usage:
mpirun -n <N> dftracer_aggregator_mpi [OPTIONS]
Options:
-d, --directory <path> - Input directory containing .pfw or .pfw.gz files (default: .)
-o, --output <path> - Output gzip JSON path. .gz is appended if missing (default: aggregated_output.json.gz)
-t, --time-interval <ms> - Time interval in milliseconds for bucketing (default: 5000)
--staging-dir <path> - Per-rank SST staging root. Defaults to <index_dir>/_staging; each rank writes to <staging_dir>/rank_<R>.
--shared-staging <path> - Shared-FS staging root. When set and different from --staging-dir, each rank moves its SSTs and tracker.bin from the (node-local) staging dir to <shared-staging>/rank_<R> before the coordinator ingest. Required for multi-node runs where --staging-dir points at node-local NVMe.
--keep-staging - Keep per-rank SST staging dirs after a successful ingest
This binary also accepts the Shared CLI Flags (Pipeline and
Indexing schemas). Per-rank --executor-threads / --io-threads are
automatically scaled down by the detected processes-per-node count so
co-located ranks do not oversubscribe cores.
Example:
# 16 ranks on one node, node-local staging
mpirun -n 16 dftracer_aggregator_mpi -d ./traces -o agg.json.gz
# Multi-node run with shared staging on Lustre
mpirun -n 64 dftracer_aggregator_mpi -d /lustre/traces \
--staging-dir /local/nvme/_staging \
--shared-staging /lustre/scratch/_staging \
-o /lustre/out/agg.json.gz
dftracer_call_tree_mpi
Description: MPI driver for parallel call-tree construction. Each rank
owns a slice of PIDs, emits a Chrome Tracing JSON shard, and rank 0 merges
the shards. Wraps the MPICallTreeBuilder engine
(discover_pids -> build -> hierarchy -> write -> merge coro phases).
Requires DFTRACER_UTILS_ENABLE_MPI=ON.
Usage:
mpirun -n <N> dftracer_call_tree_mpi [OPTIONS] <input>
Options:
input - Input directory containing trace files [required]
-o, --output <path> - Output JSON path (default: call_tree.pfw)
--staging-dir <path> - Shared-FS staging root for per-rank shards (default: <output>.shards/)
--gzip - gzip the merged output (.gz appended if needed)
-v, --verbose - Verbose progress logging
--keep-staging - Keep per-rank shard files after merge
This binary also accepts the Shared CLI Flags (Pipeline); per-rank thread counts are scaled down by the detected processes-per-node count.
Example:
# 32 ranks across nodes; gzip merged output
mpirun -n 32 dftracer_call_tree_mpi ./traces -o call_tree.pfw --gzip