Welcome to dftracer utilities documentation!
dftracer utilities is a collection of utilities for DFTracer, providing powerful tools for trace file reading, indexing, and processing. The library includes both C++ APIs and Python bindings for flexible integration.
Features
High-performance trace file reading: Efficient reading of compressed trace files
Arrow data interchange: Columnar Arrow output via nanoarrow for zero-copy access from pyarrow, polars, and DuckDB
Utility bindings: Python bindings for statistics, views, aggregation, bloom queries, and reorganization
Indexing capabilities: Fast indexing and searching of trace data with bloom filters
Pipeline processing: Parallel data processing with tasks, coroutines, and channels
Arrow IPC file output: Write results as Arrow IPC files for pyarrow, polars, and DuckDB
Task graphs: DAG-based workflow builder with fan-out, fan-in, map, reduce patterns
Python bindings: Easy-to-use Python interface
Cross-platform: Works on Linux, macOS, and other Unix-like systems
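To make the bloom-filter indexing feature above more concrete, here is a minimal, generic sketch of how a Bloom filter answers "might this value be present?" queries. It is not the dftracer utilities implementation: the `BloomFilter` class, its parameters, and the hashing scheme are hypothetical and chosen only for illustration, using the Python standard library.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter (hypothetical; not the library's own class)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing the item with a seed byte.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bloom = BloomFilter()
for name in ("read", "write", "open"):
    bloom.add(name)
```

A filter like this never reports a stored value as absent, but it can occasionally report an unseen value as possibly present, which is why such a structure is typically used to skip data that cannot match rather than to confirm a match.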
Contents:
- Installation
- Quick Start Guide
- Pipeline Guide
- Overview
- Basic Task Creation
- Pipeline Configuration and Execution
- Pipeline Execution
- CoroScope and Structured Concurrency
- Coroutine Combinators
- Producer-Consumer Pattern with Channels
- Fan-Out and Fan-In Patterns
- Parallel Execution with when_all/when_any
- Lazy Sequences and Async Generators
- Multi-Level Parallelism
- Multi-Stage Channel Pipelines
- Error Handling
- Timeouts and Cancellation
- TaskGraph for DAGs
- Migrating from Old Pipeline API (TaskContext/TaskScope)
- Pipelined Replay
- Memory Budget Control for Streaming Iterators
- flush_every_files for Batched Index Writes
- API Reference
- Command-Line Tools
- Shared CLI Flags
- dftracer_reader
- dftracer_info
- dftracer_merge
- dftracer_split
- dftracer_event_count
- dftracer_pgzip
- dftracer_server
- dftracer_stats
- dftracer_view
- dftracer_index
- dftracer_aggregator
- dftracer_gen_dlio_config
- dftracer_organize
- dftracer_reconstruct
- dftracer_replay
- dftracer_tar
- dftracer_gen_fake_trace
- dftracer_call_tree
- dftracer_comparator
- dftracer_aggregator_mpi
- dftracer_call_tree_mpi
- HTTP Server
- Utilities
- Python API Reference
- C++ API Reference
- Developer’s Guide
Getting Started
To get started with dftracer utilities, check out the Installation guide and then follow the Quick Start Guide tutorial.
Installation
pip install dftracer-utils
For more detailed installation instructions, see Installation.
Quick Example
from dftracer.utils import TraceReader
# Read a trace file (auto-detects index sidecar)
reader = TraceReader("path/to/trace.pfw.gz")
# Read all lines as JSON
for obj in reader.iter_lines_json():
    print(obj["name"], obj["dur"])
# Read as Arrow for columnar access
table = reader.read_arrow()
df = table.to_pandas() # requires pyarrow
# Aggregate traces in a directory
from dftracer.utils.utilities import AggregatorUtility
agg = AggregatorUtility()
table = agg.process("./traces", time_interval_ms=1000.0)
# Include extra per-event numeric fields as Arrow columns
table = agg.process("./traces", custom_metric_fields=["bytes"])
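The JSON loop in the example above prints each event's `name` and `dur`. As a small follow-up, the sketch below shows one way such event dictionaries could be summarized in plain Python. It assumes only that each event is a dict that may carry `name` and `dur` keys, as the loop suggests. The `total_duration_by_name` helper and the sample events are hypothetical stand-ins, so the snippet runs without a trace file.

```python
from collections import defaultdict


def total_duration_by_name(events):
    """Sum the "dur" field per "name" over an iterable of event dicts."""
    totals = defaultdict(int)
    for event in events:
        # Assumption: some records may lack "name" or "dur"; skip those.
        if "name" not in event or "dur" not in event:
            continue
        totals[event["name"]] += event["dur"]
    return dict(totals)


# Made-up events standing in for the objects yielded by the JSON loop.
sample_events = [
    {"name": "read", "dur": 120},
    {"name": "write", "dur": 80},
    {"name": "read", "dur": 30},
    {"name": "metadata"},
]

totals = total_duration_by_name(sample_events)
print(totals)
```

With a real trace, the same helper could be passed the iterator from the reader in place of `sample_events`, under the assumption stated above about the event fields.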