Welcome to the dftracer utilities documentation!

dftracer utilities is a collection of tools for DFTracer that read, index, and process trace files. The library provides both a C++ API and Python bindings, so it can be integrated from either language.

Features

  • High-performance trace file reading: Efficient reading of compressed trace files

  • Arrow data interchange: Columnar Arrow output via nanoarrow for zero-copy access from pyarrow, polars, and DuckDB

  • Utility bindings: Python bindings for statistics, views, aggregation, bloom queries, and reorganization

  • Indexing capabilities: Fast indexing and searching of trace data with bloom filters

  • Pipeline processing: Parallel data processing with tasks, coroutines, and channels

  • Arrow IPC file output: Write results as Arrow IPC files for pyarrow, polars, and DuckDB

  • Task graphs: DAG-based workflow builder with fan-out, fan-in, map, reduce patterns

  • Python bindings: Easy-to-use Python interface

  • Cross-platform: Works on Linux, macOS, and other Unix-like systems

Getting Started

To get started with dftracer utilities, read the Installation guide and then follow the Quick Start Guide.

Installation

pip install dftracer-utils

For more detailed installation instructions, see Installation.
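
After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The sketch below relies only on the standard library; the helper name `installed_version` is hypothetical and not part of dftracer utilities, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "dftracer-utils" is the distribution name used in the pip command above.
found = installed_version("dftracer-utils")
print(found if found is not None else "dftracer-utils is not installed")
```

If this prints that the package is not installed, the interpreter running the check is probably not the one pip installed into.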

Quick Example

from dftracer.utils import TraceReader

# Read a trace file (auto-detects index sidecar)
reader = TraceReader("path/to/trace.pfw.gz")

# Read all lines as JSON
for obj in reader.iter_lines_json():
    print(obj["name"], obj["dur"])

# Read as Arrow for columnar access
table = reader.read_arrow()
df = table.to_pandas()  # requires pyarrow

# Aggregate traces in a directory
from dftracer.utils.utilities import AggregatorUtility
agg = AggregatorUtility()
table = agg.process("./traces", time_interval_ms=1000.0)

# Include extra per-event numeric fields as Arrow columns
table = agg.process("./traces", custom_metric_fields=["bytes"])
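
To make the idea of interval-based aggregation concrete, here is a minimal standard-library sketch that does not use dftracer utilities at all. It assumes a gzip-compressed trace with one JSON object per line, and it assumes each event carries a `ts` timestamp alongside the `name` and `dur` fields shown above; the `ts` field, its units, and the helper names `iter_events` and `bucket_durations` are illustrative assumptions, not the library's API or its actual aggregation method.

```python
import gzip
import json
import os
import tempfile
from collections import defaultdict


def iter_events(path):
    """Yield one dict per JSON line of a gzip-compressed trace (assumed layout)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip().rstrip(",")
            # Skip blank lines and the array brackets some trace writers emit.
            if not line or line in ("[", "]"):
                continue
            yield json.loads(line)


def bucket_durations(events, interval):
    """Sum `dur` per event name within fixed-width buckets keyed on `ts`."""
    totals = defaultdict(int)
    for ev in events:
        bucket = ev["ts"] // interval
        totals[(bucket, ev["name"])] += ev["dur"]
    return dict(totals)


# Hypothetical sample events; the `ts` field is an assumption for illustration.
sample = [
    {"name": "read", "ts": 100, "dur": 40},
    {"name": "read", "ts": 900, "dur": 10},
    {"name": "write", "ts": 1500, "dur": 25},
]

# Write the sample to a temporary gzip file so the sketch is self-contained.
tmpdir = tempfile.mkdtemp()
trace_path = os.path.join(tmpdir, "sample.pfw.gz")
with gzip.open(trace_path, "wt", encoding="utf-8") as fh:
    fh.write("[\n")
    for ev in sample:
        fh.write(json.dumps(ev) + "\n")

events = list(iter_events(trace_path))
totals = bucket_durations(events, interval=1000)
print(totals)
```

Each key in `totals` pairs a bucket index with an event name, which is roughly the shape of result an interval-based aggregation produces before it is laid out in columns.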
