.. dftracer utilities documentation master file

Welcome to the dftracer utilities documentation!
================================================

**dftracer utilities** is a collection of utilities for `DFTracer `_,
providing tools for reading, indexing, and processing trace files. The library
offers both C++ APIs and Python bindings, so it can be integrated from either language.


Features
--------

- **High-performance trace file reading**: Efficient reading of compressed trace files such as ``.pfw.gz``
- **Arrow data interchange**: Columnar Arrow output via nanoarrow for zero-copy access from pyarrow, polars, and DuckDB
- **Utility bindings**: Python bindings for statistics, views, aggregation, bloom queries, and reorganization
- **Indexing capabilities**: Fast indexing and searching of trace data with bloom filters
- **Pipeline processing**: Parallel data processing with tasks, coroutines, and channels
- **Arrow IPC file output**: Results written as Arrow IPC files that pyarrow, polars, and DuckDB can consume
- **Task graphs**: DAG-based workflow builder with fan-out, fan-in, map, and reduce patterns
- **Python bindings**: A Python interface to the trace reader and the utilities
- **Cross-platform**: Works on Linux, macOS, and other Unix-like systems

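
The bloom-filter indexing listed above rests on a general idea: a compact bit array that answers either "definitely absent" or "possibly present" for a key. The pure-Python sketch below illustrates only that idea. The class ``TinyBloom``, its parameters, and its hashing scheme (double hashing over SHA-256) are hypothetical and are not dftracer's index format or API.

.. code-block:: python

   import hashlib


   class TinyBloom:
       """Hypothetical minimal Bloom filter: k bit positions per key over m bits."""

       def __init__(self, m_bits=1024, k=3):
           self.m = m_bits
           self.k = k
           self.bits = bytearray((m_bits + 7) // 8)  # m bits packed into bytes

       def _positions(self, key):
           # Double hashing: derive k positions from two 64-bit slices of SHA-256.
           digest = hashlib.sha256(key.encode("utf-8")).digest()
           h1 = int.from_bytes(digest[:8], "little")
           h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
           return [(h1 + i * h2) % self.m for i in range(self.k)]

       def add(self, key):
           for p in self._positions(key):
               self.bits[p // 8] |= 1 << (p % 8)

       def might_contain(self, key):
           # False: the key was definitely never added. True: possibly added.
           return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

A ``False`` from ``might_contain`` means the key was never added, so the data it guards can be skipped; a ``True`` may be a false positive and still needs confirmation against the data itself.
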

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   installation
   quickstart
   pipeline
   cli
   server
   utilities
   api/index
   cpp_api/index
   developers

.. toctree::
   :maxdepth: 1
   :caption: Links:

   DFTracer Documentation
   DFTracer GitHub


Getting Started
---------------

To get started with dftracer utilities, check out the :doc:`installation` guide
and then follow the :doc:`quickstart` tutorial.

Installation
~~~~~~~~~~~~

.. code-block:: bash

   pip install dftracer-utils

For more detailed installation instructions, see :doc:`installation`.

Quick Example
~~~~~~~~~~~~~

.. code-block:: python

   from dftracer.utils import TraceReader
   from dftracer.utils.utilities import AggregatorUtility

   # Read a trace file (an index sidecar is detected automatically)
   reader = TraceReader("path/to/trace.pfw.gz")

   # Read all lines as JSON
   for obj in reader.iter_lines_json():
       print(obj["name"], obj["dur"])

   # Read as Arrow for columnar access
   table = reader.read_arrow()
   df = table.to_pandas()  # requires pyarrow and pandas

   # Aggregate traces in a directory
   agg = AggregatorUtility()
   agg_table = agg.process("./traces", time_interval_ms=1000.0)

   # Include extra per-event numeric fields as Arrow columns
   metrics_table = agg.process("./traces", custom_metric_fields=["bytes"])

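
To give a feel for what bucketing events by ``time_interval_ms`` means, the standard-library sketch below sums event durations per fixed time window. It is not how ``AggregatorUtility`` works internally. It assumes one JSON event per line with numeric ``ts`` and ``dur`` fields in microseconds, and the function ``bucket_durations`` is hypothetical.

.. code-block:: python

   import json
   from collections import defaultdict


   def bucket_durations(lines, time_interval_ms=1000.0):
       """Hypothetical helper: sum event ``dur`` values per fixed time window.

       Assumes one JSON event object per line with numeric ``ts`` and ``dur``
       fields in microseconds; array brackets and blank lines are skipped.
       """
       window_us = time_interval_ms * 1000.0  # milliseconds -> microseconds
       totals = defaultdict(int)
       for raw in lines:
           line = raw.strip().rstrip(",")  # tolerate trailing commas between events
           if not line.startswith("{"):
               continue  # skip "[", "]" and blank lines
           event = json.loads(line)
           if "ts" not in event or "dur" not in event:
               continue
           window = int(event["ts"] // window_us)  # index of the time bucket
           totals[window] += event["dur"]
       return dict(sorted(totals.items()))


   if __name__ == "__main__":
       sample = [
           "[",
           '{"name": "read", "ts": 100, "dur": 50},',
           '{"name": "write", "ts": 999000, "dur": 20},',
           '{"name": "read", "ts": 1500000, "dur": 7}',
       ]
       print(bucket_durations(sample))  # {0: 70, 1: 7}

To run it on a compressed trace, pass the lines of ``gzip.open("path/to/trace.pfw.gz", "rt", encoding="utf-8")`` as ``lines``.
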

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`