Call Tree Utility

The call tree utility builds hierarchical call trees from DFTracer trace files. It analyzes trace files to reconstruct calling relationships between functions, creating a tree structure that represents the execution flow.

#include <dftracer/utils/call_tree/call_tree.h>

Overview

The Call Tree utility can:

Parse plain-text or gzipped DFTracer trace files (.pfw, .pfw.gz) and extract function call information
Build hierarchical call trees that capture parent-child relationships between function calls
Distribute processing across MPI ranks to handle large-scale trace datasets
Serialize call trees to binary (.calltree), Arrow IPC (.arrow), and text formats, with Chrome Tracing JSON available through the dftracer_call_tree CLI
Report statistics on call patterns and execution timings

graph LR
    Input["Trace Files<br/>(.pfw, .pfw.gz)"] --> Parse["Parse Events"]
    Parse --> Build["Build Call Tree"]
    Build --> Tree["CallTree"]
    Tree --> Stats["Statistics<br/>(CallTreeStats)"]
    Tree --> DFS["Depth-First<br/>Traversal"]
    Tree --> Serial["Serialize"]
    Serial --> Bin["Binary (.calltree)"]
    Serial --> Arrow["Arrow IPC (.arrow)"]
    Serial --> JSON["JSON (Chrome Tracing)<br/>CLI only"]
    Serial --> Txt["Text"]

Types

// Node information in the call tree
struct CallTreeNodeInfo {
    std::uint64_t id;
    std::string name;
    std::string category;
    std::uint64_t start_time_us;
    std::uint64_t duration_us;
    int level;
    std::uint64_t parent_id;
    size_t num_children;
    std::vector<std::uint64_t> children_ids;
    std::unordered_map<std::string, std::string> args;
};

// Aggregate statistics
struct CallTreeStats {
    size_t total_nodes;
    size_t num_levels;
    size_t num_leaf_nodes;
    size_t num_processes;
    int max_depth;
    std::vector<double> avg_time_per_level_us;
    std::vector<size_t> nodes_per_level;
};
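To make the relationship between the two types concrete, here is a minimal sketch that derives CallTreeStats-style aggregates from a flat list of nodes. It uses local mirror structs (`NodeInfo`, `Stats`) instead of the library headers, and it assumes that root nodes have `level == 0`, that a leaf is a node with `num_children == 0`, that `num_levels` is `max_depth + 1`, and that processes are counted from the `"pid"` entry in `args`. The library's own statistics may define these differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Local mirror of the CallTreeNodeInfo fields used here (not the library type).
struct NodeInfo {
    std::uint64_t id;
    int level;  // assumption: roots are level 0
    std::uint64_t duration_us;
    std::size_t num_children;
    std::unordered_map<std::string, std::string> args;
};

// Local mirror of CallTreeStats.
struct Stats {
    std::size_t total_nodes = 0;
    std::size_t num_levels = 0;
    std::size_t num_leaf_nodes = 0;
    std::size_t num_processes = 0;
    int max_depth = -1;
    std::vector<double> avg_time_per_level_us;
    std::vector<std::size_t> nodes_per_level;
};

Stats compute_stats(const std::vector<NodeInfo>& nodes) {
    Stats s;
    s.total_nodes = nodes.size();
    std::vector<double> total_time_per_level;
    std::set<std::string> pids;
    for (const auto& n : nodes) {
        assert(n.level >= 0);
        auto lvl = static_cast<std::size_t>(n.level);
        if (lvl >= s.nodes_per_level.size()) {  // grow per-level arrays on demand
            s.nodes_per_level.resize(lvl + 1, 0);
            total_time_per_level.resize(lvl + 1, 0.0);
        }
        ++s.nodes_per_level[lvl];
        total_time_per_level[lvl] += static_cast<double>(n.duration_us);
        if (n.num_children == 0) ++s.num_leaf_nodes;
        s.max_depth = std::max(s.max_depth, n.level);
        if (auto it = n.args.find("pid"); it != n.args.end()) pids.insert(it->second);
    }
    s.num_levels = s.nodes_per_level.size();
    s.num_processes = pids.size();
    s.avg_time_per_level_us.assign(s.num_levels, 0.0);
    for (std::size_t i = 0; i < s.num_levels; ++i) {
        if (s.nodes_per_level[i] > 0)  // mean duration of nodes at this level
            s.avg_time_per_level_us[i] =
                total_time_per_level[i] / static_cast<double>(s.nodes_per_level[i]);
    }
    return s;
}
```

For two roots of 100 us and 50 us with two children of 40 us and 20 us, this yields averages of 75 us at level 0 and 30 us at level 1.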

CallTree

Build and inspect a call tree:

#include <cstdio>  // printf, used in the examples below
#include <dftracer/utils/call_tree/call_tree.h>

using namespace dftracer::utils::call_tree;

CallTree tree;

// Load trace files from a directory
tree.load_from_directory("/path/to/traces", "*.pfw.gz");

// Generate the call tree structure
tree.generate();

// Print statistics
tree.print_statistics();

// Print tree in depth-first order (limit to 3 levels)
tree.print_depth_first(3);

Traverse nodes programmatically:

// Get all nodes in depth-first order
auto nodes = tree.get_nodes_depth_first();

for (const auto& node : nodes) {
    printf("[level=%d] %s (%s) duration=%.3fms\n",
           node.level,
           node.name.c_str(),
           node.category.c_str(),
           static_cast<double>(node.duration_us) / 1000.0);
}
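A depth-first listing of a tree is typically a pre-order walk: each node appears before its children. The sketch below reproduces that idea on a standalone map of nodes keyed by id, using only the `children_ids` and `level` fields of CallTreeNodeInfo (mirrored in a local `Node` struct). It is an illustration of pre-order traversal with an explicit stack, not the library's implementation; `render_depth_first` is a hypothetical helper, and it assumes children should be visited in the order listed in `children_ids` and that `max_level` is inclusive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Local stand-in for the CallTreeNodeInfo fields used in this sketch.
struct Node {
    std::uint64_t id;
    std::string name;
    int level;
    std::vector<std::uint64_t> children_ids;
};

// Hypothetical helper: pre-order walk from root_id, indenting two spaces per level.
// Children are pushed in reverse so they pop off the stack in listed order.
std::vector<std::string> render_depth_first(
        const std::unordered_map<std::uint64_t, Node>& nodes,
        std::uint64_t root_id, int max_level) {
    std::vector<std::string> lines;
    std::vector<std::uint64_t> stack{root_id};
    while (!stack.empty()) {
        std::uint64_t id = stack.back();
        stack.pop_back();
        auto it = nodes.find(id);
        if (it == nodes.end()) continue;     // skip dangling child ids
        const Node& n = it->second;
        if (n.level > max_level) continue;   // prune below the depth limit
        lines.push_back(std::string(2 * static_cast<std::size_t>(n.level), ' ') + n.name);
        for (auto c = n.children_ids.rbegin(); c != n.children_ids.rend(); ++c)
            stack.push_back(*c);
    }
    return lines;
}
```

With `main` calling `read` (which calls `pread`) and then `write`, the output order is `main`, `  read`, `    pread`, `  write`. An explicit stack avoids recursion depth limits on very deep call chains.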

Query by process and thread:

// Get all process IDs in the tree
auto pids = tree.get_process_ids();

for (auto pid : pids) {
    // Get thread IDs for this process
    auto tids = tree.get_thread_ids(pid);

    for (auto tid : tids) {
        // Get root nodes for this process/thread
        auto roots = tree.get_root_nodes(pid, tid);
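        // Cast so the format string is correct whatever integer type the IDs use
        printf("PID %llu, TID %llu: %zu root nodes\n",
               static_cast<unsigned long long>(pid),
               static_cast<unsigned long long>(tid), roots.size());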
    }
}
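If you only have a flat list of nodes, the same per-process, per-thread grouping of root nodes can be rebuilt by hand. This minimal sketch assumes that `pid` and `tid` are present as string entries in each node's `args` (the args map is documented to carry them), and it treats a node as a root when its `parent_id` matches no node in the list, so it does not depend on any particular sentinel parent value. `FlatNode` and `roots_by_pid_tid` are hypothetical names, not library APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical flat view of a node: its id, its parent's id, and its string args.
struct FlatNode {
    std::uint64_t id;
    std::uint64_t parent_id;
    std::unordered_map<std::string, std::string> args;  // expected to hold "pid" and "tid"
};

using PidTid = std::pair<std::string, std::string>;

// Group root node ids by (pid, tid). A root is any node whose parent_id
// is not the id of another node in `nodes`.
std::map<PidTid, std::vector<std::uint64_t>> roots_by_pid_tid(
        const std::vector<FlatNode>& nodes) {
    std::unordered_set<std::uint64_t> ids;
    for (const auto& n : nodes) ids.insert(n.id);

    std::map<PidTid, std::vector<std::uint64_t>> groups;
    for (const auto& n : nodes) {
        if (ids.count(n.parent_id) != 0) continue;  // parent is in the set: not a root
        auto pid = n.args.find("pid");
        auto tid = n.args.find("tid");
        if (pid == n.args.end() || tid == n.args.end()) continue;  // cannot attribute
        groups[{pid->second, tid->second}].push_back(n.id);
    }
    return groups;
}
```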

Look up a specific node:

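auto node = tree.get_node_by_id(42);
// node.id is std::uint64_t; cast so %llu is correct on every platform
printf("Node %llu: %s, %zu children\n",
       static_cast<unsigned long long>(node.id), node.name.c_str(),
       node.num_children);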

// Access node arguments (pid, tid, fhash, etc.)
for (const auto& [key, value] : node.args) {
    printf("  %s = %s\n", key.c_str(), value.c_str());
}
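Because `args` in CallTreeNodeInfo is a string-to-string map, numeric values such as pid or tid arrive as text. A small helper like the hypothetical `arg_as_u64` below (not part of the library) converts them with `std::from_chars` and reports missing or malformed entries through `std::optional` instead of throwing. It only accepts complete base-10 unsigned values, so hex-encoded fields would need a different base.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>
#include <unordered_map>

// Hypothetical helper: parse args[key] as an unsigned 64-bit integer.
// Returns std::nullopt if the key is absent or the value is not a
// complete base-10 unsigned number.
std::optional<std::uint64_t> arg_as_u64(
        const std::unordered_map<std::string, std::string>& args,
        const std::string& key) {
    auto it = args.find(key);
    if (it == args.end()) return std::nullopt;
    const std::string& text = it->second;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    std::uint64_t value = 0;
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;  // error or trailing junk
    return value;
}
```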

Serialization

Serialization is handled by the coroutine-based free functions save_binary and save_arrow in dftracer/utils/call_tree/mpi/serializable.h, with matching loaders load_binary and load_arrow. The legacy CallTree::save_to_file, save_to_json, and load_from_file methods have been removed; instead, CallTree::internal_tree() exposes the underlying internal::CallTree that these coroutines consume.

Save to binary format:

The custom binary format stores names, categories, argument keys, and string argument values in one shared string dictionary, and it preserves typed arguments (int, uint, double, bool) rather than flattening them to strings.

#include <dftracer/utils/call_tree/call_tree.h>
#include <dftracer/utils/call_tree/mpi/serializable.h>
#include <dftracer/utils/core/pipeline/pipeline.h>
#include <dftracer/utils/core/tasks/task.h>

using namespace dftracer::utils;
using namespace dftracer::utils::call_tree;

CallTree tree;
tree.load_from_directory("/path/to/traces", "*.pfw.gz");
tree.generate();

auto task = make_task(
    [&tree](CoroScope& scope) -> coro::CoroTask<void> {
        co_await save_binary(&scope, tree.internal_tree(),
                             "output.calltree");
        co_return;
    },
    "save_binary");

Pipeline pipeline(PipelineConfig().with_name("calltree-save"));
pipeline.set_source({task});
pipeline.execute();
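The shared string dictionary used by the binary format is a form of string interning: each distinct string is stored once, and fields refer to it by a small integer id. The sketch below shows that technique in isolation with a hypothetical `StringDictionary` class; it does not reproduce the actual `.calltree` layout, id width, or encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical interning table: one copy of each distinct string,
// addressed by a dense 32-bit id assigned in insertion order.
class StringDictionary {
public:
    // Return the id for `s`, adding it to the table on first use.
    std::uint32_t intern(const std::string& s) {
        auto it = index_.find(s);
        if (it != index_.end()) return it->second;
        auto id = static_cast<std::uint32_t>(strings_.size());
        strings_.push_back(s);
        index_.emplace(s, id);
        return id;
    }

    // Map an id back to its string.
    const std::string& lookup(std::uint32_t id) const {
        assert(id < strings_.size());
        return strings_[id];
    }

    std::size_t size() const { return strings_.size(); }

private:
    std::vector<std::string> strings_;                      // id -> string
    std::unordered_map<std::string, std::uint32_t> index_;  // string -> id
};
```

When millions of events share a handful of names such as `read` or `POSIX`, each node only needs to store the ids, which is what keeps this kind of encoding compact.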

Save to Arrow IPC (.arrow):

This writes a columnar Arrow IPC file with buffer-level zstd compression and dictionary-encoded name and category columns, which pyarrow, polars, nanoarrow, and DuckDB can read directly. It requires a build configured with DFTRACER_UTILS_ENABLE_ARROW_IPC=ON.

auto task = make_task(
    [&tree](CoroScope& scope) -> coro::CoroTask<void> {
        co_await save_arrow(&scope, tree.internal_tree(),
                            "output.arrow");
        co_return;
    },
    "save_arrow");

Load a previously saved tree:

Both loaders are coroutines that return a newly built internal::CallTree through a pointer-like handle:

auto task = make_task([](CoroScope& scope) -> coro::CoroTask<void> {
    auto loaded = co_await load_binary(&scope, "output.calltree");
    // or: auto loaded = co_await load_arrow(&scope, "output.arrow");
    printf("Loaded tree: %zu nodes\n", loaded->num_nodes());
    co_return;
}, "load");

Save to a text file (still available on the high-level CallTree API):

tree.print_depth_first_to_file("output.txt", 5);  // Max depth 5

Output Formats¶

Binary Format (`.calltree`)

A compact custom format with a global string dictionary and typed args, best suited for round-tripping trees between dftracer-utils runs (for example, between a coordinator and downstream MPI ranks). The current format version is CALLTREE_BINARY_VERSION = 2.

Arrow IPC (`.arrow`)

Columnar Arrow IPC file with zstd buffer compression and dictionary-encoded name / category columns. Best for analysis pipelines that already speak Arrow (pyarrow, polars, DuckDB, nanoarrow).

JSON Format (Chrome Tracing)

The dftracer_call_tree CLI emits Chrome Tracing JSON (gzip-compressed when run with --gzip) that can be opened in chrome://tracing or the Perfetto UI. Programmatic JSON export is no longer exposed on the CallTree C++ API.
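If you need Chrome Tracing output from C++ without going through the CLI, one option is to write Trace Event Format "complete" events (`"ph": "X"`, with `ts` and `dur` in microseconds) yourself from the nodes returned by get_nodes_depth_first(). The sketch below uses a local `TraceNode` struct mirroring a few CallTreeNodeInfo fields and takes pid and tid as numeric fields, whereas CallTreeNodeInfo keeps them as strings in `args`. `to_chrome_trace` and `json_escape` are hypothetical helpers; the escaping handles only quotes, backslashes, and control characters, and the output is not guaranteed to match what dftracer_call_tree emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Local mirror of the fields needed for one trace event.
struct TraceNode {
    std::string name;
    std::string category;
    std::uint64_t start_time_us;
    std::uint64_t duration_us;
    std::uint64_t pid;
    std::uint64_t tid;
};

// Escape a string for use inside a JSON string literal.
std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {  // other control characters as \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Emit one complete ("X") event per node inside a {"traceEvents": [...]} object.
std::string to_chrome_trace(const std::vector<TraceNode>& nodes) {
    std::string json = "{\"traceEvents\":[";
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        const auto& n = nodes[i];
        if (i > 0) json += ",";
        json += "{\"name\":\"" + json_escape(n.name) + "\",\"cat\":\"" +
                json_escape(n.category) + "\",\"ph\":\"X\",\"ts\":" +
                std::to_string(n.start_time_us) + ",\"dur\":" +
                std::to_string(n.duration_us) + ",\"pid\":" +
                std::to_string(n.pid) + ",\"tid\":" + std::to_string(n.tid) + "}";
    }
    json += "]}";
    return json;
}
```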

Text Format

Human-readable text with indentation showing hierarchical structure, function names, categories, and timing at each level.

Performance Considerations

Index Files: The utility creates index files for compressed traces to enable efficient random access. These are cached and reused across runs.
Checkpoint Size: Larger checkpoint sizes reduce index file size but may increase memory usage during processing. Default is 32 MB.
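Thread Count: Each MPI rank can use multiple threads for parallel processing within the rank. To avoid oversubscribing a node, keep the number of ranks per node multiplied by the threads per rank at or below the number of available cores.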
PID Distribution: PIDs are distributed across MPI ranks. More ranks enable processing more PIDs in parallel, but increase communication overhead during the gather phase.
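This page does not specify how PIDs are assigned to ranks. As an illustration only, the sketch below assumes a round-robin assignment over the sorted PID list, so each rank receives at most one more PID than any other, and pairs it with a simple thread budget that divides a node's cores evenly among the ranks placed on it. Both `assign_pids_round_robin` and `threads_per_rank` are hypothetical helpers, not dftracer-utils APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: deal sorted PIDs to ranks like cards (rank = index % num_ranks).
std::vector<std::vector<std::uint64_t>> assign_pids_round_robin(
        std::vector<std::uint64_t> pids, int num_ranks) {
    assert(num_ranks > 0);
    std::sort(pids.begin(), pids.end());
    const auto ranks = static_cast<std::size_t>(num_ranks);
    std::vector<std::vector<std::uint64_t>> per_rank(ranks);
    for (std::size_t i = 0; i < pids.size(); ++i)
        per_rank[i % ranks].push_back(pids[i]);
    return per_rank;
}

// Hypothetical: split a node's cores evenly among the ranks on that node,
// never returning fewer than one thread.
unsigned threads_per_rank(unsigned cores_per_node, unsigned ranks_per_node) {
    assert(ranks_per_node > 0);
    return std::max(1u, cores_per_node / ranks_per_node);
}
```

Round-robin balances PID counts but ignores how many events each PID produced; if per-PID trace sizes are very uneven, weighting the assignment by file size may balance the work better.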