Call Tree Utility
The call tree utility builds hierarchical call trees from DFTracer trace files. It analyzes trace files to reconstruct calling relationships between functions, creating a tree structure that represents the execution flow.
#include <dftracer/utils/call_tree/call_tree.h>
Overview
The Call Tree utility is designed to perform the following tasks:
Parse plain text or gzipped DFTracer trace files (.pfw, .pfw.gz) and extract function call information
Build hierarchical call trees showing parent-child relationships between function calls
Support distributed processing using MPI for handling large-scale trace datasets
Serialize call trees in multiple formats (binary, Arrow IPC), with Chrome Tracing JSON available through the dftracer_call_tree CLI
Provide statistical analysis of call patterns and execution timings
graph LR
Input["Trace Files<br/>(.pfw, .pfw.gz)"] --> Parse["Parse Events"]
Parse --> Build["Build Call Tree"]
Build --> Tree["CallTree"]
Tree --> Stats["Statistics<br/>(CallTreeStats)"]
Tree --> DFS["Depth-First<br/>Traversal"]
Tree --> Serial["Serialize"]
Serial --> Bin["Binary (.calltree)"]
Serial --> JSON["JSON (Chrome Tracing)"]
Serial --> Txt["Text"]
Types
// Node information in the call tree
struct CallTreeNodeInfo {
std::uint64_t id;
std::string name;
std::string category;
std::uint64_t start_time_us;
std::uint64_t duration_us;
int level;
std::uint64_t parent_id;
size_t num_children;
std::vector<std::uint64_t> children_ids;
std::unordered_map<std::string, std::string> args;
};
// Aggregate statistics
struct CallTreeStats {
size_t total_nodes;
size_t num_levels;
size_t num_leaf_nodes;
size_t num_processes;
int max_depth;
std::vector<double> avg_time_per_level_us;
std::vector<size_t> nodes_per_level;
};
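To make the relationship between the two structs concrete, the following is a minimal sketch of how per-level aggregates could be derived from a list of nodes. It is not the library's implementation: NodeSketch, LevelStatsSketch, and summarize_levels are hypothetical names, levels are assumed to be 0-based, and a leaf is assumed to be a node with num_children == 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in carrying only the CallTreeNodeInfo fields used here.
struct NodeSketch {
    std::string name;
    std::uint64_t duration_us;
    int level;                 // assumed 0-based (root nodes at level 0)
    std::size_t num_children;
};

// Hypothetical aggregate mirroring a subset of CallTreeStats.
struct LevelStatsSketch {
    std::size_t total_nodes = 0;
    std::size_t num_leaf_nodes = 0;
    int max_depth = -1;        // -1 means "no nodes seen"
    std::vector<std::size_t> nodes_per_level;
    std::vector<double> avg_time_per_level_us;
};

LevelStatsSketch summarize_levels(const std::vector<NodeSketch>& nodes) {
    LevelStatsSketch s;
    std::vector<double> total_us;  // running duration sum per level
    for (const auto& n : nodes) {
        if (n.level < 0) continue;  // skip malformed entries
        const auto lvl = static_cast<std::size_t>(n.level);
        if (lvl >= s.nodes_per_level.size()) {
            s.nodes_per_level.resize(lvl + 1, 0);
            total_us.resize(lvl + 1, 0.0);
        }
        ++s.total_nodes;
        ++s.nodes_per_level[lvl];
        total_us[lvl] += static_cast<double>(n.duration_us);
        if (n.num_children == 0) ++s.num_leaf_nodes;
        if (n.level > s.max_depth) s.max_depth = n.level;
    }
    // Convert per-level sums into per-level means.
    s.avg_time_per_level_us.assign(total_us.size(), 0.0);
    for (std::size_t i = 0; i < total_us.size(); ++i) {
        if (s.nodes_per_level[i] > 0) {
            s.avg_time_per_level_us[i] =
                total_us[i] / static_cast<double>(s.nodes_per_level[i]);
        }
    }
    return s;
}
```

The real CallTreeStats may define num_levels and max_depth differently (for example, counting from 1), so treat the conventions above as assumptions.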
CallTree
Build and inspect a call tree:
#include <dftracer/utils/call_tree/call_tree.h>
using namespace dftracer::utils::call_tree;
CallTree tree;
// Load trace files from a directory
tree.load_from_directory("/path/to/traces", "*.pfw.gz");
// Generate the call tree structure
tree.generate();
// Print statistics
tree.print_statistics();
// Print tree in depth-first order (limit to 3 levels)
tree.print_depth_first(3);
Traverse nodes programmatically:
// Get all nodes in depth-first order
auto nodes = tree.get_nodes_depth_first();
for (const auto& node : nodes) {
    printf("[level=%d] %s (%s) duration=%.3fms\n",
           node.level,
           node.name.c_str(),
           node.category.c_str(),
           static_cast<double>(node.duration_us) / 1000.0);
}
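A common follow-up to a full traversal is ranking nodes by duration. The sketch below shows one way to pick the slowest N entries with std::partial_sort. TimedNode and slowest_nodes are hypothetical names; in practice the input would be built from the name and duration_us fields of the traversal result.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the two CallTreeNodeInfo fields used here.
struct TimedNode {
    std::string name;
    std::uint64_t duration_us;
};

// Returns the n longest-running nodes, longest first.
// Takes the vector by value so the caller's ordering is left untouched.
std::vector<TimedNode> slowest_nodes(std::vector<TimedNode> nodes, std::size_t n) {
    n = std::min(n, nodes.size());
    const auto middle = nodes.begin() + static_cast<std::ptrdiff_t>(n);
    std::partial_sort(nodes.begin(), middle, nodes.end(),
                      [](const TimedNode& a, const TimedNode& b) {
                          return a.duration_us > b.duration_us;
                      });
    nodes.erase(middle, nodes.end());
    return nodes;
}
```

partial_sort only orders the first n elements, which keeps the cost closer to O(N log n) than a full sort when n is small relative to the tree size.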
Query by process and thread:
// Get all process IDs in the tree
auto pids = tree.get_process_ids();
for (auto pid : pids) {
    // Get thread IDs for this process
    auto tids = tree.get_thread_ids(pid);
    for (auto tid : tids) {
        // Get root nodes for this process/thread
        auto roots = tree.get_root_nodes(pid, tid);
        printf("PID %u, TID %u: %zu root nodes\n",
               pid, tid, roots.size());
    }
}
Look up a specific node:
auto node = tree.get_node_by_id(42);
// node.id is std::uint64_t; cast so the format specifier is portable
printf("Node %llu: %s, %zu children\n",
       static_cast<unsigned long long>(node.id),
       node.name.c_str(), node.num_children);
// Access node arguments (pid, tid, fhash, etc.)
for (const auto& [key, value] : node.args) {
    printf(" %s = %s\n", key.c_str(), value.c_str());
}
Serialization
Serialization has moved to the coroutine-based save_binary / save_arrow
free functions in dftracer/utils/call_tree/mpi/serializable.h. The
legacy CallTree::save_to_file / save_to_json / load_from_file
methods have been removed. Instead, CallTree::internal_tree() gives
direct access to the underlying internal::CallTree that the save/load
coroutines consume.
Save to binary format:
The custom binary format uses a shared string dictionary (name, category,
arg keys / string values share storage) and preserves typed args
(int / uint / double / bool instead of flattening to
strings).
#include <dftracer/utils/call_tree/call_tree.h>
#include <dftracer/utils/call_tree/mpi/serializable.h>
#include <dftracer/utils/core/pipeline/pipeline.h>
#include <dftracer/utils/core/tasks/task.h>
using namespace dftracer::utils;
using namespace dftracer::utils::call_tree;
CallTree tree;
tree.load_from_directory("/path/to/traces", "*.pfw.gz");
tree.generate();
auto task = make_task(
    [&tree](CoroScope& scope) -> coro::CoroTask<void> {
        co_await save_binary(&scope, tree.internal_tree(),
                             "output.calltree");
        co_return;
    },
    "save_binary");
Pipeline pipeline(PipelineConfig().with_name("calltree-save"));
pipeline.set_source({task});
pipeline.execute();
Save to Arrow IPC (.arrow):
Columnar Arrow IPC with buffer-level zstd compression and
dictionary-encoded name / category columns. Readable directly by
pyarrow, polars, nanoarrow, and DuckDB. Requires the build to
be configured with DFTRACER_UTILS_ENABLE_ARROW_IPC=ON.
auto task = make_task(
    [&tree](CoroScope& scope) -> coro::CoroTask<void> {
        co_await save_arrow(&scope, tree.internal_tree(),
                            "output.arrow");
        co_return;
    },
    "save_arrow");
Load a previously saved tree:
Both loaders are coroutines that return a fresh internal::CallTree:
auto task = make_task([](CoroScope& scope) -> coro::CoroTask<void> {
    auto loaded = co_await load_binary(&scope, "output.calltree");
    // or: auto loaded = co_await load_arrow(&scope, "output.arrow");
    printf("Loaded tree: %zu nodes\n", loaded->num_nodes());
    co_return;
}, "load");
Save to text file (still available on the high-level API):
tree.print_depth_first_to_file("output.txt", 5); // Max depth 5
Output Formats
Binary Format (.calltree)
Compact custom format with a global string dictionary and typed args; best
for round-tripping trees between dftracer-utils runs (for example, between
a coordinator and downstream MPI ranks). Backed by the
CALLTREE_BINARY_VERSION = 2 header.
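The two ideas behind the format, a shared string dictionary and typed args, can be illustrated with a small sketch. This is not the on-disk layout or the library's code: StringDictionarySketch, StringRefSketch, and TypedArgSketch are hypothetical names, and the 32-bit id width is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical string dictionary: each distinct string is stored once and
// referenced everywhere else by a small integer id.
class StringDictionarySketch {
public:
    // Returns the id for `s`, adding it to the dictionary on first use.
    std::uint32_t intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        const auto id = static_cast<std::uint32_t>(strings_.size());
        strings_.push_back(s);
        ids_.emplace(s, id);
        return id;
    }

    // Throws std::out_of_range for an unknown id.
    const std::string& lookup(std::uint32_t id) const { return strings_.at(id); }

    std::size_t size() const { return strings_.size(); }

private:
    std::vector<std::string> strings_;
    std::unordered_map<std::string, std::uint32_t> ids_;
};

// A string-valued arg is stored as a reference into the dictionary.
struct StringRefSketch {
    std::uint32_t id;
};

// Typed arg value: numbers and booleans keep their type instead of being
// flattened to strings.
using TypedArgSketch =
    std::variant<std::int64_t, std::uint64_t, double, bool, StringRefSketch>;
```

Because names, categories, and arg keys repeat heavily across trace events, interning them once tends to shrink the file considerably, and typed values avoid a parse step on load.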
Arrow IPC (.arrow)
Columnar Arrow IPC file with zstd buffer compression and
dictionary-encoded name / category columns. Best for analysis
pipelines that already speak Arrow (pyarrow, polars, DuckDB, nanoarrow).
JSON Format (Chrome Tracing)
The dftracer_call_tree CLI emits Chrome Tracing JSON (gzipped with
--gzip) suitable for chrome://tracing and
Perfetto UI. Programmatic JSON export is no
longer exposed on the CallTree C++ API.
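For readers unfamiliar with the Chrome Tracing format, the sketch below shows how a single call tree node maps onto a "complete" event (ph = "X"), where ts and dur are given in microseconds. It is illustrative only: SpanSketch, json_escape, and to_chrome_event are hypothetical names, and the exact fields emitted by dftracer_call_tree may differ.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical flat view of one call tree node.
struct SpanSketch {
    std::string name;
    std::string category;
    std::uint64_t start_time_us;
    std::uint64_t duration_us;
    std::uint32_t pid;
    std::uint32_t tid;
};

// Minimal escaping for quotes and backslashes only; a production writer
// would also need to escape control characters.
std::string json_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        if (c == '"' || c == '\\') out.push_back('\\');
        out.push_back(c);
    }
    return out;
}

// Formats one node as a Chrome Tracing complete event.
std::string to_chrome_event(const SpanSketch& s) {
    std::ostringstream os;
    os << "{\"name\":\"" << json_escape(s.name) << "\""
       << ",\"cat\":\"" << json_escape(s.category) << "\""
       << ",\"ph\":\"X\""
       << ",\"ts\":" << s.start_time_us
       << ",\"dur\":" << s.duration_us
       << ",\"pid\":" << s.pid
       << ",\"tid\":" << s.tid << "}";
    return os.str();
}
```

A trace file is then a JSON array of such objects; viewers reconstruct the nesting from overlapping ts/dur ranges within the same pid/tid.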
Text Format
Human-readable text with indentation showing hierarchical structure, function names, categories, and timing at each level.
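As a rough illustration of this kind of output, the sketch below renders a depth-first node list with two spaces of indentation per level. The exact layout written by print_depth_first_to_file is not specified here, so the line format, the LineNode struct, and the inclusive, 0-based interpretation of max_depth are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the CallTreeNodeInfo fields used here.
struct LineNode {
    std::string name;
    std::string category;
    std::uint64_t duration_us;
    int level;  // assumed 0-based
};

// Renders nodes (already in depth-first order) as indented text,
// skipping anything deeper than max_depth.
std::string render_indented(const std::vector<LineNode>& dfs_nodes, int max_depth) {
    std::ostringstream os;
    for (const auto& n : dfs_nodes) {
        if (n.level < 0 || n.level > max_depth) continue;
        os << std::string(static_cast<std::size_t>(n.level) * 2, ' ')
           << n.name << " [" << n.category << "] "
           << n.duration_us << " us\n";
    }
    return os.str();
}
```

Because the input is already in depth-first order, indentation alone is enough to convey the parent-child structure.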
Performance Considerations
- Index Files
The utility creates index files for compressed traces to enable efficient random access. These are cached and reused across runs.
- Checkpoint Size
Larger checkpoint sizes reduce index file size but may increase memory usage during processing. Default is 32 MB.
- Thread Count
Each MPI rank can use multiple threads for parallel processing within the rank. Balance thread count with available cores.
- PID Distribution
PIDs are distributed across MPI ranks. More ranks enable processing more PIDs in parallel, but increase communication overhead during the gather phase.
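To make the PID distribution trade-off concrete, here is a minimal sketch of one possible assignment scheme. The actual strategy used by the utility is not described in this section; round-robin over a sorted PID list is an assumption, and pids_for_rank is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the PIDs a given rank would process under round-robin assignment:
// rank r takes positions r, r + world_size, r + 2 * world_size, ...
// Every rank must pass the same, identically ordered PID list.
std::vector<std::uint32_t> pids_for_rank(const std::vector<std::uint32_t>& sorted_pids,
                                         int rank, int world_size) {
    std::vector<std::uint32_t> mine;
    if (world_size <= 0 || rank < 0 || rank >= world_size) return mine;
    const auto stride = static_cast<std::size_t>(world_size);
    for (std::size_t i = static_cast<std::size_t>(rank); i < sorted_pids.size(); i += stride) {
        mine.push_back(sorted_pids[i]);
    }
    return mine;
}
```

Under this scheme, launching more ranks than there are PIDs leaves the extra ranks with an empty list, so they add gather-phase communication without contributing any parsing work.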
See Also
C++ API Reference - Full C++ API documentation
Command-Line Tools - Command-line tools (dftracer_call_tree)
Replay - Replay utility with call tree integration