DFTracer Service

Overview

The DFTracer service binary is built as dftracer_service (from src/dftracer/service/service.cpp) and runs as:

dftracer_service <start|stop> [log_dir]

Required environment variables:

  • DFTRACER_ENABLE=1

  • DFTRACER_LOG_FILE=<output_prefix>

Useful optional variables:

  • DFTRACER_TRACE_INTERVAL_MS=<milliseconds> (default is 1000)

  • DFTRACER_LIBUV_THREADS=<count> (default is 1)

Optional YAML keys (when using DFTRACER_CONFIGURATION):

  • tracer.libuv_threads

  • profiler.libuv_threads

The service appends hostname information to DFTRACER_LOG_FILE and writes one PID file per service process at <log_dir>/dftracer_server.pid.
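As a rough sketch of how the PID file can be used, the helper below reports whether the process recorded in <log_dir>/dftracer_server.pid is still alive. It assumes the file holds a single numeric PID on its first line, which this document does not confirm; service_is_running is a hypothetical helper name and not part of DFTracer.

```shell
# Hypothetical helper: succeeds if the PID recorded in
# <log_dir>/dftracer_server.pid refers to a process we can signal.
# Assumption: the PID file contains one numeric PID on its first line.
service_is_running() {
  pid_file="$1/dftracer_server.pid"
  [ -r "$pid_file" ] || return 1
  pid=$(head -n 1 "$pid_file" | tr -d '[:space:]')
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
  esac
  # Signal 0 delivers nothing; it only checks that the process exists
  # and that the current user is allowed to signal it.
  kill -0 "$pid" 2>/dev/null
}
```

For example, `service_is_running /tmp/dftracer_service && echo running` prints "running" only when the recorded process is alive.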

Single-node quick start

export DFTRACER_ENABLE=1
export DFTRACER_LOG_FILE=/path/to/output/dftracer-service
export DFTRACER_TRACE_INTERVAL_MS=1000
export DFTRACER_LIBUV_THREADS=1

# Start in daemon mode
dftracer_service start /tmp/dftracer_service

# Stop
dftracer_service stop /tmp/dftracer_service
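
The start/stop pair above can be wrapped so that the service is always stopped, even when the traced command fails. This is a minimal sketch: with_service is a hypothetical helper, and it assumes that `start` returns once the daemon is running, as the "daemon mode" comment above suggests.

```shell
# Hypothetical wrapper: start the service, run a command, then stop the service.
# Usage: with_service <log_dir> <command> [args...]
with_service() {
  log_dir="$1"; shift
  service_bin="${SERVICE_BIN:-dftracer_service}"
  mkdir -p "$log_dir" || return 1
  "$service_bin" start "$log_dir" || return 1
  status=0
  "$@" || status=$?                 # remember the command's exit code
  "$service_bin" stop "$log_dir"    # stop even if the command failed
  return "$status"
}
```

A call such as `with_service /tmp/dftracer_service ./my_app --input data` (where ./my_app is a placeholder) returns the exit code of the wrapped command.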

Multi-node notes

When launching on multiple nodes, use a per-node log_dir so that PID files do not collide on shared filesystems. A safe pattern is to include the hostname in the directory path.
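
The per-node naming pattern can be captured in a small function. The sketch below uses the same <base>_<hostname>_<rank> layout as the launcher examples in this document; node_log_dir is a hypothetical helper name.

```shell
# Hypothetical helper: print a log_dir that is unique per host and per rank.
# Usage: node_log_dir [base_dir] [rank]
node_log_dir() {
  base="${1:-/tmp/dftracer_service}"
  rank="${2:-0}"
  # Fall back to plain hostname where the -s flag is unsupported.
  host=$(hostname -s 2>/dev/null || hostname)
  printf '%s_%s_%s\n' "$base" "$host" "$rank"
}
```

Usage: `log_dir=$(node_log_dir /tmp/dftracer_service "$rank")`, followed by `mkdir -p "$log_dir"` before starting the service.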

If the binary is not in PATH, set SERVICE_BIN to its full path, which is either:

  • <build_dir>/bin/dftracer_service

  • <install_prefix>/bin/dftracer_service
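
One way to pick between these locations is to probe each prefix in order. This is only a sketch; find_service_bin is a hypothetical helper, and the prefixes in the usage comment are placeholders.

```shell
# Hypothetical helper: print the first executable dftracer_service found
# under the given prefixes (each is checked as <prefix>/bin/dftracer_service).
find_service_bin() {
  for prefix in "$@"; do
    candidate="$prefix/bin/dftracer_service"
    if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # nothing found under any prefix
}

# Example (placeholder paths): prefer the build tree, then the install prefix.
# export SERVICE_BIN=$(find_service_bin "$HOME/dftracer/build" /usr/local)
```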

Examples below assume:

export SERVICE_BIN=dftracer_service
export DFTRACER_ENABLE=1
export DFTRACER_LOG_FILE=/path/to/output/dftracer-service
export DFTRACER_TRACE_INTERVAL_MS=1000
export DFTRACER_LIBUV_THREADS=1

Run on multiple nodes with mpirun

Start one service process per rank (this assumes the launcher places one rank on each node):

mpirun -np 4 bash -lc '
  node_tag=${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}
  node_name=$(hostname -s)
  log_dir=/tmp/dftracer_service_${node_name}_${node_tag}
  mkdir -p "${log_dir}"
  "${SERVICE_BIN}" start "${log_dir}"
'

Stop all service processes:

mpirun -np 4 bash -lc '
  node_tag=${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}
  node_name=$(hostname -s)
  log_dir=/tmp/dftracer_service_${node_name}_${node_tag}
  "${SERVICE_BIN}" stop "${log_dir}"
'

Run on multiple nodes with flux run

Start one service process per task:

flux run -N 4 -n 4 bash -lc '
  node_tag=${FLUX_TASK_RANK:-0}
  node_name=$(hostname -s)
  log_dir=/tmp/dftracer_service_${node_name}_${node_tag}
  mkdir -p "${log_dir}"
  "${SERVICE_BIN}" start "${log_dir}"
'

Stop all service processes:

flux run -N 4 -n 4 bash -lc '
  node_tag=${FLUX_TASK_RANK:-0}
  node_name=$(hostname -s)
  log_dir=/tmp/dftracer_service_${node_name}_${node_tag}
  "${SERVICE_BIN}" stop "${log_dir}"
'

Run on multiple nodes with srun

Start one service process per task:

srun -N 4 -n 4 bash -lc '
  node_tag=${SLURM_PROCID:-0}
  node_name=${SLURMD_NODENAME:-$(hostname -s)}
  log_dir=/tmp/dftracer_service_${node_name}_${node_tag}
  mkdir -p "${log_dir}"
  "${SERVICE_BIN}" start "${log_dir}"
'

Stop all service processes:

srun -N 4 -n 4 bash -lc '
  node_tag=${SLURM_PROCID:-0}
  node_name=${SLURMD_NODENAME:-$(hostname -s)}
  log_dir=/tmp/dftracer_service_${node_name}_${node_tag}
  "${SERVICE_BIN}" stop "${log_dir}"
'

Troubleshooting

  • If startup fails, verify that DFTRACER_LOG_FILE is set and that the directory it points into is writable.

  • Check <log_dir>/dftracer_server.out and <log_dir>/dftracer_server.err on each node.

  • If stop reports no running server, verify that you are passing the same log_dir that was used for start on that node.
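
The first check can be scripted before calling start. The sketch below is a hypothetical preflight_check helper; it assumes that DFTRACER_LOG_FILE is a file prefix whose parent directory must already exist and be writable, which this document implies but does not state.

```shell
# Hypothetical preflight check: returns non-zero and prints a reason on failure.
preflight_check() {
  if [ "${DFTRACER_ENABLE:-}" != "1" ]; then
    echo "DFTRACER_ENABLE is not set to 1" >&2
    return 1
  fi
  if [ -z "${DFTRACER_LOG_FILE:-}" ]; then
    echo "DFTRACER_LOG_FILE is not set" >&2
    return 1
  fi
  # Assumption: the value is a prefix, so its parent directory must be writable.
  out_dir=$(dirname "$DFTRACER_LOG_FILE")
  if [ ! -d "$out_dir" ] || [ ! -w "$out_dir" ]; then
    echo "output directory $out_dir is missing or not writable" >&2
    return 1
  fi
}
```

Running `preflight_check && "${SERVICE_BIN}" start "${log_dir}"` on each node skips the start when the environment is incomplete.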