DLIO Config Generation¶
The dlio utilities power the dftracer_gen_dlio_config binary
(see dftracer_gen_dlio_config in the CLI reference).
They consume an already-populated AGGREGATION column family (produced by
dftracer_aggregator, see Command-Line Tools, or by the shared
aggregation_runner library function) and emit a DLIO-compatible YAML
config describing per-component timing distributions.
#include <dftracer/utils/utilities/dlio/barrier_simulator.h>
#include <dftracer/utils/utilities/dlio/optimizer.h>
#include <dftracer/utils/utilities/dlio/statistic.h>
#include <dftracer/utils/utilities/dlio/trace_loader.h>
#include <dftracer/utils/utilities/dlio/worker_queue.h>
#include <dftracer/utils/utilities/dlio/yaml_emit.h>
Pipeline overview¶
End to end, the module composes five pieces:
1. trace_loader opens an existing RocksDB database read-only (with the AGGREGATION merge operator re-attached), iterates the AGGREGATION column family, groups entries by (cat, name, pid, time_bucket), and synthesizes a flat per-rank sample stream per component (fetch.block, fetch.iter, preprocess, item). Sketches are used for inverse-CDF sampling when the aggregator was run with --compute-percentiles; otherwise the per-call mean is replicated.
2. The distribution fitter (under common/statistics/distributions.h and mixture.h) fits the lowest-BIC model from {Normal, Lognormal, Gamma, Exponential, Weibull, GMM-2, GMM-3}.
3. BarrierSimulator simulates one DLIO training run across the captured ranks/steps using the fitted distribution as the fetch.block sampler. It produces an end-to-end duration, rank variance, and a Kolmogorov-Smirnov similarity between simulated and trace fetch.block samples.
4. optimizer runs a sequential momentum loop tuning the max_bound percentile used to clamp the sampler, minimizing simulator E2E error while keeping the CDF similarity above a target.
5. yaml_emit renders the final YAML.
trace_loader¶
TraceLoaderOptions opts;
opts.max_samples_per_entry = 100; // cap per (pid, bucket) entry, 0 = unlimited
opts.seed = 0xD15710; // seed for inverse-CDF sampling
AggregatedTraces traces = load_aggregated_traces(db_path, opts);
if (!traces.any_data) { /* no DLIO events in this DB */ }
if (!traces.sketches_available) {
// The aggregator was run without --compute-percentiles. We fell back to
// mean replication; rerun with the flag for higher-fidelity output.
}
Returns an AggregatedTraces with:
- fetch_block_trace / fetch_iter_trace / getitem_trace: per-rank std::vector<std::vector<double>> of seconds, in pid-ascending then time-bucket-ascending order.
- computation_times / preprocess_times: flat sample arrays in seconds (the input to fit_all_single_distributions).
- fetch_block_stats / fetch_iter_stats / preprocess_stats / getitem_stats: Statistic objects, with merged DDSketches attached when available.
- trace_e2e_duration and per-component ComponentTimeMetrics with both accumulated_time (sum of count × mean) and union_time (true wall-clock union via sweep_union over per-entry (ts, te) boundaries).
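The two sampling modes described above (inverse-CDF sampling when sketches exist, mean replication otherwise) can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: EntrySummarySketch and synthesize_samples_sketch are not part of dftracer, and a sorted vector of evenly spaced quantile values stands in for a real DDSketch. Only the "0 = unlimited" cap convention is taken from TraceLoaderOptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical per-entry summary (not the dftracer type). `quantiles` holds
// sorted values at evenly spaced probabilities; it is empty when no sketch
// was recorded for the entry.
struct EntrySummarySketch {
    std::size_t count;
    double mean;
    std::vector<double> quantiles;
};

// Produces up to `max_samples` samples for one entry (0 means unlimited).
// With quantiles: draw u in [0, 1) and interpolate the inverse CDF.
// Without quantiles: replicate the per-call mean.
std::vector<double> synthesize_samples_sketch(const EntrySummarySketch& e,
                                              std::size_t max_samples,
                                              std::uint64_t seed) {
    assert(e.count > 0);
    const std::size_t n =
        (max_samples == 0) ? e.count : std::min(e.count, max_samples);
    if (e.quantiles.empty()) {
        return std::vector<double>(n, e.mean);  // mean replication fallback
    }
    const std::vector<double>& q = e.quantiles;
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<double> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Map the uniform draw onto the quantile grid and interpolate.
        const double pos = u(rng) * static_cast<double>(q.size() - 1);
        const std::size_t lo = static_cast<std::size_t>(pos);
        const std::size_t hi = std::min(lo + 1, q.size() - 1);
        const double frac = pos - static_cast<double>(lo);
        out.push_back(q[lo] + frac * (q[hi] - q[lo]));
    }
    return out;
}
```

Fixing the seed makes the synthesized stream reproducible, which is presumably why TraceLoaderOptions exposes one. The mean-replication branch preserves the total time of an entry but carries no information about its spread, so the fitted distributions lose fidelity in that mode.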
BarrierSimulator¶
BarrierSimulatorContext ctx = make_simulator_context(
traces, /*num_workers=*/8, /*prefetch_factor=*/2);
auto sampler = make_sampler(fitted_model); // from common/statistics
BarrierSimulator sim;
BarrierSimulationResult result = sim.simulate(
ctx, /*base_seed=*/42, sampler);
printf("e2e=%.3fs error=%.2f%% fetch_block_cdf_sim=%.4f\n",
result.e2e_duration,
result.e2e_error * 100.0,
result.fetch_block_cdf_similarity);
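To give a feel for what a barrier-synchronized simulation computes, here is a deliberately simplified model, not the BarrierSimulator implementation. It assumes that in every step each rank draws one fetch time, all ranks wait for the slowest one, and a fixed compute time follows. It ignores num_workers and prefetch_factor entirely, and simulate_barrier_e2e_sketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Simplified barrier model (an assumption for illustration): the duration of
// a step is the slowest rank's fetch time plus a fixed compute time, and the
// end-to-end duration is the sum over all steps.
double simulate_barrier_e2e_sketch(
    std::size_t num_ranks, std::size_t num_steps, double compute_seconds,
    const std::function<double(std::size_t, std::size_t)>& fetch_seconds) {
    assert(num_ranks > 0);
    double e2e = 0.0;
    for (std::size_t step = 0; step < num_steps; ++step) {
        double slowest = 0.0;
        for (std::size_t rank = 0; rank < num_ranks; ++rank) {
            slowest = std::max(slowest, fetch_seconds(rank, step));
        }
        e2e += slowest + compute_seconds;  // every rank waits at the barrier
    }
    return e2e;
}
```

In this model a single slow rank sets the pace of every step, which is why the tail of the fetch.block distribution matters for the simulated end-to-end duration.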
Free helpers exposed alongside BarrierSimulator:
- sweep_union(boundaries): sweep-line interval union, microseconds to seconds.
- cdf_similarity(a, b): 1 − KS between two empirical samples.
- variance(values): population variance.
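The three helpers are small enough to sketch in full. The versions below are illustrative reimplementations under assumed signatures, with a _sketch suffix to keep them apart from the library functions: intervals are taken to be (ts, te) pairs in microseconds, and KS is the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Total time covered by possibly overlapping (ts, te) intervals given in
// microseconds, returned in seconds.
double sweep_union_sketch(
    std::vector<std::pair<std::int64_t, std::int64_t>> iv) {
    if (iv.empty()) return 0.0;
    std::sort(iv.begin(), iv.end());
    std::int64_t covered = 0;
    std::int64_t cur_s = iv[0].first, cur_e = iv[0].second;
    for (std::size_t i = 1; i < iv.size(); ++i) {
        if (iv[i].first <= cur_e) {
            cur_e = std::max(cur_e, iv[i].second);  // overlap: extend the run
        } else {
            covered += cur_e - cur_s;               // gap: close the run
            cur_s = iv[i].first;
            cur_e = iv[i].second;
        }
    }
    covered += cur_e - cur_s;
    return static_cast<double>(covered) / 1e6;
}

// 1 - KS, where KS is the maximum distance between the empirical CDFs.
double cdf_similarity_sketch(std::vector<double> a, std::vector<double> b) {
    if (a.empty() || b.empty()) return 0.0;
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::size_t i = 0, j = 0;
    double ks = 0.0;
    while (i < a.size() && j < b.size()) {
        const double x = std::min(a[i], b[j]);
        while (i < a.size() && a[i] <= x) ++i;  // advance both CDFs past x
        while (j < b.size() && b[j] <= x) ++j;
        const double fa = static_cast<double>(i) / a.size();
        const double fb = static_cast<double>(j) / b.size();
        ks = std::max(ks, std::fabs(fa - fb));
    }
    return 1.0 - ks;
}

// Population variance (divides by n, not n - 1).
double variance_sketch(const std::vector<double>& values) {
    assert(!values.empty());
    double mean = 0.0;
    for (double v : values) mean += v;
    mean /= static_cast<double>(values.size());
    double ss = 0.0;
    for (double v : values) ss += (v - mean) * (v - mean);
    return ss / static_cast<double>(values.size());
}
```

The interval union explains why union_time can be much smaller than accumulated_time: overlapping calls are counted once in the former and once per call in the latter.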
Distribution fitting¶
Distribution fitting lives under common/statistics and works on any sample
array, not just DLIO traces. See Common for FittedDistribution, FittedMixture,
BestModel (the std::variant), select_best_model, make_sampler,
and the free pdf / cdf / quantile overloads.
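Lowest-BIC selection can be shown on a reduced candidate set. The sketch below is not select_best_model: it is a hypothetical helper that fits only Normal and Exponential by maximum likelihood and compares them with BIC = k ln(n) − 2 ln(L), where k is the number of free parameters and n the sample count. The library's candidate set is larger and includes mixtures.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// BIC = k * ln(n) - 2 * ln(L); lower is better.
double bic_sketch(double log_likelihood, int num_params, std::size_t n) {
    return num_params * std::log(static_cast<double>(n)) -
           2.0 * log_likelihood;
}

// Hypothetical helper: fits Normal and Exponential by maximum likelihood and
// returns the name of the family with the lower BIC. Assumes non-negative
// samples with positive mean and non-zero spread.
std::string select_lowest_bic_sketch(const std::vector<double>& x) {
    assert(x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sum = 0.0;
    for (double v : x) sum += v;
    const double mean = sum / n;
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    const double var = ss / n;  // maximum-likelihood (population) variance
    assert(mean > 0.0 && var > 0.0);

    const double pi = 3.14159265358979323846;
    // Normal log-likelihood at the MLE: -n/2 * (ln(2*pi*var) + 1)
    const double ll_normal = -0.5 * n * (std::log(2.0 * pi * var) + 1.0);
    // Exponential log-likelihood at the MLE (rate = 1/mean): n*(ln(rate) - 1)
    const double ll_exp = n * (std::log(1.0 / mean) - 1.0);

    const double bic_normal = bic_sketch(ll_normal, 2, x.size());
    const double bic_exp = bic_sketch(ll_exp, 1, x.size());
    return bic_normal < bic_exp ? "normal" : "exponential";
}
```

The k ln(n) term penalizes extra parameters, so a three-component mixture should only win when its likelihood gain outweighs the added complexity.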
optimizer¶
OptimizerOptions opt_opts;
opt_opts.max_iterations = 5;
opt_opts.target_e2e_error = 0.05;
opt_opts.target_cdf_similarity = 0.90;
opt_opts.patience = 10;
opt_opts.epsilon = 1.0;
opt_opts.momentum = 0.9;
opt_opts.min_percentile = 50.0;
opt_opts.initial_percentile = 95.0;
opt_opts.base_seed = 42;
OptimizerResult opt = optimize_max_bound_percentile(
ctx, fitted_model, traces.computation_times, opt_opts);
double max_bound = percentile(sorted_samples, opt.best_percentile);
Each iteration constructs a fresh sampler clamped at
percentile(sample_times, current_percentile), runs simulate(), and
adjusts current_percentile by a momentum-smoothed step proportional to the
sign of the E2E error. The loop converges when both
e2e_error < target_e2e_error and
fetch_block_cdf_similarity > target_cdf_similarity hold, and it stops early
after patience iterations without improvement.
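The loop structure can be sketched independently of the simulator. The code below is an illustration under stated assumptions, not optimize_max_bound_percentile: the update rule (velocity = momentum × velocity + epsilon × sign(error), then percentile −= velocity), the step direction, and all type and function names are hypothetical, and the simulator is replaced by a caller-supplied callback. The field names mirror OptimizerOptions, but the default values here are chosen for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>

struct SimOutcomeSketch {
    double signed_e2e_error;  // (simulated - traced) / traced
    double cdf_similarity;    // 1 - KS
};

struct LoopOptionsSketch {
    int max_iterations = 50;
    double target_e2e_error = 0.05;
    double target_cdf_similarity = 0.90;
    int patience = 10;
    double epsilon = 1.0;   // step size in percentile points
    double momentum = 0.9;
    double min_percentile = 50.0;
    double initial_percentile = 95.0;
};

struct LoopResultSketch {
    double best_percentile;
    double best_abs_error;
    bool converged;
    int iterations;
};

LoopResultSketch tune_percentile_sketch(
    const std::function<SimOutcomeSketch(double)>& simulate,
    const LoopOptionsSketch& o) {
    assert(o.min_percentile <= o.initial_percentile);
    double p = o.initial_percentile;
    double velocity = 0.0;
    int stale = 0;
    LoopResultSketch r{p, std::numeric_limits<double>::infinity(), false, 0};
    for (int it = 0; it < o.max_iterations; ++it) {
        r.iterations = it + 1;
        const SimOutcomeSketch s = simulate(p);
        const double abs_err = std::fabs(s.signed_e2e_error);
        if (abs_err < r.best_abs_error) {
            r.best_abs_error = abs_err;
            r.best_percentile = p;
            stale = 0;
        } else {
            ++stale;
        }
        if (abs_err < o.target_e2e_error &&
            s.cdf_similarity > o.target_cdf_similarity) {
            r.best_percentile = p;
            r.best_abs_error = abs_err;
            r.converged = true;
            break;
        }
        if (stale >= o.patience) break;  // early stop: no recent improvement
        // Assumed direction: a run that is too long lowers the clamp.
        const double sign = (s.signed_e2e_error > 0.0) ? 1.0
                          : (s.signed_e2e_error < 0.0) ? -1.0 : 0.0;
        velocity = o.momentum * velocity + o.epsilon * sign;
        p = std::max(o.min_percentile, std::min(100.0, p - velocity));
    }
    return r;
}
```

Because the step depends only on the sign of the error, momentum is what lets the loop accelerate across a long stretch of same-signed errors and slow down once the sign flips.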
yaml_emit¶
DlioTimingBlock comp{best_comp_model, comp_max_bound};
DlioTimingBlock prep{best_prep_model, prep_max_bound};
std::ofstream out("dlio_config.yaml");
write_dlio_yaml(out, &comp, &prep);
// Or render to a string:
std::string yaml = render_dlio_yaml(&comp, &prep);
Renders both single distributions and mixtures into the DLIO schema
(type: <family> + family-specific params, or type: mixture +
n_components + components: [{weight, params: {...}}]). Pass nullptr
to either block argument to omit it.