Configuration

This reference documents the configuration parameters available in DFAnalyzer. They control the behavior of analyzers, outputs, and clusters.

Using Configuration Parameters

DFAnalyzer uses Hydra for configuration management, which provides a flexible way to organize and override parameters. You can specify parameters in several ways:

  1. Command line overrides:

    dfanalyzer trace_path=path/to/traces analyzer/preset=dlio
    
  2. Group selection:

    dfanalyzer analyzer=dftracer cluster=slurm output=csv
    
  3. Nested parameters:

    dfanalyzer output.compact=true cluster.n_workers=4
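When DFAnalyzer is launched from a script, the three override styles above can be combined into one argument list. The following is a minimal sketch using only the Python standard library; the helper name build_command is hypothetical and not part of DFAnalyzer, and the code only assembles and prints the command rather than running it.

```python
import shlex


def build_command(trace_path, groups=None, overrides=None):
    """Assemble a dfanalyzer argv list from Hydra-style key=value overrides.

    build_command is an illustrative helper, not a DFAnalyzer API.
    """
    argv = ["dfanalyzer", f"trace_path={trace_path}"]
    # Group selection, e.g. analyzer=dftracer
    for group, choice in (groups or {}).items():
        argv.append(f"{group}={choice}")
    # Nested parameters, e.g. output.compact=true
    for key, value in (overrides or {}).items():
        if isinstance(value, bool):
            # Match the lowercase booleans used in this guide
            value = "true" if value else "false"
        argv.append(f"{key}={value}")
    return argv


argv = build_command(
    "path/to/traces",
    groups={"analyzer": "dftracer", "cluster": "local", "output": "csv"},
    overrides={"output.compact": True, "cluster.n_workers": 4},
)
# shlex.join adds shell quoting only where a token needs it
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) on a machine where dfanalyzer is installed.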
    

Core Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| trace_path | string | Required | Path to the I/O trace data for analysis. |
| view_types | list[str] | ["file_name", "proc_name", "time_range"] | A list of perspectives to analyze the data from. |
| debug | bool | false | Enable debug mode with more verbose output. |
| verbose | bool | false | Enable verbose information display. |

Analyzer Configuration

DFAnalyzer supports multiple analyzers for different trace formats. You can select an analyzer using the analyzer=<type> parameter. The default analyzer is dftracer.

Common Analyzer Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| analyzer.checkpoint | bool | true | Enable checkpointing of analysis state. |
| analyzer.checkpoint_dir | string | ${hydra:runtime.output_dir}/checkpoints | Directory for saving checkpoints. |
| analyzer.time_approximate | bool | true | Use approximate time for analysis. |
| analyzer.preset | group | posix | Select a preset configuration for the analyzer. |
| analyzer.time_granularity | float | Varies | Time granularity for analysis (in seconds). Defaults vary by analyzer. |
| analyzer.time_resolution | float | Varies | Time resolution for the analyzer (in nanoseconds). Defaults vary by analyzer. |

Analyzer Presets (analyzer/preset)

Presets provide pre-configured settings for different analysis scenarios, making it easier to analyze common workloads without manual configuration. They define how raw trace data is structured into layers and what metrics are derived.

Available presets:

  • posix (default): For general-purpose POSIX I/O workloads.

  • dlio: For workloads generated by the DLIO benchmark.

You can select a preset like this:

    dfanalyzer analyzer/preset=dlio

DFTracer Analyzer (analyzer=dftracer)

For analyzing DFTracer trace files. This is the default analyzer.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| analyzer.time_granularity | float | 1 | Time granularity for DFTracer (in seconds). |
| analyzer.time_resolution | float | 1e6 | Time resolution for DFTracer (in nanoseconds). |

Darshan Analyzer (analyzer=darshan)

For analyzing Darshan DXT trace files.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| analyzer.time_granularity | float | 1 | Time granularity for Darshan (in seconds). |
| analyzer.time_resolution | float | 1e3 | Time resolution for Darshan (in nanoseconds). |

Recorder Analyzer (analyzer=recorder)

For analyzing Recorder trace files.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| analyzer.time_granularity | float | 1 | Time granularity for Recorder (in seconds). |
| analyzer.time_resolution | float | 1e7 | Time resolution for Recorder (in nanoseconds). |

Output Configuration

Control how analysis results are presented and stored. You can select an output format using output=<type>.

Common Output Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| output.compact | bool | false | Use compact output format. |
| output.name | string | "" | Custom name for the output. |
| output.root_only | bool | true | Only show output on the root process. |

Console Output (output=console)

Prints the analysis summary directly to the console. This is the default output.

CSV Output (output=csv)

Saves the analysis results to a set of CSV files in the output directory.
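This guide does not list the names or columns of the generated CSV files, so a quick way to see what a run produced is to scan the output directory. The following is a minimal sketch using only the Python standard library; summarize_csv_dir is a hypothetical helper, and the only assumptions are that the files end in .csv and start with a header row.

```python
import csv
from pathlib import Path


def summarize_csv_dir(output_dir):
    """Return {file name: (data row count, header columns)} for each CSV file.

    Illustrative helper: it assumes only that the files end in .csv and
    that their first row is a header.
    """
    summary = {}
    for path in sorted(Path(output_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])      # empty list for an empty file
            rows = sum(1 for _ in reader)  # remaining lines are data rows
        summary[path.name] = (rows, header)
    return summary
```

Calling summarize_csv_dir with the run's output directory returns one entry per CSV file, which is enough to decide which file to load into a dataframe or spreadsheet.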

SQLite Output (output=sqlite)

Saves the analysis results to a SQLite database file.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| output.run_db_path | string | "" | Path to the SQLite database file. If empty, a new file is created. |
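The database schema is not described in this guide, so the sketch below makes no assumptions about table names. It opens the file given by output.run_db_path read-only and reports every table with its row count, using only the Python standard library; list_tables is a hypothetical helper, not a DFAnalyzer API.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return {table name: row count} for every table in a SQLite file.

    The file is opened read-only so that a mistyped path raises an error
    instead of silently creating an empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
```

From there, a table can be inspected further with ordinary SQL, for example through the sqlite3 command-line shell.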

Cluster Configuration

Configure settings for running DFAnalyzer in a distributed environment using Dask. Select a cluster type with cluster=<type>.

Local Cluster (cluster=local)

The default cluster configuration, which runs analysis on the local machine.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| cluster.n_workers | int | null | Number of Dask workers. Defaults to the number of CPU cores. |
| cluster.memory_limit | int | null | Memory limit per worker (e.g., "8GB"). |

Slurm Cluster (cluster=slurm)

For environments using the Slurm workload manager.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| cluster.processes | int | 1 | Number of Slurm jobs (nodes) to request. |
| cluster.cores | int | 16 | Cores per job. |
| cluster.memory | string | null | Memory per job (e.g., "24GB"). |
| cluster.job_extra_directives | list[str] | [] | Additional directives for the Slurm job script (e.g., ["--account=<name>", "--partition=<name>"]). |
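List-valued overrides such as cluster.job_extra_directives need care on the command line, because the brackets, quotes, and angle brackets are also meaningful to the shell. The sketch below assembles a Slurm invocation and lets shlex handle the shell quoting. It assumes Hydra's standard override syntax for lists (key=[item1,item2], with items quoted when they contain special characters) and keeps the <name> placeholders from the table as-is.

```python
import shlex

# Placeholder directives copied from the table above; replace <name> yourself.
directives = ["--account=<name>", "--partition=<name>"]

# Hydra list override: key=[item1,item2]. Each item is double-quoted
# because it contains '=' and other special characters.
items = ",".join(f'"{d}"' for d in directives)

argv = [
    "dfanalyzer",
    "trace_path=path/to/traces",
    "cluster=slurm",
    "cluster.cores=16",
    "cluster.memory=24GB",
    f"cluster.job_extra_directives=[{items}]",
]

# shlex.join wraps the list override in single quotes so the shell
# passes it through to Hydra unchanged.
print(shlex.join(argv))
```

When typing the command by hand, the same effect comes from wrapping the whole list override in single quotes.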

LSF Cluster (cluster=lsf)

For environments using the LSF workload manager.

PBS Cluster (cluster=pbs)