Configuration

This reference guide documents all the configuration parameters available in DFAnalyzer. These parameters control the behavior of analyzers, outputs, and cluster configurations.

Using Configuration Parameters

DFAnalyzer uses Hydra for configuration management, which provides a flexible way to organize and override parameters. You can specify parameters in several ways:

Command line overrides:

dfanalyzer trace_path=path/to/traces analyzer/preset=dlio

Group selection:

dfanalyzer analyzer=dftracer cluster=slurm output=csv

Nested parameters:

dfanalyzer output.compact=true cluster.n_workers=4
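
The override styles above can also be assembled programmatically, for example when launching DFAnalyzer from a script. The following is a minimal sketch and not part of DFAnalyzer: `to_override_value` and `build_overrides` are hypothetical helpers, and the sketch only prints the command instead of running it. It assumes the usual Hydra convention of lowercase `true`, `false`, and `null`, with dotted keys for nested parameters.

```python
import shlex


def to_override_value(value):
    """Render a Python value as it would be typed in a Hydra override."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if value is None:
        return "null"
    return str(value)


def build_overrides(params, prefix=""):
    """Flatten a nested dict into dotted key=value override strings."""
    overrides = []
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(build_overrides(value, prefix=name))
        else:
            overrides.append(f"{name}={to_override_value(value)}")
    return overrides


# Group selections are passed through unchanged.
groups = ["analyzer=dftracer", "output=csv"]
# Plain and nested parameters, taken from the examples in this section.
params = {
    "trace_path": "path/to/traces",
    "output": {"compact": True},
    "cluster": {"n_workers": 4},
}

command = shlex.join(["dfanalyzer", *groups, *build_overrides(params)])
print(command)
```

Keeping group selections as plain strings and nested parameters as a dict mirrors the distinction Hydra itself makes between choosing a config group and overriding a value inside it.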

Core Parameters

Parameter	Type	Default	Description
`trace_path`	string	Required	Path to the I/O trace data for analysis.
`view_types`	list[str]	`["file_name", "proc_name", "time_range"]`	A list of perspectives to analyze the data from.
`debug`	bool	`false`	Enable debug mode with more verbose output.
`verbose`	bool	`false`	Enable verbose information display.
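
Because `view_types` is a list, overriding it on the command line uses Hydra's bracketed list syntax, and most shells need the brackets quoted. The sketch below assumes you want only two of the default views; it uses Python's `shlex` to print a shell-safe command and does not run anything.

```python
import shlex

# Keep only two of the documented default perspectives.
view_types = ["file_name", "proc_name"]

# Hydra list override: key=[item1,item2]
override = "view_types=[" + ",".join(view_types) + "]"

argv = ["dfanalyzer", "trace_path=path/to/traces", override, "debug=true"]
command = shlex.join(argv)  # quotes the argument that contains brackets
print(command)
```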

Analyzer Configuration

DFAnalyzer supports multiple analyzers for different trace formats. You can select an analyzer using the `analyzer=<type>` parameter. The default analyzer is `dftracer`.

Common Analyzer Parameters

Parameter	Type	Default	Description
`analyzer.checkpoint`	bool	`true`	Enable checkpointing of analysis state.
`analyzer.checkpoint_dir`	string	`${hydra:runtime.output_dir}/checkpoints`	Directory for saving checkpoints.
`analyzer.time_approximate`	bool	`true`	Use approximate time for analysis.
`analyzer.preset`	group	`posix`	Select a preset configuration for the analyzer.
`analyzer.time_granularity`	float	Varies	Time granularity for analysis (in seconds). Defaults vary by analyzer.
`analyzer.time_resolution`	float	Varies	Time resolution for the analyzer (in nanoseconds). Defaults vary by analyzer.
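
The default `analyzer.checkpoint_dir` uses a Hydra interpolation, so checkpoints land inside each run's output directory. The sketch below imitates that substitution with plain string replacement to show the resulting path. The run directory is a made-up example and `resolve_checkpoint_dir` is a hypothetical helper; the real resolution is performed by Hydra at runtime, not by this code.

```python
from pathlib import PurePosixPath

DEFAULT_CHECKPOINT_DIR = "${hydra:runtime.output_dir}/checkpoints"


def resolve_checkpoint_dir(template, output_dir):
    """Substitute the run's output directory into the checkpoint template."""
    resolved = template.replace("${hydra:runtime.output_dir}", str(output_dir))
    return PurePosixPath(resolved)


# Hypothetical run directory; the actual location depends on your Hydra setup.
run_dir = PurePosixPath("outputs/example-run")
checkpoint_dir = resolve_checkpoint_dir(DEFAULT_CHECKPOINT_DIR, run_dir)
print(checkpoint_dir)
```

Setting `analyzer.checkpoint_dir` to a fixed path instead would place checkpoints outside the per-run directory.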

Analyzer Presets (`analyzer/preset`)

Presets provide pre-configured settings for different analysis scenarios, making it easier to analyze common workloads without manual configuration. They define how raw trace data is structured into layers and what metrics are derived.

Available presets:

posix (default): For general-purpose POSIX I/O workloads.
dlio: For workloads generated by the DLIO benchmark.

You can select a preset like this:

dfanalyzer analyzer/preset=dlio

DFTracer Analyzer (`analyzer=dftracer`)

For analyzing DFTracer trace files. This is the default analyzer.

Parameter	Type	Default	Description
`analyzer.time_granularity`	float	1	Time granularity for DFTracer (in seconds).
`analyzer.time_resolution`	float	1e6	Time resolution for DFTracer (in nanoseconds).

Darshan Analyzer (`analyzer=darshan`)

For analyzing Darshan DXT trace files.

Parameter	Type	Default	Description
`analyzer.time_granularity`	float	1	Time granularity for Darshan (in seconds).
`analyzer.time_resolution`	float	1e3	Time resolution for Darshan (in nanoseconds).

Recorder Analyzer (`analyzer=recorder`)

For analyzing Recorder trace files.

Parameter	Type	Default	Description
`analyzer.time_granularity`	float	1	Time granularity for Recorder (in seconds).
`analyzer.time_resolution`	float	1e7	Time resolution for Recorder (in nanoseconds).

Output Configuration

Control how analysis results are presented and stored. You can select an output format using `output=<type>`.

Common Output Parameters

Parameter	Type	Default	Description
`output.compact`	bool	`false`	Use compact output format.
`output.name`	string	`""`	Custom name for the output.
`output.root_only`	bool	`true`	Only show output on the root process.

Console Output (`output=console`)

Prints the analysis summary directly to the console. This is the default output.

CSV Output (`output=csv`)

Saves the analysis results to a set of CSV files in the output directory.
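
The names and columns of the generated CSV files are not listed in this reference, so a reasonable first step is to inspect whatever the run produced. The sketch below uses only the Python standard library; `summarize_csv_outputs` is a hypothetical helper, and the demo file `example.csv` with its columns is invented purely to make the snippet runnable.

```python
import csv
import tempfile
from pathlib import Path


def summarize_csv_outputs(output_dir):
    """Return {file name: (column names, data row count)} for each CSV file."""
    summary = {}
    for csv_path in sorted(Path(output_dir).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])  # first row holds the column names
            row_count = sum(1 for _ in reader)
        summary[csv_path.name] = (header, row_count)
    return summary


# Demo with a throwaway directory; replace it with your run's output directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example.csv").write_text("col_a,col_b\n1,2\n3,4\n")
    demo = summarize_csv_outputs(tmp)
print(demo)
```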

SQLite Output (`output=sqlite`)

Saves the analysis results to a SQLite database file.

Parameter	Type	Default	Description
`output.run_db_path`	string	`""`	Path to the SQLite database file. If empty, a new file is created.
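
The database schema is not described in this reference. As a hedged starting point, the sketch below lists the tables in a SQLite file and counts their rows using Python's built-in `sqlite3` module. `table_row_counts` is a hypothetical helper, and the `stand_in` table exists only so the example runs on its own; it is not a DFAnalyzer table.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def table_row_counts(db_path):
    """Return {table name: row count} for every user table in the database."""
    with closing(sqlite3.connect(db_path)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }


# Demo against a throwaway database file.
with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "example.db"
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE stand_in (id INTEGER, label TEXT)")
        conn.executemany("INSERT INTO stand_in VALUES (?, ?)", [(1, "a"), (2, "b")])
        conn.commit()
    counts = table_row_counts(db_path)
print(counts)
```

To inspect a real run, call `table_row_counts` with the path you passed as `output.run_db_path`.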

Cluster Configuration

Configure settings for running DFAnalyzer in a distributed environment using Dask. Select a cluster type with `cluster=<type>`.

Local Cluster (`cluster=local`)

The default cluster configuration, which runs analysis on the local machine.

Parameter	Type	Default	Description
`cluster.n_workers`	int	null	Number of Dask workers. Defaults to the number of CPU cores.
`cluster.memory_limit`	int	null	Memory limit per worker (e.g., `"8GB"`).

Slurm Cluster (`cluster=slurm`)

For environments using the Slurm workload manager.

Parameter	Type	Default	Description
`cluster.processes`	int	1	Number of Slurm jobs (nodes) to request.
`cluster.cores`	int	16	Cores per job.
`cluster.memory`	string	null	Memory per job (e.g., `"24GB"`).
`cluster.job_extra_directives`	list[str]	[]	Additional directives for the Slurm job script (e.g., `["--account=<name>", "--partition=<name>"]`).
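
Passing `cluster.job_extra_directives` on the command line is the trickiest override here, because each directive contains `=` and `--`. The sketch below takes one cautious approach: it writes the list with double-quoted items (which Hydra's override grammar accepts) and lets `shlex` add the outer shell quoting. The account and partition names are placeholders, and the command is only printed, not executed.

```python
import json
import shlex

# Placeholder values; substitute your own account and partition.
directives = ["--account=my_account", "--partition=my_partition"]

# Compact JSON gives a bracketed list with double-quoted items and no spaces.
value = json.dumps(directives, separators=(",", ":"))
override = f"cluster.job_extra_directives={value}"

argv = [
    "dfanalyzer",
    "trace_path=path/to/traces",
    "cluster=slurm",
    "cluster.cores=16",
    "cluster.memory=24GB",
    override,
]
command = shlex.join(argv)  # wraps the list override in single quotes
print(command)
```

Quoting each directive avoids relying on how unquoted `=` characters inside list items are parsed.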

LSF Cluster (`cluster=lsf`)

For environments using the LSF workload manager.
