Configuration
This reference documents the configuration parameters available in DFAnalyzer. These parameters control the behavior of analyzers, outputs, and clusters.
Using Configuration Parameters
DFAnalyzer uses Hydra for configuration management, which provides a flexible way to organize and override parameters. You can specify parameters in several ways:
Command line overrides:
dfanalyzer trace_path=path/to/traces analyzer/preset=dlio
Group selection:
dfanalyzer analyzer=dftracer cluster=slurm output=csv
Nested parameters:
dfanalyzer output.compact=true cluster.n_workers=4
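The three override styles can be combined in a single invocation. As a minimal sketch, assuming you want to launch DFAnalyzer from a Python script, the hypothetical helper `build_command` below (it is not part of DFAnalyzer) turns a mapping of override keys into an argument list. It uses only keys that appear in the examples above; whether a given combination is valid depends on your installation.

```python
import shlex


def build_command(overrides, executable="dfanalyzer"):
    """Turn a mapping of override keys to values into an argv list.

    Hypothetical helper for illustration; not part of DFAnalyzer.
    """
    args = [executable]
    for key, value in overrides.items():
        if isinstance(value, bool):
            # Write booleans in lowercase, matching the style of
            # `output.compact=true` in the examples above.
            text = "true" if value else "false"
        else:
            text = str(value)
        args.append(f"{key}={text}")
    return args


# Keys taken from the examples above; the values are placeholders.
overrides = {
    "trace_path": "path/to/traces",   # command line override
    "analyzer/preset": "dlio",        # preset selection
    "output": "csv",                  # group selection
    "output.compact": True,           # nested parameter
    "cluster.n_workers": 4,           # nested parameter
}

argv = build_command(overrides)
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` if the `dfanalyzer` executable is on your `PATH`.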
Core Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| trace_path | string | Required | Path to the I/O trace data for analysis. |
| | list[str] | | A list of perspectives to analyze the data from. |
| | bool | | Enable debug mode with more verbose output. |
| | bool | | Enable verbose information display. |
Analyzer Configuration
DFAnalyzer supports multiple analyzers for different trace formats. You can
select an analyzer using the analyzer=<type> parameter. The default analyzer
is dftracer.
Common Analyzer Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | bool | | Enable checkpointing of analysis state. |
| | string | | Directory for saving checkpoints. |
| | bool | | Use approximate time for analysis. |
| analyzer/preset | group | | Select a preset configuration for the analyzer. |
| | float | Varies | Time granularity for analysis (in seconds). Defaults vary by analyzer. |
| | float | Varies | Time resolution for the analyzer (in nanoseconds). Defaults vary by analyzer. |
Analyzer Presets (analyzer/preset)
Presets provide pre-configured settings for different analysis scenarios, making it easier to analyze common workloads without manual configuration. They define how raw trace data is structured into layers and what metrics are derived.
Available presets:
posix (default): For general-purpose POSIX I/O workloads.
dlio: For workloads generated by the DLIO benchmark.
You can select a preset like this:
dfanalyzer analyzer/preset=dlio
DFTracer Analyzer (analyzer=dftracer)
For analyzing DFTracer trace files. This is the default analyzer.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | float | 1 | Time granularity for DFTracer (in seconds). |
| | float | 1e6 | Time resolution for DFTracer (in nanoseconds). |
Darshan Analyzer (analyzer=darshan)
For analyzing Darshan DXT trace files.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | float | 1 | Time granularity for Darshan (in seconds). |
| | float | 1e3 | Time resolution for Darshan (in nanoseconds). |
Recorder Analyzer (analyzer=recorder)
For analyzing Recorder trace files.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | float | 1 | Time granularity for Recorder (in seconds). |
| | float | 1e7 | Time resolution for Recorder (in nanoseconds). |
Output Configuration
Control how analysis results are presented and stored. You can select an output
format using output=<type>.
Common Output Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| output.compact | bool | | Use compact output format. |
| | string | "" | Custom name for the output. |
| | bool | | Only show output on the root process. |
Console Output (output=console)
Prints the analysis summary directly to the console. This is the default output.
CSV Output (output=csv)
Saves the analysis results to a set of CSV files in the output directory.
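The names and columns of the generated CSV files are not listed in this reference, so the sketch below makes no assumption about them: it loads every `*.csv` file it finds in a directory. The helper `load_csv_outputs` is hypothetical (not part of DFAnalyzer), and the directory you pass in is whatever output directory your run wrote to.

```python
import csv
from pathlib import Path


def load_csv_outputs(output_dir):
    """Read every *.csv file in output_dir.

    Returns a dict that maps each file's stem (its name without the
    .csv suffix) to a list of rows, one dict per row keyed by the
    column headers. Hypothetical helper; not part of DFAnalyzer.
    """
    tables = {}
    for csv_path in sorted(Path(output_dir).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            tables[csv_path.stem] = list(csv.DictReader(handle))
    return tables
```

Calling `load_csv_outputs("path/to/output")` and iterating over the result is a quick way to see which files a run produced and how many rows each contains. All values are returned as strings; convert them as needed.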
SQLite Output (output=sqlite)
Saves the analysis results to a SQLite database file.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | string | "" | Path to the SQLite database file. If empty, a new file is created. |
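The schema of the database is not described in this reference. As a hedged starting point, the sketch below uses Python's standard `sqlite3` module to list the tables in a database file and count their rows, without assuming any table names. The helper `summarize_database` is hypothetical (not part of DFAnalyzer).

```python
import sqlite3
from contextlib import closing


def summarize_database(db_path):
    """Return {table_name: row_count} for every user table in db_path.

    Hypothetical helper; not part of DFAnalyzer. Note that
    sqlite3.connect() creates an empty file if db_path does not exist.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        # Skip SQLite's internal bookkeeping tables.
        names = [name for (name,) in rows if not name.startswith("sqlite_")]
        summary = {}
        for name in names:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + name.replace('"', '""') + '"'
            summary[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return summary
```

From there, individual tables can be inspected with ordinary SQL queries once their names are known.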
Cluster Configuration
Configure settings for running DFAnalyzer in a distributed environment using Dask.
Select a cluster type with cluster=<type>.
Local Cluster (cluster=local)
The default cluster configuration, which runs analysis on the local machine.
| Parameter | Type | Default | Description |
|---|---|---|---|
| cluster.n_workers | int | null | Number of Dask workers. Defaults to the number of CPU cores. |
| | int | null | Memory limit per worker (e.g., "8GB"). |
Slurm Cluster (cluster=slurm)
For environments using the Slurm workload manager.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | int | 1 | Number of Slurm jobs (nodes) to request. |
| | int | 16 | Cores per job. |
| | string | null | Memory per job (e.g., "24GB"). |
| | list[str] | [] | Additional directives for the Slurm job script. |
LSF Cluster (cluster=lsf)
For environments using the LSF workload manager.