.. _configuration:

Configuration
=============

This reference documents the configuration parameters available in DFAnalyzer.
These parameters control analyzer behavior, output formats, and cluster
settings.

Using Configuration Parameters
------------------------------

DFAnalyzer uses `Hydra <https://hydra.cc>`_ for configuration management, which
provides a flexible way to organize and override parameters. You can specify
parameters in several ways:

1. **Command-line overrides**:

   .. code-block:: bash

      dfanalyzer trace_path=path/to/traces analyzer/preset=dlio

2. **Group selection**:

   .. code-block:: bash

      dfanalyzer analyzer=dftracer cluster=slurm output=csv

3. **Nested parameters**:

   .. code-block:: bash

      dfanalyzer output.compact=true cluster.n_workers=4
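
The three forms can be combined in a single invocation. The following is a
minimal sketch that reuses only parameters documented on this page; the trace
path is a placeholder.

.. code-block:: bash

   # Select groups, pick a preset, and override nested values in one call.
   dfanalyzer trace_path=path/to/traces \
      analyzer=dftracer analyzer/preset=dlio \
      output=csv output.compact=true \
      cluster=local cluster.n_workers=4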

Core Parameters
---------------

.. list-table::
   :widths: 25 15 20 40
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``trace_path``
     - string
     - Required
     - Path to the I/O trace data for analysis.
   * - ``view_types``
     - list[str]
     - ``["file_name", "proc_name", "time_range"]``
     - Perspectives (views) from which the data is analyzed.
   * - ``debug``
     - bool
     - ``false``
     - Enable debug mode with more verbose output.
   * - ``verbose``
     - bool
     - ``false``
     - Enable verbose information display.
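
As an illustration, the core parameters can be overridden as shown below. This
sketch assumes the standard Hydra list override syntax; the override is quoted
so that the shell does not interpret the square brackets.

.. code-block:: bash

   # Restrict the analysis to two of the default views and enable debug mode.
   dfanalyzer trace_path=path/to/traces \
      'view_types=[file_name,time_range]' \
      debug=true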

Analyzer Configuration
----------------------

DFAnalyzer supports multiple analyzers for different trace formats. You can
select an analyzer using the ``analyzer=<type>`` parameter. The default analyzer
is **dftracer**.

Common Analyzer Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :widths: 25 15 30 30
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``analyzer.checkpoint``
     - bool
     - ``true``
     - Enable checkpointing of analysis state.
   * - ``analyzer.checkpoint_dir``
     - string
     - ``${hydra:runtime.output_dir}/checkpoints``
     - Directory for saving checkpoints.
   * - ``analyzer.time_approximate``
     - bool
     - ``true``
     - Use approximate time for analysis.
   * - ``analyzer/preset``
     - group
     - ``posix``
     - Select a preset configuration group for the analyzer. Note the ``/``
       group syntax, as in ``analyzer/preset=dlio``.
   * - ``analyzer.time_granularity``
     - float
     - Varies
     - Time granularity for analysis (in seconds). Defaults vary by analyzer.
   * - ``analyzer.time_resolution``
     - float
     - Varies
     - Time resolution for the analyzer (in nanoseconds). Defaults vary by analyzer.
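
A short sketch of overriding the common analyzer parameters follows. The
checkpoint directory is a hypothetical path chosen for illustration, and the
granularity value is arbitrary.

.. code-block:: bash

   # Store checkpoints in a custom directory and use 5-second time bins.
   dfanalyzer trace_path=path/to/traces \
      analyzer.checkpoint_dir=/tmp/dfanalyzer-checkpoints \
      analyzer.time_granularity=5

   # Disable checkpointing entirely.
   dfanalyzer trace_path=path/to/traces analyzer.checkpoint=false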

Analyzer Presets (``analyzer/preset``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Presets provide pre-configured settings for different analysis scenarios,
making it easier to analyze common workloads without manual configuration.
They define how raw trace data is structured into layers and what metrics
are derived.

Available presets:

- **posix** (default): For general-purpose POSIX I/O workloads.
- **dlio**: For workloads generated by the `DLIO benchmark <https://github.com/argonne-lcf/dlio_benchmark>`_.

You can select a preset like this:

.. code-block:: bash

   dfanalyzer analyzer/preset=dlio

DFTracer Analyzer (``analyzer=dftracer``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For analyzing DFTracer trace files. This is the **default** analyzer.

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``analyzer.time_granularity``
     - float
     - 1
     - Time granularity for DFTracer (in seconds).
   * - ``analyzer.time_resolution``
     - float
     - 1e6
     - Time resolution for DFTracer (in nanoseconds).

Darshan Analyzer (``analyzer=darshan``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For analyzing Darshan DXT trace files.

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``analyzer.time_granularity``
     - float
     - 1
     - Time granularity for Darshan (in seconds).
   * - ``analyzer.time_resolution``
     - float
     - 1e3
     - Time resolution for Darshan (in nanoseconds).

Recorder Analyzer (``analyzer=recorder``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For analyzing Recorder trace files.

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``analyzer.time_granularity``
     - float
     - 1
     - Time granularity for Recorder (in seconds).
   * - ``analyzer.time_resolution``
     - float
     - 1e7
     - Time resolution for Recorder (in nanoseconds).
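
Selecting a non-default analyzer follows the same pattern for each trace
format. The commands below are a sketch; the trace paths are placeholders and
the granularity override is optional.

.. code-block:: bash

   # Analyze Darshan DXT traces with the default settings for that analyzer.
   dfanalyzer analyzer=darshan trace_path=path/to/darshan/traces

   # Analyze Recorder traces with a coarser time granularity (10 seconds).
   dfanalyzer analyzer=recorder trace_path=path/to/recorder/traces \
      analyzer.time_granularity=10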

Output Configuration
--------------------

Control how analysis results are presented and stored. You can select an output
format using ``output=<type>``.

Common Output Parameters
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``output.compact``
     - bool
     - ``false``
     - Use compact output format.
   * - ``output.name``
     - string
     - ``""``
     - Custom name for the output.
   * - ``output.root_only``
     - bool
     - ``true``
     - Only show output on the root process.
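
For example, the common output parameters can be combined with any output
type. In this sketch, ``baseline`` is a hypothetical run name.

.. code-block:: bash

   # Write compact CSV output under a custom name.
   dfanalyzer trace_path=path/to/traces \
      output=csv output.compact=true output.name=baseline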

Console Output (``output=console``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prints the analysis summary directly to the console. This is the **default** output.

CSV Output (``output=csv``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Saves the analysis results to a set of CSV files in the output directory.

SQLite Output (``output=sqlite``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Saves the analysis results to a SQLite database file.

.. list-table::
   :widths: 25 15 30 30
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``output.run_db_path``
     - string
     - ``""``
     - Path to the SQLite database file. If empty, a new file is created.
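
The following sketch writes results to an explicit database file and then
lists the tables it contains with the ``sqlite3`` command-line shell. The
database path is a hypothetical example, and the table names are not
documented here, so the sketch only lists them.

.. code-block:: bash

   # Save results to a specific SQLite database file.
   dfanalyzer trace_path=path/to/traces \
      output=sqlite output.run_db_path=results/runs.db

   # Inspect which tables were created (requires the sqlite3 CLI).
   sqlite3 results/runs.db ".tables"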

Cluster Configuration
---------------------

Configure settings for running DFAnalyzer in a distributed environment using Dask.
Select a cluster type with ``cluster=<type>``.

Local Cluster (``cluster=local``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The **default** cluster configuration, which runs analysis on the local machine.

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``cluster.n_workers``
     - int
     - ``null``
     - Number of Dask workers. Defaults to the number of CPU cores.
   * - ``cluster.memory_limit``
     - int or string
     - ``null``
     - Memory limit per worker, given as a number of bytes or as a string
       such as "8GB".
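
A minimal sketch of sizing the local cluster explicitly is shown below. The
worker count and memory limit are illustrative values, not recommendations.

.. code-block:: bash

   # Run with 4 local Dask workers, each limited to 8 GB of memory.
   dfanalyzer trace_path=path/to/traces \
      cluster=local cluster.n_workers=4 cluster.memory_limit=8GB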

Slurm Cluster (``cluster=slurm``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For environments using the Slurm workload manager.

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``cluster.processes``
     - int
     - 1
     - Number of Slurm jobs (nodes) to request.
   * - ``cluster.cores``
     - int
     - 16
     - Cores per job.
   * - ``cluster.memory``
     - string
     - ``null``
     - Memory per job (e.g., "24GB").
   * - ``cluster.job_extra_directives``
     - list[str]
     - ``[]``
     - Additional directives for the Slurm job script (e.g., ``["--account=<name>", "--partition=<name>"]``).
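
Because ``cluster.job_extra_directives`` is a list of strings that begin with
``--``, quoting it on the command line can be error-prone. One way to keep it
readable is to build the override in a shell variable first. This is a sketch
that assumes the standard Hydra syntax for quoted list items; the account and
partition names are hypothetical.

.. code-block:: bash

   # Hypothetical account and partition names; replace with your site's values.
   account="myproject"
   partition="debug"

   # Build the Hydra list override as a single string.
   directives="cluster.job_extra_directives=[\"--account=${account}\",\"--partition=${partition}\"]"

   # Print the override for inspection before running.
   # prints: cluster.job_extra_directives=["--account=myproject","--partition=debug"]
   echo "${directives}"

   # Pass it as one argument; the guard skips the call if dfanalyzer is absent.
   if command -v dfanalyzer >/dev/null 2>&1; then
      dfanalyzer trace_path=path/to/traces \
         cluster=slurm cluster.cores=32 cluster.memory=64GB \
         "${directives}"
   fi

Keeping the override in double quotes when passing it ensures the shell hands
it to DFAnalyzer as a single argument.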

LSF Cluster (``cluster=lsf``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For environments using the LSF workload manager.

PBS Cluster (``cluster=pbs``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~