.. _configuration:

Configuration
=============

This reference guide documents all the configuration parameters available in DFAnalyzer. These parameters control the behavior of analyzers, outputs, and cluster configurations.

Using Configuration Parameters
------------------------------

DFAnalyzer uses Hydra for configuration management, which provides a flexible way to organize and override parameters. You can specify parameters in several ways:

1. **Command line overrides**:

   .. code-block:: bash

      dfanalyzer trace_path=path/to/traces analyzer/preset=dlio

2. **Group selection**:

   .. code-block:: bash

      dfanalyzer analyzer=dftracer cluster=slurm output=csv

3. **Nested parameters**:

   .. code-block:: bash

      dfanalyzer output.compact=true cluster.n_workers=4

Core Parameters
---------------

.. list-table::
   :widths: 25 15 20 40
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``trace_path``
     - string
     - Required
     - Path to the I/O trace data for analysis.
   * - ``view_types``
     - list[str]
     - ``["file_name", "proc_name", "time_range"]``
     - A list of perspectives to analyze the data from.
   * - ``debug``
     - bool
     - ``false``
     - Enable debug mode with more verbose output.
   * - ``verbose``
     - bool
     - ``false``
     - Enable verbose information display.

Analyzer Configuration
----------------------

DFAnalyzer supports multiple analyzers for different trace formats. You can select an analyzer using the ``analyzer=`` parameter. The default analyzer is **dftracer**.

Common Analyzer Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :widths: 25 15 30 30
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``analyzer.checkpoint``
     - bool
     - ``true``
     - Enable checkpointing of analysis state.
   * - ``analyzer.checkpoint_dir``
     - string
     - ``${hydra:runtime.output_dir}/checkpoints``
     - Directory for saving checkpoints.
   * - ``analyzer.time_approximate``
     - bool
     - ``true``
     - Use approximate time for analysis.
   * - ``analyzer.preset``
     - group
     - ``posix``
     - Select a preset configuration for the analyzer.
   * - ``analyzer.time_granularity``
     - float
     - Varies
     - Time granularity for analysis (in seconds). Defaults vary by analyzer.
   * - ``analyzer.time_resolution``
     - float
     - Varies
     - Time resolution for the analyzer (in nanoseconds). Defaults vary by analyzer.

Analyzer Presets (``analyzer/preset``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Presets provide pre-configured settings for different analysis scenarios, making it easier to analyze common workloads without manual configuration. They define how raw trace data is structured into layers and what metrics are derived.

Available presets:

- **posix** (default): For general-purpose POSIX I/O workloads.
- **dlio**: For workloads generated by the DLIO benchmark.

You can select a preset like this:

.. code-block:: bash

   dfanalyzer analyzer/preset=dlio

DFTracer Analyzer (``analyzer=dftracer``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For analyzing DFTracer trace files. This is the **default** analyzer.

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``analyzer.time_granularity``
     - float
     - 1
     - Time granularity for DFTracer (in seconds).
   * - ``analyzer.time_resolution``
     - float
     - 1e6
     - Time resolution for DFTracer (in nanoseconds).

Darshan Analyzer (``analyzer=darshan``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For analyzing Darshan DXT trace files.

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``analyzer.time_granularity``
     - float
     - 1
     - Time granularity for Darshan (in seconds).
   * - ``analyzer.time_resolution``
     - float
     - 1e3
     - Time resolution for Darshan (in nanoseconds).

Recorder Analyzer (``analyzer=recorder``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For analyzing Recorder trace files.

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``analyzer.time_granularity``
     - float
     - 1
     - Time granularity for Recorder (in seconds).
   * - ``analyzer.time_resolution``
     - float
     - 1e7
     - Time resolution for Recorder (in nanoseconds).

Output Configuration
--------------------

Control how analysis results are presented and stored. You can select an output format using ``output=``.

Common Output Parameters
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``output.compact``
     - bool
     - ``false``
     - Use compact output format.
   * - ``output.name``
     - string
     - ""
     - Custom name for the output.
   * - ``output.root_only``
     - bool
     - ``true``
     - Only show output on the root process.

Console Output (``output=console``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prints the analysis summary directly to the console. This is the **default** output.

CSV Output (``output=csv``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Saves the analysis results to a set of CSV files in the output directory.

SQLite Output (``output=sqlite``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Saves the analysis results to a SQLite database file.

.. list-table::
   :widths: 25 15 30 30
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``output.run_db_path``
     - string
     - ""
     - Path to the SQLite database file. If empty, a new file is created.

Cluster Configuration
---------------------

Configure settings for running DFAnalyzer in a distributed environment using Dask. Select a cluster type with ``cluster=``.

Local Cluster (``cluster=local``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The **default** cluster configuration, which runs analysis on the local machine.

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``cluster.n_workers``
     - int
     - null
     - Number of Dask workers. Defaults to the number of CPU cores.
   * - ``cluster.memory_limit``
     - int
     - null
     - Memory limit per worker (e.g., "8GB").

Slurm Cluster (``cluster=slurm``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For environments using the Slurm workload manager.

.. list-table::
   :widths: 25 15 15 45
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``cluster.processes``
     - int
     - 1
     - Number of Slurm jobs (nodes) to request.
   * - ``cluster.cores``
     - int
     - 16
     - Cores per job.
   * - ``cluster.memory``
     - string
     - null
     - Memory per job (e.g., "24GB").
   * - ``cluster.job_extra_directives``
     - list[str]
     - []
     - Additional directives for the Slurm job script (e.g., ``["--account=", "--partition="]``).

LSF Cluster (``cluster=lsf``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For environments using the LSF workload manager.

PBS Cluster (``cluster=pbs``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~