.. _tools:

Tools
=====

This page provides an overview of the supplementary tools distributed with DFAnalyzer.

`dfanalyzer-recorder2parquet`
-----------------------------

The ``dfanalyzer-recorder2parquet`` tool is a command-line utility that converts I/O trace files generated by the Recorder tracing tool into the Apache Parquet format. Because Parquet is a columnar storage format optimized for analytical workloads, this conversion enables efficient storage and subsequent analysis.

Functionality
~~~~~~~~~~~~~

- **Input:** Takes raw trace files generated by the Recorder tool. These files typically contain detailed records of the I/O operations performed by an application.
- **Processing:**

  - Parses individual trace records, extracting information such as function calls (POSIX I/O calls like ``open``, ``read``, and ``write``, as well as MPI I/O calls), timestamps, file identifiers, process/rank information, and data transfer sizes.
  - Categorizes I/O operations (e.g., read, write, metadata).
  - Extracts metadata such as the hostname, application name, and process ID from the input trace file paths.

- **Output:** Generates Parquet files containing the structured I/O trace data. The schema of the Parquet files includes the following fields:

.. list-table::
   :widths: 20 20 60
   :header-rows: 1

   * - Field Name
     - Data Type
     - Description
   * - ``index``
     - Int64
     - Record index
   * - ``level``
     - Int32
     - Call stack level (if available)
   * - ``tstart``
     - Float32
     - Start timestamp
   * - ``tmid``
     - Int64
     - Midpoint timestamp
   * - ``tend``
     - Float32
     - End timestamp
   * - ``duration``
     - Float32
     - Duration of the operation
   * - ``hostname``
     - UTF8 String
     - Hostname where the operation occurred
   * - ``app``
     - UTF8 String
     - Application name
   * - ``rank``
     - Int32
     - MPI rank
   * - ``proc_name``
     - UTF8 String
     - Process name
   * - ``proc_id``
     - Int64
     - Unique process identifier
   * - ``thread_id``
     - Int32
     - Thread identifier
   * - ``cat``
     - Int32
     - Operation category
   * - ``io_cat``
     - Int32
     - I/O category (read, write, metadata)
   * - ``func_id``
     - UTF8 String
     - Function name or identifier
   * - ``acc_pat``
     - Int32
     - Access pattern (e.g., sequential, random)
   * - ``file_id``
     - Int64
     - Unique file identifier
   * - ``file_name``
     - UTF8 String
     - Name of the file involved in the operation
   * - ``size``
     - Int64
     - Size of the I/O operation, in bytes
   * - ``bandwidth``
     - Float32
     - Calculated bandwidth for the operation

Usage
~~~~~

The ``dfanalyzer-recorder2parquet`` tool is typically built as part of the DFAnalyzer project, within its ``recorder`` subproject. To use it directly, invoke the compiled executable with the path to the Recorder trace files. The tool can be launched in parallel with ``mpirun``; for example, with 8 processes:

.. code-block:: bash

   mpirun -n 8 dfanalyzer-recorder2parquet

The tool processes the traces from the specified ````. It writes one or more ``.parquet`` files into a subdirectory named ``_parquet``, which it creates automatically within the ````. The resulting Parquet files can then be used as input to the DFAnalyzer ``recorder`` analyzer.
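
The categorization step in the Functionality list can be pictured with a small shell sketch. It is an illustration, not the tool's code: the helper ``io_category`` is hypothetical, the grouping of function names is an assumption, and the real tool records the category as an integer code in the ``io_cat`` column whose values are not listed on this page.

.. code-block:: bash

   # Hypothetical sketch: map a traced function name to a coarse I/O category.
   # The grouping below is an assumption for illustration only; the real
   # dfanalyzer-recorder2parquet stores its own integer code in io_cat.
   io_category() {
       case "$1" in
           read|pread|pread64|readv|fread|MPI_File_read*)
               echo "read" ;;
           write|pwrite|pwrite64|writev|fwrite|MPI_File_write*)
               echo "write" ;;
           open|open64|close|stat|lstat|fstat|lseek|unlink|mkdir|fsync|MPI_File_open|MPI_File_close)
               echo "metadata" ;;
           *)
               echo "other" ;;
       esac
   }

   # Print the category for a few sample function names.
   for f in open read MPI_File_write_at close getpid; do
       printf '%s -> %s\n' "$f" "$(io_category "$f")"
   done

A ``case`` statement keeps the mapping readable, and glob patterns such as ``MPI_File_read*`` let one entry cover a whole family of MPI I/O calls.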
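
The ``duration`` and ``bandwidth`` columns in the schema are derived values. The sketch below shows one plausible derivation, assuming timestamps in seconds, sizes in bytes, ``duration = tend - tstart``, and ``bandwidth = size / duration``. The helper ``derive_columns`` and the choice to report zero bandwidth for zero-length operations are hypothetical and may differ from what the tool actually computes.

.. code-block:: bash

   # Hypothetical sketch, assuming seconds and bytes:
   #   duration  = tend - tstart
   #   bandwidth = size / duration   (bytes per second; 0 when duration is 0)
   # Input: one record per line as "tstart tend size".
   derive_columns() {
       awk '{
           duration = $2 - $1
           bandwidth = (duration > 0) ? $3 / duration : 0
           printf "duration=%.3f bandwidth=%.1f\n", duration, bandwidth
       }'
   }

   # A 1 MiB operation lasting 0.25 s, and a zero-length operation.
   printf '%s\n' "0.50 0.75 1048576" "1.00 1.00 4096" | derive_columns

Guarding the division keeps zero-duration records, which can appear when two timestamps coincide, from producing an infinite or undefined bandwidth.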
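
The ``acc_pat`` column classifies access patterns such as sequential or random. One common heuristic, assumed here and not confirmed as the tool's rule, treats an access as sequential when it begins exactly where the previous access to the same file ended, or at offset 0 for a file's first access. The hypothetical ``classify_access`` helper applies that rule to records of the form ``file offset size``. The output schema has no offset column, so these offsets stand in for information the converter would read from the raw trace records.

.. code-block:: bash

   # Hypothetical sketch: label each access "sequential" if it starts where the
   # previous access to the same file ended (offset 0 for a file's first access),
   # otherwise "random". Input: one record per line as "file offset size".
   classify_access() {
       awk '{
           file = $1; offset = $2; size = $3
           expected = (file in next_off) ? next_off[file] : 0
           pattern = (offset == expected) ? "sequential" : "random"
           next_off[file] = offset + size   # where a sequential follow-up would start
           print file, offset, pattern
       }'
   }

   printf '%s\n' "a.dat 0 4096" "a.dat 4096 4096" "a.dat 65536 4096" "b.dat 8192 512" \
       | classify_access

Tracking the expected next offset per file, rather than globally, keeps interleaved accesses to different files from being misclassified as random.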