ALCF Polaris

This section describes how to run DFAnalyzer on ALCF Polaris.


Before you start, make sure you have completed the steps needed to build DFAnalyzer, as described in the build instructions.

Warning

Run all the steps below on a Polaris login node.

Logging in to the Polaris Login Node

ssh <username>@polaris.alcf.anl.gov # then enter the passcode from MobilePass or another authenticator

Note

We recommend using VSCode to connect to the Polaris login node because it can open Jupyter notebooks and forward ports natively.

Initializing Dask Configurations

cd <dftracer>/dfanalyzer/dask/conf
./install_dask_env.sh

Note

This creates a new directory $HOME/.dftracer/ containing two files: $HOME/.dftracer/configuration.sh and $HOME/.dftracer/configuration.yaml

Changing $HOME/.dftracer/configuration.yaml

cd $HOME/.dftracer
<EDITOR> configuration.yaml

By default, $HOME/.dftracer/configuration.yaml contains the following entries:

app: /usr/WS2/haridev/dftracer
env: ${DFTRACER_APP}/venv

Set app to your cloned <dftracer> directory and env to the Python virtual environment in which you installed DFAnalyzer.

For ALCF Polaris specifically, add one more entry to $HOME/.dftracer/configuration.yaml:

account: <your project allocation account>

Note

This account is used to reserve compute nodes for the Dask distributed workers.
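
As a quick sanity check after editing, the sketch below reports which of the three keys discussed above (app, env, account) are missing from the file. The function name missing_config_keys is hypothetical and not part of DFTracer; it only looks for top-level keys at the start of a line and does not validate their values.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DFTracer): print the expected top-level
# keys that are absent from a configuration file, separated by spaces.
missing_config_keys() {
    local file="$1" key missing=""
    for key in app env account; do
        # A top-level YAML key starts the line and is followed by a colon.
        if ! grep -qE "^${key}:" "$file"; then
            missing="${missing}${missing:+ }${key}"
        fi
    done
    printf '%s\n' "$missing"
}

# Example usage:
#   missing_config_keys "$HOME/.dftracer/configuration.yaml"
```

An empty line of output means all three keys were found.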

Changing polaris.yaml config

cd <dftracer>/dfanalyzer/dask/conf
<EDITOR> polaris.yaml

Note

Adjust polaris.yaml to your needs. For example, to request more nodes, change num_nodes under the job key; you can also change wall_time_min and other settings. For more information about the ALCF Polaris queues, see Running Jobs on Polaris.

Executing scheduler

cd <dftracer>/dfanalyzer/dask/scripts
./start_dask_distributed.sh

Note

Wait a few seconds: this script reserves compute nodes for you through the PBS job scheduler.

Warning

If you get an error saying that the port is in use, try changing the port in <dftracer>/dfanalyzer/dask/conf/polaris.yaml, or check whether another scheduler is already running: use ps -ef | grep dask to find its process ID (PID), then stop it with kill -9 <pid>.
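
To see whether a port is already taken before changing polaris.yaml, you could filter the output of ss -ltn (listening TCP sockets, numeric ports). This is a minimal sketch: the helper name port_is_listening is hypothetical, it assumes the ss column layout in which the fourth column is the local address and port, and which port the scheduler uses depends on your polaris.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ltn` output on stdin and succeed (exit 0)
# if some socket is listening on the given port.
port_is_listening() {
    # Column 4 of `ss -ltn` is "Local Address:Port"; match ":<port>" at its end.
    awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Example usage (8787 is the Dask monitoring port mentioned in this guide):
#   ss -ltn | port_is_listening 8787 && echo "port 8787 is in use"
```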

If it runs successfully, you should see the message below:

Running Dask Scheduler on ALCF Polaris

Note

If the workers fail to start on the compute nodes, check the file <dftracer>/dfanalyzer/dask/logs/worker_<jobid>.log.
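
When several jobs have run, finding the right log by job ID can be tedious. The sketch below picks the most recently modified worker log, assuming the worker_<jobid>.log naming from the note above; the helper name latest_worker_log is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the newest worker_*.log file
# in the given directory (prints nothing if there is none).
latest_worker_log() {
    local log_dir="$1"
    # ls -t sorts by modification time, newest first; keep the first entry.
    ls -t "$log_dir"/worker_*.log 2>/dev/null | head -n 1
}

# Example usage:
#   tail -n 50 "$(latest_worker_log <dftracer>/dfanalyzer/dask/logs)"
```

If no log has been written yet, the PBS command qstat -u "$USER" lists your jobs, which can help confirm whether the reservation is still queued.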

Forwarding the Port

We recommend running the notebook inside VSCode because it supports port forwarding natively. In VSCode, go to the bottom panel (where the terminal is) and click the PORTS tab, shown in the screenshot below:

VSCode Ports Tab

Then click Add Port, shown below:

VSCode Add Port Button

Type 8787, the port that serves the Dask monitoring web page. If you entered it correctly, the port appears as a new entry:

VSCode Added Port

Now open http://localhost:8787 in your browser and, voila, you will see the Dask scheduler monitoring page!

Dask Scheduler Monitoring
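
If you are not using VSCode, standard SSH local port forwarding (ssh -L) should achieve the same result. The sketch below only prints the command rather than running it. The helper name tunnel_command is hypothetical, and it assumes the scheduler runs on the login node that your SSH session lands on, which may not hold if you are routed to a different login node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ssh command that forwards a local port to the
# same port on the Polaris login node. The port defaults to 8787.
tunnel_command() {
    local user="$1" port="${2:-8787}"
    # -N: do not run a remote command; -L: forward local port to remote host:port.
    printf 'ssh -N -L %s:localhost:%s %s@polaris.alcf.anl.gov\n' \
        "$port" "$port" "$user"
}

# Example usage: print the command, then run it in a separate local terminal.
#   tunnel_command <username>
```

While the tunnel is open, http://localhost:8787 on your own machine points at port 8787 on the login node.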

Opening Notebook File

In VSCode, open

<dftracer>/examples/dfanalyzer/dfanalyzer-distributed.ipynb

Then run each cell as usual.

Note

Please use this as the starting point to analyze your traces. Feel free to copy and adjust it if needed!

Stopping Dask Distributed Workers

cd <dftracer>/dfanalyzer/dask/scripts
./stop_dask_distributed.sh

Note

Wait a few seconds: this script stops the workers and releases the compute nodes.

Tips and Tricks

  1. Run additional scripts on the compute nodes

    Sometimes you need to run commands before the workers start, for example to set environment variables such as LD_LIBRARY_PATH. DFTracer supports this through $HOME/.dftracer/configuration.sh: add the commands to the end of that file.

    <EDITOR> $HOME/.dftracer/configuration.sh
    #
    # add new line at the end of the file
    # e.g.
    # export LD_LIBRARY_PATH=/opt/cray/libfabric/1.15.2.0/lib64:${LD_LIBRARY_PATH}
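
If you script this setup, appending the same line on every run would duplicate it. The sketch below appends a line only when it is not already present; the helper name append_once is hypothetical and not part of DFTracer, and the library path is the one from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file unless an identical line exists.
append_once() {
    local file="$1" line="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage (single quotes keep ${LD_LIBRARY_PATH} unexpanded until the
# file is actually read on the compute node):
#   append_once "$HOME/.dftracer/configuration.sh" \
#       'export LD_LIBRARY_PATH=/opt/cray/libfabric/1.15.2.0/lib64:${LD_LIBRARY_PATH}'
```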