Advanced Configuration
Initially, pysqa was designed to interact only with the local queuing system of an HPC cluster. This functionality has
since been extended to support remote HPC clusters as well. Two developments, the support for remote HPC clusters and
the support for multiple clusters in pysqa, are discussed in the following. Both of these features are under active
development, so this part of the interface might change more frequently than the rest.
Remote HPC Configuration
Remote clusters can be defined in the queue.yaml file by setting the queue_type to REMOTE:
queue_type: REMOTE
queue_primary: remote
ssh_host: hpc-cluster.university.edu
ssh_username: hpcuser
known_hosts: ~/.ssh/known_hosts
ssh_key: ~/.ssh/id_rsa
ssh_remote_config_dir: /u/share/pysqa/resources/queues/
ssh_remote_path: /u/hpcuser/remote/
ssh_local_path: /home/localuser/projects/
ssh_continous_connection: True
ssh_delete_file_on_remote: False
queues:
remote: {cores_max: 100, cores_min: 10, run_time_max: 259200}
In addition to the queue_type, queue_primary and queues parameters, a remote configuration has the following required keywords:
- ssh_host: the remote HPC login node to connect to
- ssh_username: the username on the HPC login node
- known_hosts: the local file of known hosts, which needs to contain the ssh_host defined above
- ssh_key: the local key for the SSH connection
- ssh_remote_config_dir: the pysqa configuration directory on the remote HPC cluster
- ssh_remote_path: the remote directory on the HPC cluster to transfer calculations to
- ssh_local_path: the local directory to transfer calculations from
And optional keywords:
- ssh_delete_file_on_remote: specify whether files on the remote HPC should be deleted after they are transferred back to the local system; defaults to True
- ssh_port: the port used for the SSH connection on the remote HPC cluster; defaults to 22
A definition of the queues on the local system is required so that the parameter checks can run locally. The
individual submission script templates, however, only need to be stored on the remote HPC.
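The required and optional keywords listed above can be checked before a connection is attempted. The following is a minimal sketch and not part of pysqa: the helper check_remote_config and the two constants are hypothetical names, the rule that queue_primary must appear in queues is an assumption, and the configuration is passed as an already-parsed dictionary so that the example needs no YAML library.

```python
# Hypothetical helper, not part of pysqa: checks a parsed queue.yaml
# dictionary against the keywords documented above.
REQUIRED_KEYS = (
    "queue_type", "queue_primary", "queues",
    "ssh_host", "ssh_username", "known_hosts", "ssh_key",
    "ssh_remote_config_dir", "ssh_remote_path", "ssh_local_path",
)

# Defaults as stated in the list of optional keywords.
OPTIONAL_DEFAULTS = {"ssh_delete_file_on_remote": True, "ssh_port": 22}


def check_remote_config(config):
    """Return a copy of config with the documented defaults filled in.

    Raises ValueError if a required keyword is missing, if queue_type is
    not REMOTE, or (assumption) if queue_primary is not listed in queues.
    """
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing required keywords: " + ", ".join(missing))
    if config["queue_type"] != "REMOTE":
        raise ValueError("queue_type must be REMOTE for a remote cluster")
    if config["queue_primary"] not in config["queues"]:
        raise ValueError("queue_primary is not defined in queues")
    checked = dict(OPTIONAL_DEFAULTS)
    checked.update(config)  # explicit settings override the defaults
    return checked


# Values copied from the example configuration above.
example = {
    "queue_type": "REMOTE",
    "queue_primary": "remote",
    "ssh_host": "hpc-cluster.university.edu",
    "ssh_username": "hpcuser",
    "known_hosts": "~/.ssh/known_hosts",
    "ssh_key": "~/.ssh/id_rsa",
    "ssh_remote_config_dir": "/u/share/pysqa/resources/queues/",
    "ssh_remote_path": "/u/hpcuser/remote/",
    "ssh_local_path": "/home/localuser/projects/",
    "ssh_delete_file_on_remote": False,
    "queues": {"remote": {"cores_max": 100, "cores_min": 10, "run_time_max": 259200}},
}
checked = check_remote_config(example)
```

Here ssh_port is filled in with its default, while the explicit ssh_delete_file_on_remote setting is kept.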
Access to Multiple HPCs
To support multiple remote HPC clusters, additional functionality was added to pysqa.
Namely, a clusters.yaml file can be defined in the configuration directory. It maps each cluster name to a separate
queue.yaml file and names the primary cluster:
cluster_primary: local_slurm
cluster: {
local_slurm: local_slurm_queues.yaml,
remote_slurm: remote_queues.yaml
}
These queue.yaml files can again include all the functionality defined previously, including the configuration for
remote connections using SSH.
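The mapping from cluster names to queue files can be sketched in a few lines. This is an illustration only and not the pysqa implementation: the function resolve_cluster_files is a hypothetical name, the parsed clusters.yaml content is given as a dictionary, and it is assumed that the queue files are located relative to the configuration directory.

```python
import os


def resolve_cluster_files(config_dir, clusters):
    """Map each cluster name to the path of its queue file.

    Hypothetical helper: assumes the file names in clusters.yaml are
    relative to the configuration directory.
    """
    primary = clusters["cluster_primary"]
    mapping = clusters["cluster"]
    if primary not in mapping:
        raise ValueError("cluster_primary is not listed under cluster")
    return {
        name: os.path.join(config_dir, file_name)
        for name, file_name in mapping.items()
    }


# Parsed content of the clusters.yaml example above.
clusters = {
    "cluster_primary": "local_slurm",
    "cluster": {
        "local_slurm": "local_slurm_queues.yaml",
        "remote_slurm": "remote_queues.yaml",
    },
}
# "config" is a placeholder directory name used only for this example.
queue_files = resolve_cluster_files("config", clusters)
```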
Furthermore, the QueueAdapter class was extended with the following two functions:
qa.list_clusters()
This lists the available clusters in the configuration, and:
qa.switch_cluster(cluster_name)
This switches from one cluster to another, where cluster_name is the name of a cluster, such as local_slurm or
remote_slurm in the configuration above.
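The two functions can be combined to switch clusters only when the requested name is actually configured. The sketch below is a hedged illustration: switch_if_available is a hypothetical helper, FakeAdapter is a stand-in so that the example runs without pysqa or a cluster, and it is assumed that list_clusters() returns an iterable of cluster names.

```python
def switch_if_available(qa, cluster_name):
    """Switch qa to cluster_name if it is configured; return True on success.

    qa is any object offering list_clusters() and switch_cluster(), for
    example a pysqa QueueAdapter (assumption: list_clusters() yields names).
    """
    if cluster_name not in list(qa.list_clusters()):
        return False
    qa.switch_cluster(cluster_name)
    return True


class FakeAdapter:
    """Stand-in for QueueAdapter, used only to make the sketch runnable."""

    def __init__(self, clusters, primary):
        self._clusters = list(clusters)
        self.active = primary  # hypothetical attribute tracking the current cluster

    def list_clusters(self):
        return list(self._clusters)

    def switch_cluster(self, cluster_name):
        self.active = cluster_name


qa = FakeAdapter(["local_slurm", "remote_slurm"], primary="local_slurm")
switched = switch_if_available(qa, "remote_slurm")
```

With a real QueueAdapter instance in place of FakeAdapter, the helper would be called in the same way.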