Advanced Configuration#
Initially pysqa
was only designed to interact with the local queuing systems of an HPC cluster. This functionality has recently been extended to support remote HPC clusters in addition to local HPC clusters. These two developments, the support for remote HPC clusters and the support for multiple clusters in pysqa
are discussed in the following. Both of these features are under active development so this part of the interface might change more frequently than the rest.
Remote HPC Configuration#
Remote clusters can be defined in the queue.yaml
file by setting the queue_type
to REMOTE
:
queue_type: REMOTE
queue_primary: remote
ssh_host: hpc-cluster.university.edu
ssh_username: hpcuser
known_hosts: ~/.ssh/known_hosts
ssh_key: ~/.ssh/id_rsa
ssh_remote_config_dir: /u/share/pysqa/resources/queues/
ssh_remote_path: /u/hpcuser/remote/
ssh_local_path: /home/localuser/projects/
ssh_continous_connection: True
ssh_delete_file_on_remote: False
queues:
remote: {cores_max: 100, cores_min: 10, run_time_max: 259200}
In addition to queue_type
, queue_primary
and queues
parameters, this also has the following required keywords:
ssh_host
the remote HPC login node to connect tossh_username
the username on the HPC login nodeknown_hosts
the local file of known hosts which needs to contain thessh_host
defined above.ssh_key
the local key for the SSH connectionssh_remote_config_dir
thepysqa
configuration directory on the remote HPC clusterssh_remote_path
the remote directory on the HPC cluster to transfer calculations tossh_local_path
the local directory to transfer calculations from
And optional keywords:
ssh_delete_file_on_remote
specify whether files on the remote HPC should be deleted after they are transferred back to the local system - defaults toTrue
ssh_port
the port used for the SSH connection on the remote HPC cluster - defaults to22
A definition of the queues
in the local system is required to enable the parameter checks locally. Still it is sufficient to only store the individual submission script templates only on the remote HPC.
Access to Multiple HPCs#
To support multiple remote HPC clusters additional functionality was added to pysqa
.
Namely, a clusters.yaml
file can be defined in the configuration directory, which defines multiple queue.yaml
files for different clusters:
cluster_primary: local_slurm
cluster: {
local_slurm: local_slurm_queues.yaml,
remote_slurm: remote_queues.yaml
}
These queue.yaml
files can again include all the functionality defined previously, including the configuration for remote connection using SSH.
Furthermore, the QueueAdapter
class was extended with the following two functions:
qa.list_clusters()
To list the available clusters in the configuration and:
qa.switch_cluster(cluster_name)
To switch from one cluster to another, with the cluster_name
providing the name of the cluster like local_slurm
and remote_slurm
in the configuration above.