Merge pull request #64 from neuroinformatics-unit/update-ssh-guide
Update ssh setup guide
lauraporta authored Jul 16, 2024
2 parents 559d8f9 + b1e0159 commit 8597c6f
Showing 5 changed files with 101 additions and 27 deletions.
Binary file added docs/source/_static/ssh_flowchart_full.png
36 changes: 36 additions & 0 deletions docs/source/_static/ssh_flowchart_source.mmd
@@ -0,0 +1,36 @@
flowchart LR
classDef safe stroke:#03A062,stroke-width:3px;
classDef unsafe stroke:#d95f02,stroke-width:3px;
classDef emphasis color:#03A062,stroke:#03A062,stroke-width:3px;

unmanaged("💻\n Unmanaged\n Computer") -->|ssh| bastion(("Bastion Node\n ssh.swc.ucl.ac.uk"))

subgraph trusted["SWC Network Domain"]

bastion --> |ssh| gateway(("Gateway Node\n hpc-gw1"))
managed_win("💻\n Managed\n Windows\n Desktop") ---> |ssh| gateway
managed_linux("💻\n Managed\n Linux\n Desktop") ---> |ssh| gateway
bastion ----> |srun\n sbatch| slurm{"SLURM 🚦"}
gateway --> |srun\n sbatch| slurm
managed_linux --> |srun\n sbatch| slurm

subgraph compute["Compute Nodes"]
node1(("Node 1"))
node2(("Node 2"))
node3(("Node 3"))
end

slurm --> node1
slurm --> node2
slurm --> node3

end

class bastion emphasis
class gateway emphasis
class unmanaged unsafe
class managed_win safe
class managed_linux safe
class node1 safe
class node2 safe
class node3 safe
Binary file added docs/source/_static/ssh_flowchart_unmanaged.png
Binary file removed docs/source/_static/swc_hpc_access_flowchart.png
Binary file not shown.
92 changes: 65 additions & 27 deletions docs/source/programming/SSH-SWC-cluster.md
@@ -1,7 +1,12 @@
(ssh-cluster-target)=
# Set up SSH for the SWC HPC cluster

This guide explains how to connect to the SWC's HPC cluster via SSH.
This guide explains how to connect to the SWC's HPC cluster via SSH from
any personal computer.

If you have access to a desktop managed by the SWC's IT team
the connection is much more straightforward than described here
(see the [note on managed desktops](ssh-managed-target)).

```{include} ../_static/swc-wiki-warning.md
```
@@ -12,10 +17,11 @@ This guide explains how to connect to the SWC's HPC cluster via SSH.
## Abbreviations
| Acronym | Meaning |
| ----------------------------------------------------------------------- | -------------------------------------------- |
| [SSH](https://en.wikipedia.org/wiki/Secure_Shell) | Secure (Socket) Shell protocol |
| [SWC](https://www.sainsburywellcome.org/web/) | Sainsbury Wellcome Centre |
| [HPC](https://en.wikipedia.org/wiki/High-performance_computing) | High Performance Computing |
| [IT](https://en.wikipedia.org/wiki/Information_technology) | Information Technology |
| [SLURM](https://slurm.schedmd.com/) | Simple Linux Utility for Resource Management |
| [SSH](https://en.wikipedia.org/wiki/Secure_Shell) | Secure (Socket) Shell protocol |
| [IDE](https://en.wikipedia.org/wiki/Integrated_development_environment) | Integrated Development Environment |
| [GUI](https://en.wikipedia.org/wiki/Graphical_user_interface) | Graphical User Interface |

@@ -70,19 +76,17 @@ You have now successfully logged into the cluster 🎉. You may stop reading here
If you want to learn more about why we had to SSH twice, read the [next section](#why-do-we-ssh-twice).

If you want to make your life easier, you can set yourself up with an [SSH config file](#ssh-config-file)
and some [SSH keys](#ssh-keys). Trust us, it's worth the effort. For example, one
benefit is that you will be able to use [Visual Studio Code](https://code.visualstudio.com/)
on your PC/laptop to edit files on the cluster (see the [last section](#remote-development)).
and some [SSH keys](#ssh-keys).
:::

## Why do we SSH twice?
We first need to distinguish the different types of nodes on the SWC HPC system:

- the *bastion* node (or "jump host") - `ssh.swc.ucl.ac.uk`. This serves as a single entry point to the cluster from external networks. By funneling all external SSH connections through this node, it's easier to monitor, log, and control access, reducing the attack surface. The *bastion* node has very little processing power. It can be used to submit and monitor SLURM jobs, but it shouldn't be used for anything else.
- the *bastion* node (or *login node*) - `ssh.swc.ucl.ac.uk`. This serves as a single entry point to the cluster from external networks. By funneling all external SSH connections through this node, it's easier to monitor, log, and control access, reducing the attack surface. The *bastion* node has very little processing power. It can be used to submit and monitor SLURM jobs, but it shouldn't be used for anything else.
- the *gateway* node - `hpc-gw1`. This is a more powerful machine and can be used for light processing, such as editing your scripts, creating and copying files etc. However don't use it for anything computationally intensive, since this node's resources are shared across all users.
- the *compute* nodes - `enc1-node10`, `gpu-sr670-21`, etc. These are the machines that actually run the jobs we submit, either interactively via `srun` or via batch scripts submitted with `sbatch`.

![](../_static/swc_hpc_access_flowchart.png)
![](../_static/ssh_flowchart_unmanaged.png)
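
To recap, the two hops from an unmanaged computer shown in the flowchart correspond to
the two SSH commands walked through earlier in this guide
(`<SWC-USERNAME>` is a placeholder for your SWC account name):

```{code-block} console
$ ssh <SWC-USERNAME>@ssh.swc.ucl.ac.uk
$ ssh hpc-gw1
```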

Your home directory, as well as the locations where filesystems like `ceph` are mounted, are shared across all of the nodes.

@@ -92,11 +96,61 @@ Similarly, if you are on the *gateway* node, typing `logout` once will only get

The *compute* nodes should only be accessed via the SLURM `srun` or `sbatch` commands. This can be done from either the *bastion* or the *gateway* nodes. If you are running an interactive job on one of the *compute* nodes, you can terminate it by typing `exit`. This will return you to the node from which you entered.

:::{dropdown} Be mindful of node usage
:color: warning
:icon: alert

Avoid running heavy computations on the *bastion* or *gateway* nodes, as
they are meant for light tasks like text editing or job submissions to SLURM.

For quick tasks that would nevertheless burden these nodes,
request an interactive session on a *compute* node using the `srun` command.
Here's an example for creating a new conda environment:

```{code-block} console
$ srun -p fast -n 4 --mem 8G --pty bash -i
$ module load miniconda
$ conda create -n myenv python=3.10
```

The first command requests 4 cores and 8GB of memory on a node of the `fast`
partition, meant for jobs up to 3 hours long. The `--pty bash -i` part specifies
an interactive bash shell. The following two commands are run in this shell,
on the assigned *compute* node.

Type `exit` to leave the interactive session when finished.
Avoid keeping sessions open when not in use.
:::

(ssh-managed-target)=
## Note on managed desktops

The SWC's IT team offers managed desktop computers equipped with either
a Windows or a Linux image. These machines are already part of the SWC's
trusted network domain, meaning you can access the HPC cluster without
having to go through the *bastion* node.

- If you are using a [managed Windows desktop](https://wiki.ucl.ac.uk/display/SSC/SWC+Desktops),
you can SSH directly into the *gateway* node with `ssh hpc-gw1` from the
Windows `cmd` or PowerShell.
You may use that node to prepare your scripts and submit SLURM jobs.
- If you are using a [managed Linux desktop](https://wiki.ucl.ac.uk/display/SSC/Managed+Linux+Desktop),
you can even bypass the *gateway* node: you may submit SLURM jobs directly
from your terminal, without having to SSH at all (see the sketch below).
That's because managed Linux desktops use the same platform as the HPC nodes
and are already equipped with the SLURM job scheduler.
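
For illustration, here is a minimal sketch of submitting and monitoring a job straight
from a managed Linux desktop terminal; `my_script.sh` is a hypothetical batch script,
not one provided by this guide:

```{code-block} console
$ sbatch my_script.sh   # submit a (hypothetical) batch script to SLURM
$ squeue -u $USER       # list your queued and running jobs
```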

A modified version of the flowchart found above, including managed desktops:

![](../_static/ssh_flowchart_full.png)


## SSH config file
If you find yourself typing the above commands over and over again, you can make
your life easier by editing the SSH config file.
This is a text file that lives in your home directory and contains a list of aliases
for SSH connections.
If you are frequently accessing the cluster from an unmanaged machine,
you may find yourself typing the same SSH commands over and over again.
You can make your life easier by editing the SSH config file.
This is a text file that lives in your home directory and contains
a list of aliases for SSH connections.

On your local PC/Laptop, navigate to the `.ssh` folder in your user's home `~` directory:
```{code-block} console
@@ -256,19 +310,3 @@ In case you want to SSH into the *bastion* node, you can do so by typing:
```{code-block} console
$ ssh swc-bastion
```

## Remote development
One benefit of setting your SSH config and SSH keys is that you can now easily use
[Visual Studio Code](https://code.visualstudio.com/) to edit files on remote machines.
This is especially useful for editing scripts on the *gateway* node that you want to
run on the *compute* nodes of the cluster.

To do this, you need to install the [Remote - SSH](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-ssh) extension in VS Code. Then, when you
click on the "Open a Remote Window" button in the bottom left corner of the VS Code window,
you will see a list of the SSH hosts you have configured in your `~/.ssh/config` file.
You can then select the host you want to connect to - e.g. `swc-gateway` - and VS Code
will open a new window with a terminal connected to that host. You can then use the
VS Code GUI to navigate the file system and edit files on the remote machine.

Other IDEs like [PyCharm](https://www.jetbrains.com/pycharm/) also offer
[similar functionality](https://www.jetbrains.com/help/pycharm/remote-development-overview.html).
