This repository has been archived by the owner on Feb 5, 2024. It is now read-only.
We have a ray.jinja similar to hello.jinja that starts a ray cluster remotely. We need to test it out on Della. Once the ray cluster has started, we should be able to run remote ray commands like these:
import ray
import mymodule  # local package, shipped to the cluster through runtime_env

ray.init("ray://localhost:10001", runtime_env={"py_modules": [mymodule]})
# At this point 'mymodule' is available on the remote node
from mymodule.api import hello_ray, hello_gpu
future = hello_ray.remote()
result = ray.get(future)
print(result)
Note that mymodule is local code that only sits on your machine (and is not part of the wbi module).
The ray://... part should point to the head node of the Ray cluster. Normally this is the compute node running the Ray head process (not the head node of Della). To find out which compute node was assigned as the Ray head node, look at the output of the Slurm job (the ray template writes all this information to stdout). Let's say it's della-l07g3. Then you can set up local port forwarding like so:
ssh -J della -N -L "10001:localhost:10001" "della-l07g3"
i.e. forward port 10001 on localhost to della-l07g3 through the jump server della. The address passed to ray.init can then simply be ray://localhost:10001.
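If it helps while testing, here is a minimal Python sketch of the same port-forwarding step. The helper names build_forward_command and port_is_open are hypothetical (they are not part of the template or of Ray); the sketch only assembles the ssh command shown above and checks whether something is listening on the local port before ray.init is called.

```python
import socket


def build_forward_command(head_node, jump_host="della", port=10001):
    """Return the ssh argv that forwards localhost:port to head_node:port.

    Hypothetical helper: it mirrors the ssh command from this issue,
    with the jump host and port as adjustable defaults.
    """
    return [
        "ssh", "-J", jump_host, "-N",
        "-L", f"{port}:localhost:{port}",
        head_node,
    ]


def port_is_open(host="localhost", port=10001, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Print the command for the example node named in this issue.
print(" ".join(build_forward_command("della-l07g3")))
```

The list returned by build_forward_command can be passed to subprocess.Popen to keep the tunnel running in the background, and port_is_open() can be polled until the tunnel is up. Both are assumptions about how one might script this, not something the template does today.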
The current template activates an environment:
conda activate ray
For this to work for all users, the environment has to live in a location that everyone in the group can read, so that the template can do something like conda activate /tigress/LEIFER/path/to/conda/env.
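A quick way to sanity-check a candidate location is sketched below. The function name group_can_use is hypothetical, and the check is deliberately narrow: it only inspects the permission bits of the given directory itself (group read and execute), not its parent directories or the files inside it.

```python
import os
import stat


def group_can_use(path):
    """Return True if `path` has both group-read and group-execute bits set.

    Hypothetical helper: it checks only this one directory, so parent
    directories and the environment's contents still need checking.
    """
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRGRP) and bool(mode & stat.S_IXGRP)
```

On Della this would be called with the shared environment path once it has been created; a False result suggests the directory permissions need to be opened up for the group.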
EDIT: Handle after tackling all other issues.