Running benchmarks #812
Conversation
Added some requests for clarification and small changes.
To give you an idea of how the output CSV and Markdown look:
…he probability of improvement.
… new algorithm to the benchmark runs.
15d4026 to c6a5910
…e step properly fails when the installation of one of the packages fails.
Codecov Report
@@            Coverage Diff             @@
##           master     #812      +/-   ##
==========================================
+ Coverage   95.63%   95.67%   +0.03%
==========================================
  Files         100      102       +2
  Lines        9582     9654      +72
==========================================
+ Hits         9164     9236      +72
  Misses        418      418
LGTM. I really like the benchmarking readme! It seems super helpful.
@@ -0,0 +1,201 @@
#!/bin/bash
python -m imitation.scripts.train_imitation bc with bc_seals_ant seed=1
Can replace the whole file with the code below. Even if we want to keep the file with separate lines for parallelization purposes, it would be nice to add the code below in a separate file. The full file can be generated from it by adding an echo in front of python. This will make future maintenance easier when we add other envs and algorithms.
#!/bin/bash
seeds=(1 2 3 4 5 6 7 8 9 10)
envs=(
"seals_ant"
"seals_half_cheetah"
"seals_hopper"
"seals_swimmer"
"seals_walker"
)
script_algos=(
"imitation" "bc"
"imitation" "dagger"
"adversarial" "airl"
"adversarial" "gail"
)
for env in "${envs[@]}"; do
    for ((i=0; i<${#script_algos[@]}; i+=2)); do
        script=${script_algos[$i]}
        algo=${script_algos[$((i+1))]}
        for seed in "${seeds[@]}"; do
            python -m imitation.scripts.train_${script} ${algo} with ${algo}_${env} seed=${seed}
        done
    done
done
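The "add an echo in front of python" step mentioned above can be done mechanically. A minimal sketch (the loop script here is a shortened stand-in for the full one above, and the filename run_all_benchmarks_loop.sh is hypothetical):

```shell
#!/bin/bash
# Hypothetical sketch: regenerate the flat run_all_benchmarks.sh from a
# loop script by echoing each command instead of executing it.
# The loop body below is a shortened stand-in for the full script above.
cat > run_all_benchmarks_loop.sh <<'EOF'
#!/bin/bash
for seed in 1 2 3; do
    python -m imitation.scripts.train_imitation bc with bc_seals_ant seed=${seed}
done
EOF

# Prefix every python invocation with echo, then run the result: each
# command is printed instead of executed, yielding the flat script.
sed 's/^\( *\)python /\1echo python /' run_all_benchmarks_loop.sh \
    | bash > run_all_benchmarks.sh

cat run_all_benchmarks.sh
```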
I know that this can easily be generated. I used:
envs = [
"seals_ant",
"seals_half_cheetah",
"seals_hopper",
"seals_swimmer",
"seals_walker",
]
for env in envs:
    for algo in ["bc", "dagger"]:
        for seed in range(1, 11):
            print(f"python -m imitation.scripts.train_imitation {algo} with {algo}_{env} seed={seed}")
    for algo in ["airl", "gail"]:
        for seed in range(1, 11):
            print(f"python -m imitation.scripts.train_adversarial {algo} with {algo}_{env} seed={seed}")
However, I opted for the solution with fewer moving parts. Whenever we add new algorithms or change something else, I can just re-write this script in 3 minutes. As soon as we put the code in the repo, we have to maintain it, test it, and document it. That is a lot of effort just to save us the 3 minutes it takes to re-write this one-off script.
I suppose that's true.
Having the run_all_benchmarks.sh file in the repo feels slightly weird. Do you think we can remove the file entirely and just keep the above python/bash script that generates those commands? There doesn't seem to be any need to have the run_all_benchmarks.sh script at all.
I don't think this matters much either way, but I have a slight preference for the current approach because I think it makes it easier to see exactly what was executed, and also to run just a subset of the experiments by copy-pasting.
(The copy-pasting is particularly valuable because I usually end up having to manually assign jobs to GPUs or nodes or whatever, depending on what compute infra I'm using.)
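For instance, one hypothetical way to grab such a subset from the flat script and pin it to a GPU (the grep pattern, the GPU index, and the filename subset.sh are all illustrative, not part of the PR):

```shell
#!/bin/bash
# Hypothetical sketch: pick a subset of experiments out of the flat
# run_all_benchmarks.sh and pin it to one GPU. The two-line stand-in
# below mimics the real file.
cat > run_all_benchmarks.sh <<'EOF'
python -m imitation.scripts.train_imitation bc with bc_seals_ant seed=1
python -m imitation.scripts.train_adversarial gail with gail_seals_ant seed=1
EOF

# Keep only the BC runs and prefix each with a GPU assignment; the
# resulting subset.sh can be run directly or copy-pasted line by line.
grep "train_imitation bc" run_all_benchmarks.sh \
    | sed 's/^/CUDA_VISIBLE_DEVICES=0 /' > subset.sh

cat subset.sh
```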
@@ -0,0 +1,21 @@
#!/bin/bash
sbatch --job-name=bc_seals_ant run_benchmark_on_slurm.sh train_imitation bc seals_ant
Can replace the whole file here too, in a similar way as above:
#!/bin/bash
envs=(
"seals_ant"
"seals_half_cheetah"
"seals_hopper"
"seals_swimmer"
"seals_walker"
)
script_algos=(
"imitation" "bc"
"imitation" "dagger"
"adversarial" "airl"
"adversarial" "gail"
)
for env in "${envs[@]}"; do
    for ((i=0; i<${#script_algos[@]}; i+=2)); do
        script=${script_algos[$i]}
        algo=${script_algos[$((i+1))]}
        job_name="${algo}_${env}"
        echo sbatch --job-name="${job_name}" run_benchmark_on_slurm.sh "train_${script}" "${algo}" "${env}"
    done
done
See above comment.
…d algorithms have the same name and explain why.
I re-reviewed the diff from my last review. Looks good to merge!
On the debate about including the script that generates the script that runs experiments, versus just including the script that runs experiments: it seems like we're 2v1 in favor of the current approach, so I'm inclined to just merge as-is.
Great. These are the final results: raw values as well as the CSV and Markdown summary. This is to be added as a release artifact.
Description
Add scripts to run the entire benchmark suite.
Testing
Description of how you've tested your changes.