Running benchmarks #812
Conversation
Added some requests for clarification and small changes.
To give you an idea of how the output CSV and Markdown look:
…he probability of improvement.
… new algorithm to the benchmark runs.
15d4026 to c6a5910
…e step properly fails when the installation of one of the packages fails.
Codecov Report
@@            Coverage Diff             @@
##           master     #812      +/-   ##
==========================================
+ Coverage   95.63%   95.67%   +0.03%
==========================================
  Files         100      102       +2
  Lines        9582     9654      +72
==========================================
+ Hits         9164     9236      +72
  Misses        418      418
LGTM. I really like the benchmarking readme! It seems super helpful.
@@ -0,0 +1,201 @@
#!/bin/bash
python -m imitation.scripts.train_imitation bc with bc_seals_ant seed=1
Can replace the whole file with the code below. Even if we want to keep the file with separate lines for parallelization purposes, it would be nice to add the code below in a separate file. The full file can be generated from it by adding an echo in front of python. This will make future maintenance easier when we add other envs and algorithms.
#!/bin/bash
seeds=(1 2 3 4 5 6 7 8 9 10)
envs=(
"seals_ant"
"seals_half_cheetah"
"seals_hopper"
"seals_swimmer"
"seals_walker"
)
script_algos=(
"imitation" "bc"
"imitation" "dagger"
"adversarial" "airl"
"adversarial" "gail"
)
for env in "${envs[@]}"; do
    for ((i=0; i<${#script_algos[@]}; i+=2)); do
        script=${script_algos[$i]}
        algo=${script_algos[$((i+1))]}
        for seed in "${seeds[@]}"; do
            python -m imitation.scripts.train_${script} ${algo} with ${algo}_${env} seed=${seed}
        done
    done
done
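The "add an echo in front of python" step mentioned above can be done mechanically. A minimal sketch (the loop script here is a shortened stand-in for the full one above, and the filename run_all_benchmarks_loop.sh is hypothetical):

```shell
#!/bin/bash
# Hypothetical sketch: regenerate the flat run_all_benchmarks.sh from a
# loop script by echoing each command instead of executing it.
# The loop body below is a shortened stand-in for the full script above.
cat > run_all_benchmarks_loop.sh <<'EOF'
#!/bin/bash
for seed in 1 2 3; do
    python -m imitation.scripts.train_imitation bc with bc_seals_ant seed=${seed}
done
EOF

# Prefix every python invocation with echo, then run the result: each
# command is printed instead of executed, yielding the flat script.
sed 's/^\( *\)python /\1echo python /' run_all_benchmarks_loop.sh \
    | bash > run_all_benchmarks.sh

cat run_all_benchmarks.sh
```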
I know that this can easily be generated. I used:
envs = [
"seals_ant",
"seals_half_cheetah",
"seals_hopper",
"seals_swimmer",
"seals_walker",
]
for env in envs:
    for algo in ["bc", "dagger"]:
        for seed in range(1, 11):
            print(f"python -m imitation.scripts.train_imitation {algo} with {algo}_{env} seed={seed}")
    for algo in ["airl", "gail"]:
        for seed in range(1, 11):
            print(f"python -m imitation.scripts.train_adversarial {algo} with {algo}_{env} seed={seed}")
However, I opted for the solution with fewer moving parts. Whenever we add new algorithms or change something else, I can just re-write this script in 3 minutes. As soon as we put the code in the repo, we have to maintain it, test it, and document it. That is a lot of effort just to save us the 3 minutes it takes to re-write this one-off script.
I suppose that's true.
Having the run_all_benchmarks.sh file in the repo feels slightly weird. Do you think we can remove the file entirely and just keep the above python/bash script that generates those commands? There doesn't seem to be any need to have the run_all_benchmarks.sh script at all.
I don't think this matters much either way, but I have a slight preference for the current approach because I think it makes it easier to see exactly what was executed, and also to run just a subset of the experiments by copy-pasting.
(The copy-pasting is particularly valuable because I usually end up having to manually assign jobs to GPUs or nodes or whatever, depending on what compute infra I'm using.)
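For instance, one hypothetical way to grab such a subset from the flat script and pin it to a GPU (the grep pattern, the GPU index, and the filename subset.sh are all illustrative, not part of the PR):

```shell
#!/bin/bash
# Hypothetical sketch: pick a subset of experiments out of the flat
# run_all_benchmarks.sh and pin it to one GPU. The two-line stand-in
# below mimics the real file.
cat > run_all_benchmarks.sh <<'EOF'
python -m imitation.scripts.train_imitation bc with bc_seals_ant seed=1
python -m imitation.scripts.train_adversarial gail with gail_seals_ant seed=1
EOF

# Keep only the BC runs and prefix each with a GPU assignment; the
# resulting subset.sh can be run directly or copy-pasted line by line.
grep "train_imitation bc" run_all_benchmarks.sh \
    | sed 's/^/CUDA_VISIBLE_DEVICES=0 /' > subset.sh

cat subset.sh
```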
@@ -0,0 +1,21 @@
#!/bin/bash
sbatch --job-name=bc_seals_ant run_benchmark_on_slurm.sh train_imitation bc seals_ant
Can replace the whole file here too, in a similar way as above:
#!/bin/bash
envs=(
"seals_ant"
"seals_half_cheetah"
"seals_hopper"
"seals_swimmer"
"seals_walker"
)
script_algos=(
"imitation" "bc"
"imitation" "dagger"
"adversarial" "airl"
"adversarial" "gail"
)
for env in "${envs[@]}"; do
    for ((i=0; i<${#script_algos[@]}; i+=2)); do
        script=${script_algos[$i]}
        algo=${script_algos[$((i+1))]}
        job_name="${algo}_${env}"
        echo sbatch --job-name="${job_name}" run_benchmark_on_slurm.sh "train_${script}" "${algo}" "${env}"
    done
done
See above comment.
…d algorithms have the same name and explain why.
I re-reviewed the diff from my last review. Looks good to merge!
On the debate about including the script that generates the script that runs experiments, versus just including the script that runs experiments: it seems like we're 2v1 in favor of the current approach, so I'm inclined to just merge as-is.
Great. These are the final results: raw values as well as the CSV and Markdown summary. This is to be added as a release artifact.
Description
Add scripts to run the entire benchmark suite.
Testing
Description of how you've tested your changes.