dvc exp run: replacing output folder instead of writing #10527

Open
ggrrll opened this issue Aug 16, 2024 · 3 comments
Labels: A: experiments (Related to dvc exp), question (I have a question?)

Comments

ggrrll commented Aug 16, 2024

Bug Report: Help on parameter tuning

Description

(not sure whether this is actually a 🐛 or whether I am doing something wrong...)

When I run a stage with dvc exp run --downstream [mystage-name], the output folder is replaced, even though I have changed the parameter values, which are contained in the folder name.

in my dvc.yaml file, mystage looks like

  mystage:
    cmd: python myscript.py 
    deps:
    - some_depends
    outs:
    - data/runs-optimization/
    params:
    - mystage.alpha

and myscript.py gets the default parameters from a params.yaml
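
For reference, the corresponding params.yaml could look like the minimal sketch below (the alpha value is a hypothetical placeholder; only the mystage.alpha key comes from the dvc.yaml above):

    mystage:
      alpha: 0.1   # default value read by myscript.py and tracked by DVC via the params entry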

I understand why, if I set a parameter value that I already used for a previous experiment, I get
Stage 'mystage' is cached - skipping run, checking out outputs

but I still want to analyse the results for all parameter combinations...
At the moment I just look through my output folders, which contain one sub-folder for every parameter combination.
In that case I would have to run dvc exp run ... again to reload the result from the cache ... (?)

Reproduce

(I cannot share code or data, as it's proprietary )

Expected

A new folder, with the new parameter values in its name.

I have checked that running the stage 'manually' (python myscript.py) does create the new folder, as expected.

Environment information

  • dvc v. 3.53.2
  • Python 3.9.10
  • System Version: macOS 14.1.2

dvc doctor

DVC version: 3.53.2 (pip)
-------------------------
Platform: Python 3.9.10 on macOS-14.1.2-x86_64-i386-64bit
Subprojects:
        dvc_data = 3.15.2
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.7
Supports:
        http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
@dberenbaum (Collaborator) commented

So you are looking for how to compare different experiment results, correct? With DVC, it is not expected that they live side-by-side in subdirectories. Instead, each experiment is a Git commit. Just like you don't make code changes in Git by creating a new copy of a file, you don't create a new file or directory for each experiment with DVC. Instead, you can use commands like dvc exp show to compare experiments. Take a look at https://dvc.org/doc/user-guide/experiment-management/comparing-experiments for more details.


ggrrll commented Aug 16, 2024

@dberenbaum thanks for your answer

Yes (and I am trying to follow https://dvc.ai/blog/hyperparam-tuning).

So, in this way, how can I plot summary statistics (of my metrics, over the parameter space)?
I can see that I can print the metric value by adding an evaluation step, as done in https://github.com/iterative/example-get-started/blob/main/dvc.yaml, but then what can you do in order to visualize them all in summary plots (given that the experiments are cached)?
As far as I can see, one has to add another stage, maybe accessing the table with the dvc Python API, as shown in https://dvc.org/doc/user-guide/experiment-management/comparing-experiments#other-ways-to-access-the-experiments-table (?)

Then, one could wonder whether it's worth doing that, or whether instead a dedicated stage for parameter tuning could be added, as shown in https://campus.datacamp.com/courses/cicd-for-machine-learning/comparing-training-runs-and-hyperparameter-hp-tuning?ex=5

What are your thoughts / suggestions?

thanks

@shcheklein (Member) commented

As a workaround / hack, you can try to set persist: true for this output (please read more in the docs). It might help to keep all the results in a single directory. I don't think it's possible to use it, though, if you run multiple experiments in parallel using a queue.
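
To illustrate, a minimal sketch of how the outs entry from the dvc.yaml above could look with that flag (the persist option is the only addition; the path comes from the original stage):

    outs:
    - data/runs-optimization/:
        persist: true

With persist enabled, DVC does not remove the output directory before re-running the stage, so sub-folders written by previous runs can accumulate in it instead of being replaced.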

In the DVC VS Code extension you could plot multiple experiments with "custom plots". Check the custom plots:

(attached screen recording: Screen.Recording.2024-08-18.at.6.40.03.PM.mov)

> but then what can you do in order to visualize them all, in summary plots? (given that the experiments are cached).

If you need to do custom visualization, I would also check the Get experiments table in Python API in the link that @dberenbaum shared.
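
For example, a minimal sketch of what that could look like with dvc.api.exp_show() (the metric and parameter column names below are hypothetical placeholders; check the printed columns to see the real ones for your repo):

    import dvc.api
    import matplotlib.pyplot as plt
    import pandas as pd

    # One row per experiment (workspace, baseline, and exp commits),
    # with params and metrics flattened into columns.
    df = pd.DataFrame(dvc.api.exp_show())
    print(df.columns)  # inspect the exact param / metric column names first

    # Hypothetical names - replace with the columns printed above, e.g. the
    # mystage.alpha param and a metric written by an evaluation stage.
    df.plot.scatter(x="mystage.alpha", y="my_metric")
    plt.show()

This would give a single summary plot of the metric over the parameter space across all cached experiments, without re-running them.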

> Then, one could wonder whether it's worth doing that, or whether instead a dedicated stage for parameter tuning could be added, as shown in

Could you clarify, please? How does it replace the visualization / comparison part?

shcheklein added the question and A: experiments labels on Aug 19, 2024