dvc exp run: replacing output folder instead of writing #10527

Open
ggrrll opened this issue Aug 16, 2024 · 3 comments
Labels: A: experiments (Related to dvc exp), question (I have a question?)

Comments

ggrrll commented Aug 16, 2024

Bug Report: Help on parameter tuning

Description

(not sure whether this is actually a 🐛 or whether I am doing something wrong...)

When I run a stage with dvc exp run --downstream [mystage-name], the output folder is replaced, even though I have changed the parameter values, which are contained in the folder name.

in my dvc.yaml file, mystage looks like

  mystage:
    cmd: python myscript.py 
    deps:
    - some_depends
    outs:
    - data/runs-optimization/
    params:
    - mystage.alpha

and myscript.py gets the default parameters from a params.yaml
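
For reference, the corresponding params.yaml could look like the minimal sketch below (the alpha value is a hypothetical placeholder; only the mystage.alpha key comes from the dvc.yaml above):

    mystage:
      alpha: 0.1   # default value read by myscript.py and tracked by DVC via the params entry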

I understand why, if I set a parameter value that I already used for a previous experiment, I get
Stage 'mystage' is cached - skipping run, checking out outputs

but I still want to analyse the results for all parameter combinations...
At the moment I just look through my output folders, which contain one sub-folder for every parameter combination.
In that case I would have to run dvc exp run ... again to reload the result from the cache ... (?)

Reproduce

(I cannot share code or data, as it's proprietary )

Expected

A new folder, with the new parameter values in its name.

I have checked that running the stage 'manually' (python myscript.py) does create the new folder, as expected.

Environment information

  • dvc v. 3.53.2
  • Python 3.9.10
  • System Version: macOS 14.1.2

dvc doctor

DVC version: 3.53.2 (pip)
-------------------------
Platform: Python 3.9.10 on macOS-14.1.2-x86_64-i386-64bit
Subprojects:
        dvc_data = 3.15.2
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.7
Supports:
        http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
@dberenbaum (Collaborator) commented

So you are looking for how to compare different experiment results, correct? With DVC, it is not expected that they live side-by-side in subdirectories. Instead, each experiment is a Git commit. Just like you don't make code changes in Git by creating a new copy of a file, you don't create a new file or directory for each experiment with DVC. Instead, you can use commands like dvc exp show to compare experiments. Take a look at https://dvc.org/doc/user-guide/experiment-management/comparing-experiments for more details.


ggrrll commented Aug 16, 2024

@dberenbaum thanks for your answer

Yes (and I am trying to follow https://dvc.ai/blog/hyperparam-tuning).

So, in this way, how can I plot summary statistics (of my metrics, over the parameter space)?
I can see that I can print the metric value by adding an evaluation step, as done in https://github.com/iterative/example-get-started/blob/main/dvc.yaml, but then what can you do in order to visualize them all in summary plots (given that the experiments are cached)?
As far as I can see, one has to add another stage, maybe accessing the table with the dvc Python API, as shown in https://dvc.org/doc/user-guide/experiment-management/comparing-experiments#other-ways-to-access-the-experiments-table (?)

Then, one could wonder whether it's worth doing that, or whether instead a dedicated stage for parameter tuning could be added, as shown in https://campus.datacamp.com/courses/cicd-for-machine-learning/comparing-training-runs-and-hyperparameter-hp-tuning?ex=5

What are your thoughts / suggestions?

thanks

@shcheklein (Member) commented

As a workaround / hack, you can try to set persist: true for this output (please read more in the docs). It might help to keep all the results in a single directory. I don't think it's possible to use it, though, if you run multiple experiments in parallel using a queue.
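
To illustrate, a minimal sketch of how the outs entry from the dvc.yaml above could look with that flag (the persist option is the only addition; the path comes from the original stage):

    outs:
    - data/runs-optimization/:
        persist: true

With persist enabled, DVC does not remove the output directory before re-running the stage, so sub-folders written by previous runs can accumulate in it instead of being replaced.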

In the DVC VS Code extension you could plot multiple experiments with "custom plots". Check the custom plots:

(attached screen recording: Screen.Recording.2024-08-18.at.6.40.03.PM.mov)

> but then what can you do in order to visualize them all, in summary plots? (given that the experiments are cached).

If you need to do custom visualization, I would also check the Get experiments table in Python API in the link that @dberenbaum shared.
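
For example, a minimal sketch of what that could look like with dvc.api.exp_show() (the metric and parameter column names below are hypothetical placeholders; check the printed columns to see the real ones for your repo):

    import dvc.api
    import matplotlib.pyplot as plt
    import pandas as pd

    # One row per experiment (workspace, baseline, and exp commits),
    # with params and metrics flattened into columns.
    df = pd.DataFrame(dvc.api.exp_show())
    print(df.columns)  # inspect the exact param / metric column names first

    # Hypothetical names - replace with the columns printed above, e.g. the
    # mystage.alpha param and a metric written by an evaluation stage.
    df.plot.scatter(x="mystage.alpha", y="my_metric")
    plt.show()

This would give a single summary plot of the metric over the parameter space across all cached experiments, without re-running them.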

> Then, one could wonder whether it's worth doing that, or whether instead a dedicated stage for parameter tuning could be added, as shown in

Could you clarify, please? How does it replace the visualization / comparison part?

shcheklein added the question and A: experiments labels on Aug 19, 2024