Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More pessimistic caching of outputs #1719

Merged
merged 3 commits into from
May 15, 2018
Merged

More pessimistic caching of outputs #1719

merged 3 commits into from
May 15, 2018

Conversation

zougloub
Copy link
Contributor

@zougloub zougloub commented May 11, 2018

Noted in #1625 that the caching was very optimistic, and this PR brings:

  • A set of helpers to perform better output caching
  • Usage of the helpers in the code that does caching

Using the caching helpers looks like:

cache_sig = sct.cache_signature(...)
cachefile = os.path.join(curdir, "name_of_the.cache")
if sct.cache_valid(cachefile, cache_sig) and other_test_that_outputs_exist():
    reuse_cached_outputs()
else:
    create_outputs()
    sct.cache_save(cachefile, cache_sig)

Notes:

  • The caching is performed by hashing any content that could influence on the output (except SCT code and its dependencies... that would be possible, but probably overkill)
  • sct_convert doesn't produce reproducible outputs, so (until it's fixed?) avoid to use caching based on its outputs

@zougloub
Copy link
Contributor Author

@benjamindeleener @jcohenadad even if this is not finished, if you can take a look at the code and provide preliminary comments, just to make sure that you agree on the approach.

@jcohenadad
Copy link
Member

@zougloub i definitely like this approach. So, in the case of sct_straightening_spinalcord's output (ie: warping field), which depends on (i) the centerline (mandatory), (ii) disc_label (optional) and (iii) the reference image (optional), we would have something like:

cache_sig = cache_signature(input_data=[data_seg.nii.gz, disc_labels.nii.gz]):

What did you have in mind for the field input_params={}? In the example above, we could use strings like "classical" and "disc-based", although that sounds like a bad idea (if string changes, broken compatibility, etc.). Maybe caching the data only would work...

@zougloub
Copy link
Contributor Author

Maybe this is a bad example for input_params because I didn't try to understand.. but I saw -x spline in an sct_straighten_spinalcord invocation, and stored this option in input_params. But even without a concrete usage example I'd have added the argument...

@zougloub zougloub changed the title (WIP) More pessimistic caching of outputs (see #1625) More pessimistic caching of outputs (see #1625) May 14, 2018
@zougloub zougloub force-pushed the cJ-caching branch 4 times, most recently from 10dc6f5 to ebd408f Compare May 14, 2018 16:23
@zougloub zougloub requested review from jcohenadad and benjamindeleener and removed request for jcohenadad May 14, 2018 16:24
@zougloub
Copy link
Contributor Author

OK, here we go.

  • I am not 100% sure about what I did in sct_make_ground_truth (the call to sct.run(['sct_concat_transfo', '-w', 'warp_straight2curve.nii.gz', '-d', ftmp_data, '-o', 'warp_straight2curve.nii.gz']) is confusing me a little).
  • For now there's no check on the outputs signature. This could be a problem today, if someone was to call sct_straighten_spinalcord directly, but so far I don't think it's likely. Going ahead and also storing outputs paths and signatures, so as to not reuse those if they changed, would take a bit more work.

Copy link
Member

@jcohenadad jcohenadad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I have my doubts about the usefulness of sct_make_ground_truth (Is sct_make_ground_truth still useful? #1732), so let's not spend too much effort on that one for validating this PR.
  • I ran sequentially sct_straighten_spinalcord and then sct_label_vertebrae and the straightening was re-done, while it should not have. I understand this is related to your second point here, but some people (very few probably) could in fact run sct_straighten_spinalcord separately. So to be clean we should address this issue at some point: Output of sct_straighen_spinalcord should be able to be reused via the preexisting cache mechanism #1733 --> ACCEPTABLE
  • To check cross-compatibility, I've run batch_processing without re-downloading the data: they were already processed by the current master's version cdbe5b8. Straightening was run for sct_label_vertebrae (expected), and also for sct_register_to_template (expected, as per previous point). No crash/issue. Output is clean. --> OK
  • Tried re-running sct_label_vertebrae two times sequentially: the 2nd time, the cached straightening was used. --> OK
  • Tried to re-run sct_straighten_spinalcord and then sct_label_vertebrae. The cached straightening was used --> OK
  • Then, ran sct_register_to_template. Straightening was re-ran, which is expected (and desired), because in this case the straightening depends on the labels. In fact, in that case, the previous straighening could have been used (because only two labels were used, hence did not need the "fancy" disc-based straightening). However, accounting for the type of straightening would require more work, and is probably not worth the development time, given that at some point, straightening will only be used for register_to_template (i.e., label_vertebrae will be based on deep learning, without straightening needed) --> ACCEPTABLE
  • Tried re-running sct_register_to_template sequentially. Cached straightening was used. --> OK
  • Tried modifying the labels (by adding one label), then re-running sct_register_to_template. Straightening was re-run (and it should) --> OK

@jcohenadad jcohenadad added this to the 3.2.0 milestone May 15, 2018
@jcohenadad jcohenadad merged commit 05fb6be into master May 15, 2018
@jcohenadad jcohenadad deleted the cJ-caching branch May 15, 2018 04:18
@jcohenadad jcohenadad added the enhancement category: improves performance/results of an existing feature label May 27, 2018
@jcohenadad jcohenadad changed the title More pessimistic caching of outputs (see #1625) More pessimistic caching of outputs May 27, 2018
jcohenadad pushed a commit that referenced this pull request Dec 18, 2019
* sct_utils: add caching helpers (#1625)

* use caching helpers in all code creating warping fields, to avoid reusing bad ones (see #1669)


Former-commit-id: 05fb6be
@joshuacwnewton joshuacwnewton mentioned this pull request Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement category: improves performance/results of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants