Migrate PAM50 to pip #1

kousu · 2021-04-06T15:37:04Z

Part of spinalcordtoolbox/spinalcordtoolbox#2669

jcohenadad

I am not sure I understand everything correctly, but I notice that file names changed, e.g.:

spinalcordtoolbox/data/PAM50/atlas/PAM50_atlas_00.nii.gz

So what happens if someone has SCT installed under, e.g., sct-v3.4.4 instead of spinalcordtoolbox? Will it still work? I suppose we are talking about the hierarchy of the Python package, and there it will always be spinalcordtoolbox, right?

kousu · 2021-04-06T16:21:51Z

> I am not sure I understand everything correctly, but I notice that file names changed, e.g.:
> spinalcordtoolbox/data/PAM50/atlas/PAM50_atlas_00.nii.gz
> So what happens if someone has SCT installed under, e.g., sct-v3.4.4 instead of spinalcordtoolbox? Will it still work? I suppose we are talking about the hierarchy of the Python package, and there it will always be spinalcordtoolbox, right?

Instead of just saying `sct/data/PAM50/atlas/PAM50_atlas_00.nii.gz` we'll use

import importlib.resources
import spinalcordtoolbox.data.PAM50

...

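# caveat: importlib.resources.path() is documented as not accepting path separators in the resource name, so this form may raise ValueError; the subpackage form below avoids that
importlib.resources.path(spinalcordtoolbox.data.PAM50, 'atlas/PAM50_atlas_00.nii.gz').__enter__() # maybe a helper to shorten this...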

This also works:

import spinalcordtoolbox.data.PAM50.atlas

...

importlib.resources.path(spinalcordtoolbox.data.PAM50.atlas, 'PAM50_atlas_00.nii.gz').__enter__() # maybe a helper to shorten this...
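
The "helper to shorten this" could look roughly like the sketch below. The name `data_path` is hypothetical and not part of SCT, and the sketch assumes Python 3.9+ (for `importlib.resources.files` and `as_file`). It wraps the lookup in a context manager instead of calling `.__enter__()` by hand, and re-raises a missing data package as an `ImportError` with a clearer message.

```python
import importlib
import importlib.resources
from contextlib import contextmanager


@contextmanager
def data_path(package, *parts):
    """Yield a filesystem path to a resource inside an installed data package.

    `package` is a dotted package name; `parts` are the path components
    below it. Hypothetical helper, not an existing SCT function.
    """
    try:
        importlib.import_module(package)
    except ImportError as err:
        # Make it obvious that the data package is what is missing.
        raise ImportError(f"data package {package!r} is not installed") from err

    resource = importlib.resources.files(package)
    for part in parts:
        resource = resource / part

    # as_file() extracts to a temporary file if the package is not on disk
    # (e.g. inside a zip), and cleans it up when the block exits.
    with importlib.resources.as_file(resource) as path:
        yield path


# Intended usage, assuming the data package from this PR is installed:
#
#   with data_path("spinalcordtoolbox.data.PAM50", "atlas", "PAM50_atlas_00.nii.gz") as p:
#       print(p)
```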

The advantages of this are:

- You get an ImportError if the data isn't installed, which should make it a lot clearer where the problem is.
- pip keeps a cache in ~/.cache/pip of previous versions of packages, so reinstalling doesn't require redownloading. This is safe because it keys them by hash and version string, so it can't be fooled. If the new version of SCT keeps using the old version of the data, no one needs to redownload it.
  - We can also exploit this to use CI caching to save a lot of bandwidth on testing.
- The data is contained under site-packages/ (so, currently, $SCT_DIR/python/envs/venv_sct/lib/python3.6/site-packages/spinalcordtoolbox/data), so it can't get muddled up with a different install.
- In principle we can use RECORD to integrity check the datasets, or even the entire spinalcordtoolbox, at any time, even after it's installed.
  - This sounds like a good thing to build into sct_check_dependencies.
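
A minimal sketch of the RECORD integrity check, assuming Python 3.8+ so that `importlib.metadata` is available. The function name `find_corrupted_files` is hypothetical; it only relies on the hashes pip writes into a distribution's RECORD file and is not existing SCT code.

```python
import base64
import hashlib
from importlib import metadata
from pathlib import Path


def find_corrupted_files(dist):
    """Return RECORD entries whose on-disk content no longer matches.

    `dist` is an importlib.metadata.Distribution. Entries without a
    recorded hash (such as RECORD itself) are skipped.
    """
    files = dist.files
    if files is None:
        raise RuntimeError("distribution has no RECORD-style file list")

    corrupted = []
    for entry in files:
        if entry.hash is None:
            continue
        path = Path(entry.locate())
        if not path.exists():
            corrupted.append(str(entry))
            continue
        digest = hashlib.new(entry.hash.mode, path.read_bytes()).digest()
        # RECORD stores urlsafe base64 digests without '=' padding.
        actual = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        if actual != entry.hash.value:
            corrupted.append(str(entry))
    return corrupted


# Intended usage, assuming the distribution is installed under this name:
#
#   bad = find_corrupted_files(metadata.distribution("spinalcordtoolbox"))
```

This reads each file fully into memory, which is fine for a sketch but may be worth streaming for large NIfTI volumes.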

The disadvantage is:

- You can't just go look at the data like you can when it's in your working directory, so pip isn't great for courseware like course_beijing or sct-example-data. I don't know what we should do about those. My plan for the next, like, 72 hours is to just leave them alone and keep them in sct_download_data. Maybe that stuff should just be curl | tar? Maybe we can talk about that today?
  - Over in my main thread (Packaging Dependencies spinalcordtoolbox#2669 (comment)) I recalled Stata's `. net get $pkg` feature, which drops optional data into your working dir, as opposed to into the installation dir; it was designed with tutorials in mind. I've never seen that in any other language.
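
For illustration only, a `net get`-style step could be approximated on top of a pip-installed data package by copying its resources into the working directory. The helper name `export_data` is hypothetical, the sketch assumes Python 3.9+, and it is one possible approach rather than anything decided in this thread.

```python
import importlib.resources
from pathlib import Path


def export_data(package, dest="."):
    """Copy the data files of an installed package into `dest`.

    Returns the list of files written. Python bookkeeping files
    (__init__.py, __pycache__) are left out.
    """
    dest = Path(dest)
    copied = []

    def _copy(node, target):
        if node.is_dir():
            if node.name == "__pycache__":
                return
            target.mkdir(parents=True, exist_ok=True)
            for child in node.iterdir():
                _copy(child, target / child.name)
        elif node.name != "__init__.py":
            target.write_bytes(node.read_bytes())
            copied.append(target)

    root = importlib.resources.files(package)
    _copy(root, dest / root.name)
    return copied


# Intended usage, with a hypothetical courseware data package name:
#
#   export_data("some_course_data_package", dest=".")
```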

Template for packaging spinalcordtoolbox datasets in pip.

b66e6b3

kousu requested review from zougloub, jcohenadad and alexfoias April 6, 2021 15:37

jcohenadad reviewed Apr 6, 2021

Migrate to pip package

a24d1c8

kousu force-pushed the ng/pip branch from f6dab6f to a24d1c8 Compare April 6, 2021 18:11