Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate PAM50 to pip #1

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Migrate PAM50 to pip #1

wants to merge 2 commits into from

Conversation

kousu
Copy link

@kousu kousu commented Apr 6, 2021

Copy link
Member

@jcohenadad jcohenadad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure if i understand everything correctly, but i notice file name changed for e.g.:

spinalcordtoolbox/data/PAM50/atlas/PAM50_atlas_00.nii.gz 

so what happens if, e.g. someone has SCT installed under, e.g.: sct-v3.4.4 instead of spinalcordtoolbox. Will it still work? I suppose we are talking about the hierarchy of the python package, and there, it will always be spinalcordtoolbox, right?

@kousu
Copy link
Author

kousu commented Apr 6, 2021

i am not sure if i understand everything correctly, but i notice file name changed for e.g.:

spinalcordtoolbox/data/PAM50/atlas/PAM50_atlas_00.nii.gz 

so what happens if, e.g. someone has SCT installed under, e.g.: sct-v3.4.4 instead of spinalcordtoolbox. Will it still work? I suppose we are talking about the hierarchy of the python package, and there, it will always be spinalcordtoolbox, right?

Instead of just saying `sct/data/PAM50/atlas/PAM50_atlas_00.nii.gz' we'll use

import spinalcordtoolbox.data.PAM50

...

importlib.resources.path(spinalcordtoolbox.data.PAM50, 'atlas/PAM50_atlas_00.nii.gz').__enter__() # maybe a helper to shorten this...

This also works:

import spinalcordtoolbox.data.PAM50.atlas

...

importlib.resources.path(spinalcordtoolbox.data.PAM50.atlas, 'PAM50_atlas_00.nii.gz').__enter__() # maybe a helper to shorten this...

The advantage of this is:

  1. You get an ImportError if the data isn't installed, which should be a lot clearer where the problem is
  2. pip keeps a cache in ~/.cache/pip of previous versions of packages, so reinstalling doesn't require redownloading. And this is safe because it keys them by hash and version string, so it can't be fooled. If the new version of SCT keeps using the old version of the data, no one needs to redownload it.
    2. and we can exploit this to use CI caching to save a lot of bandwidth on testing
  3. The data is contained under site-packages/ (so, currently, $SCT_DIR/python/envs/venv_sct/lib/python3.6/site-packages/spinalcordtoolbox/data) so it can't get muddled up with a different install.
  4. In principle we can use RECORD to integrity check the datasets -- or even the entire spinalcordtoolbox -- at any time, even after it's installed.
    • This sounds like a good thing to build into sct_check_dependencies

The disadvantage is:

  1. You can't just go look at the data like you can when it's in your working directory, so pip isn't great for courseware like course_beijing or sct-example-data. I don't know what we should do about those. My plan for the next, like, 72 hours is to just leave them alone and keep them in sct_download_data. Maybe that stuff should just be curl | tar? Maybe we can talk about that today?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants