Deal with subjects scanned on different scanners #102

Open
kousu opened this issue Nov 15, 2021 · 6 comments

@kousu
Contributor

kousu commented Nov 15, 2021

There are 15 sub-tokyo subjects; however, there are actually only 5 real subjects involved, each scanned on three different MRI scanners. For example:

u108545@joplin:~/data-multi-subject$ ls sub-tokyo*01
sub-tokyo750w01:
anat  dwi

sub-tokyoIngenia01:
anat  dwi

sub-tokyoSkyra01:
anat  dwi
u108545@joplin:~/data-multi-subject$ egrep 'sub-tokyo.*?01[[:space:]]*(F|M)' participants.tsv 
sub-tokyo750w01	M	25	-	-	2019-10-01	tokyo750w	the University of Tokyo	GE	MR750w	-	24_LX_MR_Software_release:DV24.0_R01_1344.a	"K. Kamiya, Y. Suzuki"
sub-tokyoIngenia01	M	25	-	-	2019-10-01	tokyo	Ingenia the University of Tokyo	Philips	Ingenia	-	5.3.1_5.3.1.1"K. Kamiya, Y. Suzuki"
sub-tokyoSkyra01	M	25	-	-	2019-10-01	tokyoSkyra	the University of Tokyo	Siemens	Skyra	HeadNeck_20	syngo_MR_E11	"K. Kamiya, Y. Suzuki"

It would be safer, and probably more BIDS-compliant, to represent the "different scanner" field using an acq- entity (or possibly ses-) and to put these scans all under a single folder (sub-tokyo01). Then we would only need to record their tabular data once in participants.tsv, and repairs like #96 would be less fraught to perform.

Discovered in #96 (comment)

@jcohenadad
Member

This is a valid concern. However, merging these participants would break the analysis code, so there are pros and cons here.

@kousu
Contributor Author

kousu commented Nov 15, 2021

I'll fix the analysis code.

@kousu
Contributor Author

kousu commented Nov 15, 2021

Turns out, the hardware field already has a place to go in BIDS: it goes in the .json, not in the filename, and we have this data in the right place already:

u108545@joplin:~/data-multi-subject$ grep ManufacturersModelName sub-tokyo*01/anat/*.json 
sub-tokyo750w01/anat/sub-tokyo750w01_acq-MToff_MTS.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_acq-MTon_MTS.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_acq-T1w_MTS.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T1w.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T2star.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T2w.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-MToff_MTS.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-MTon_MTS.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-T1w_MTS.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T1w.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T2star.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T2w.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-MToff_MTS.json:	"ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-MTon_MTS.json:	"ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-T1w_MTS.json:	"ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T1w.json:	"ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T2star.json:	"ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T2w.json:	"ManufacturersModelName": "Skyra",
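
As a rough illustration of reading the scanner from the sidecars rather than from the folder name, here is a minimal Python sketch. The function name `scanner_models` is made up for this example and is not part of the existing analysis code; the only thing taken from the thread is the `ManufacturersModelName` key.

```python
import json
from pathlib import Path


def scanner_models(subject_dir):
    """Collect the ManufacturersModelName values found in a folder's .json sidecars.

    `scanner_models` is a hypothetical helper written for this sketch.
    """
    models = set()
    for sidecar in sorted(Path(subject_dir).rglob("*.json")):
        with open(sidecar, encoding="utf-8") as f:
            metadata = json.load(f)
        # Sidecars are expected to be JSON objects; skip anything else.
        if isinstance(metadata, dict) and "ManufacturersModelName" in metadata:
            models.add(metadata["ManufacturersModelName"])
    return models
```

Returning a set makes it easy to spot a folder that mixes scanners: any result with more than one element would deserve a closer look.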

And BIDS recommends encoding multiple visits/scans by nesting them a level deeper under ses-<label>/.

I propose

  1. either not encoding the scanner in the filename at all but adding a session field, or encoding it in the 'session' field: sub-tokyo{scanner}{id} -> sub-tokyo{id}_ses-{scanner}

    So, either:

    u108545@joplin:~/data-multi-subject$ mkdir -p sub-tokyo05 && git mv sub-tokyoIngenia05/ sub-tokyo05/ses-01
    

    or

    u108545@joplin:~/data-multi-subject$ mkdir -p sub-tokyo05 && git mv sub-tokyoIngenia05/ sub-tokyo05/ses-Ingenia
    

    but repeated for every subject. For most subjects with only one session, BIDS still wants us to nest a ses-01/ folder:

    The extra session layer (at least one /ses- subfolder) SHOULD be added for all subjects

  2. Merging the tokyo subjects:

    either

    git mv sub-tokyoSkyra{id} sub-tokyo{id}/ses-02
    git mv sub-tokyo750w{id} sub-tokyo{id}/ses-03
    

    or

    git mv sub-tokyoSkyra{id} sub-tokyo{id}/ses-Skyra
    git mv sub-tokyo750w{id} sub-tokyo{id}/ses-750w
    
  3. Move the date, manufacturer, manufacturers_model_name from participants.tsv to per-subject sub-tokyo{id}/sub-tokyo{id}_sessions.tsv files

  4. Changing the analysis code to parse out the information when it needs it from either the .json files or the _sessions.tsv files, not the filenames.
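
To make the scanner-name variant of the rename concrete, the mapping sub-tokyo{scanner}{id} -> sub-tokyo{id}/ses-{scanner} could be sketched in Python as below. This only computes target paths and moves nothing. The helper names `merged_path` and `plan_moves` are invented for this sketch, and the scanner labels are the three that appear in the folder names quoted above.

```python
import re

# Scanner labels as they appear in the existing folder names in this thread.
SCANNERS = ("750w", "Ingenia", "Skyra")
PATTERN = re.compile(
    r"^sub-tokyo(?P<scanner>" + "|".join(SCANNERS) + r")(?P<id>\d+)$"
)


def merged_path(old_name):
    """Map sub-tokyo{scanner}{id} to sub-tokyo{id}/ses-{scanner}.

    Returns None for folder names that do not follow the tokyo pattern.
    Hypothetical helper, not part of the existing tooling.
    """
    match = PATTERN.match(old_name)
    if match is None:
        return None
    return f"sub-tokyo{match['id']}/ses-{match['scanner']}"


def plan_moves(folder_names):
    """Return (old, new) pairs for every folder that would be moved."""
    pairs = []
    for name in folder_names:
        target = merged_path(name)
        if target is not None:
            pairs.append((name, target))
    return pairs
```

Each (old, new) pair corresponds to one `git mv old new` as in the commands above; printing the plan first gives a chance to review it before touching the repository.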

@jcohenadad
Member

Thank you @kousu, this seems like a very reasonable plan. In terms of index vs. scanner name in the filename, I do have a slight preference for encoding the scanner name in the file name, just because it is more human-friendly.

@kousu
Contributor Author

kousu commented Nov 16, 2021

> Thank you @kousu, this seems like a very reasonable plan. In terms of index vs. scanner name in the filename, I do have a slight preference for encoding the scanner name in the file name, just because it is more human-friendly.

Great. I can do that!

@jcohenadad changed the title from "Merge subjects scanned on different scanners" to "Deal with subjects scanned on different scanners" on May 12, 2024

@jcohenadad
Member

Reviving this thread, given a recent comment in #166 and the demographics-based project from @renelabounek. We should find a reasonable strategy to deal with the same subjects being scanned at multiple sites. The solution proposed in #102 (comment) is problematic, in that the logic of the analysis code and the results would be drastically different. I'm wondering if simply adding a column in the https:/spine-generic/data-multi-subject/blob/113b258695074b77d40ba987474eddc14f9d9698/participants.tsv with an arbitrary ID for each subject could properly address this? Then, for projects where the demographics of the subjects are relevant (e.g., @renelabounek's project), the specific analysis code could use that information (e.g., by selecting non-duplicate subjects based on their IDs as opposed to their participant_id).
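
For what it's worth, the "select non-duplicate subjects by ID" idea could look roughly like the Python sketch below. The column name `person_id` is purely a placeholder (no name has been proposed in this thread), and `unique_individuals` is a hypothetical helper, not existing analysis code.

```python
import csv
import io


def unique_individuals(tsv_text, id_column="person_id"):
    """Keep the first participants.tsv row seen for each value of id_column.

    `person_id` is an assumed column name used only for illustration.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen = set()
    kept = []
    for row in reader:
        key = row[id_column]
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Toy table with made-up IDs, not the real participants.tsv.
example = (
    "participant_id\tperson_id\n"
    "sub-tokyo750w01\ttokyo01\n"
    "sub-tokyoIngenia01\ttokyo01\n"
    "sub-tokyoSkyra01\ttokyo01\n"
    "sub-tokyo750w02\ttokyo02\n"
)
selected = [row["participant_id"] for row in unique_individuals(example)]
```

Keeping the first row per ID is an arbitrary choice here; a demographics project might instead prefer a specific scanner, in which case the selection rule would need to be stated explicitly.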
