From b4b47a938d58d52841afd5d481f365bfe8d442dc Mon Sep 17 00:00:00 2001 From: Nathan Molinier Date: Fri, 12 Jan 2024 15:47:46 -0500 Subject: [PATCH 1/7] Update dataset-curation.md --- data/dataset-curation.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/data/dataset-curation.md b/data/dataset-curation.md index 24586155..ceb635e5 100644 --- a/data/dataset-curation.md +++ b/data/dataset-curation.md @@ -551,12 +551,12 @@ sci-bordeaux └── anat ├── sub-001_acq-sag_T2w_label-SC_seg.nii.gz # spinal cord (SC) binary segmentation ├── sub-001_acq-sag_T2w_label-SC_softseg.nii.gz # spinal cord (SC) soft segmentation - ├── sub-001_acq-sag_T2w_label-discs_dlabel.nii.gz # discrete discs labeling (SC) soft segmentation + ├── sub-001_acq-sag_T2w_label-discs_dlabel.nii.gz # discrete discs labeling ├── sub-001_acq-sag_T2w_label-vertebrae_dseg # vertebrae discrete segmentation (segmented stuctures have different values based on the vertebral levels) ├── sub-001_acq-sag_T2w_label-rootlets_dseg # nerve rootlets discrete segmentation (segmented stuctures have different values based on the spinal level) - ├── sub-001_acq-sag_T2w_label-compression_label.nii.gz # binary compression labeling (compression levels are located using only binary labels) - ├── sub-001_acq-sag_T2w_label-PMJ_dlabel # single point-wise label of pmj with value 50 - └── sub-001_acq-sag_T2w_label-lesion_seg # binary lesion segmentation (the related disease is here SCI base on the name of the dataset) + ├── sub-001_acq-sag_T2w_label-compression_label.nii.gz # binary compression labeling (compression levels are indicated as a single voxel with a value '1' at the point of compression) + ├── sub-001_acq-sag_T2w_label-PMJ_dlabel # Pontomedullary junction is indicated as a single voxel with a value '50' + └── sub-001_acq-sag_T2w_label-lesion_seg # lesion binary segmentation (the associated disease could be SCI, MS, etc. 
and is indicated in the file participants.tsv) ``` From be3819fb4fd06d7891c2257d391f1b3b01507907 Mon Sep 17 00:00:00 2001 From: Nathan Molinier Date: Fri, 12 Jan 2024 16:16:06 -0500 Subject: [PATCH 2/7] Explain dataset_description.json and folder use --- data/dataset-curation.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/data/dataset-curation.md b/data/dataset-curation.md index ceb635e5..60f9bacc 100644 --- a/data/dataset-curation.md +++ b/data/dataset-curation.md @@ -303,10 +303,11 @@ In this section we decided not to fully follow the BIDS derivatives convention. ``` ```{warning} -Derivative data obtained using DIFFERENT processes/workflows should be stored using DIFFERENT derivatives folders. Eg: +Derivative data obtained using different processes/workflows should ideally be stored using different derivatives folders. Eg: - `derivatives/labels/` - `derivatives/sct_5.6/` - `derivatives/fmriprep_2.3/` +However, to streamline data identification and reduce the need for extensive folder crawling, we [opted](https://github.com/neuropoly/data-management/issues/282) to gather common labels, such as binary segmentation and point-wise labeling, into the same derivative folder called labels. ``` ```{note} @@ -400,6 +401,8 @@ In addition to the subjects folders, derived datasets must include their own `da } ``` +The field `GeneratedBy` has to be used to name the different functions and processes used to generate the data. + ```{warning} The `dataset_description.json` file within the derived dataset should include `"DatasetType": "derivative"`. 
``` From f6f64f00c87d6ab88d5fc6a3e3a0ba3d3e106b52 Mon Sep 17 00:00:00 2001 From: Nathan Molinier Date: Fri, 12 Jan 2024 17:05:04 -0500 Subject: [PATCH 3/7] Update data/dataset-curation.md Co-authored-by: Julien Cohen-Adad --- data/dataset-curation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data/dataset-curation.md b/data/dataset-curation.md index 60f9bacc..905abdab 100644 --- a/data/dataset-curation.md +++ b/data/dataset-curation.md @@ -554,7 +554,7 @@ sci-bordeaux └── anat ├── sub-001_acq-sag_T2w_label-SC_seg.nii.gz # spinal cord (SC) binary segmentation ├── sub-001_acq-sag_T2w_label-SC_softseg.nii.gz # spinal cord (SC) soft segmentation - ├── sub-001_acq-sag_T2w_label-discs_dlabel.nii.gz # discrete discs labeling + ├── sub-001_acq-sag_T2w_label-discs_dlabel.nii.gz # discrete discs labeling using the following convention: https://spinalcordtoolbox.com/user_section/tutorials/vertebral-labeling/labeling-conventions.html ├── sub-001_acq-sag_T2w_label-vertebrae_dseg # vertebrae discrete segmentation (segmented stuctures have different values based on the vertebral levels) ├── sub-001_acq-sag_T2w_label-rootlets_dseg # nerve rootlets discrete segmentation (segmented stuctures have different values based on the spinal level) ├── sub-001_acq-sag_T2w_label-compression_label.nii.gz # binary compression labeling (compression levels are indicated as a single voxel with a value '1' at the point of compression) From 3042c39ddb49407b3cb33a179f7865bf8bc02474 Mon Sep 17 00:00:00 2001 From: Nathan Molinier Date: Mon, 15 Jan 2024 14:16:46 -0500 Subject: [PATCH 4/7] Add more cases JSON sidecars --- data/dataset-curation.md | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/data/dataset-curation.md b/data/dataset-curation.md index 905abdab..069426fe 100644 --- a/data/dataset-curation.md +++ b/data/dataset-curation.md @@ -424,7 +424,25 @@ JSON sidecars are companion files linked to data files. 
They share the same file Therefore, to improve the way we track our data, `.json` sidecars will have to be generated for each data present in derived datasets. Here are few examples of JSON sidecar:
-JSON sidecar (ORIGINAL SPACE) +JSON sidecar (Manually created in the ORIGINAL SPACE) + +```json +{ + "SpatialReference": "orig", + "GeneratedBy": [ + { + "Name": "Manual", + "Author": "Nathan Molinier", + "Date": "2023-07-14 13:43:10" + } + ] +} +``` + +
+ +
+JSON sidecar (Data automatically created then manually corrected in the ORIGINAL SPACE) ```json { @@ -446,7 +464,7 @@ Therefore, to improve the way we track our data, `.json` sidecars will have to b
-JSON sidecar (RESAMPLED and CROPPED) +JSON sidecar (Data RESAMPLED and CROPPED) ```json { @@ -480,7 +498,7 @@ Because the space used for the derived data is different from the original raw d
-JSON sidecar (PAM50 SPACE) +JSON sidecar (Data moved to the PAM50 SPACE) ```json { From eea86c6b36c7f34161f52fe930416bad8c87874a Mon Sep 17 00:00:00 2001 From: Nathan Molinier Date: Mon, 15 Jan 2024 17:24:01 -0500 Subject: [PATCH 5/7] Update dataset-curation.md --- data/dataset-curation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data/dataset-curation.md b/data/dataset-curation.md index 069426fe..b9ade43e 100644 --- a/data/dataset-curation.md +++ b/data/dataset-curation.md @@ -307,7 +307,7 @@ Derivative data obtained using different processes/workflows should ideally be s - `derivatives/labels/` - `derivatives/sct_5.6/` - `derivatives/fmriprep_2.3/` -However, to streamline data identification and reduce the need for extensive folder crawling, we [opted](https://github.com/neuropoly/data-management/issues/282) to gather common labels, such as binary segmentation and point-wise labeling, into the same derivative folder called labels. +However, to streamline data identification and reduce the need for extensive folder crawling, we [opted](https://github.com/neuropoly/data-management/issues/282) to gather common labels, such as binary segmentation and point-wise labeling, into the same derivative folder called labels. For a particular project, having a separate derived folder can still be envisioned. ``` ```{note} From 7c64dff4d6267adb467f947bed97ac7b5f947b3e Mon Sep 17 00:00:00 2001 From: jcohenadad Date: Tue, 16 Jan 2024 15:39:38 -0500 Subject: [PATCH 6/7] Cleanup of raw section --- data/dataset-curation.md | 130 +++++++++++++++------------------------ 1 file changed, 49 insertions(+), 81 deletions(-) diff --git a/data/dataset-curation.md b/data/dataset-curation.md index b9ade43e..386b30eb 100644 --- a/data/dataset-curation.md +++ b/data/dataset-curation.md @@ -2,48 +2,60 @@ ## Converting data to BIDS -All git-annex datasets should be BIDS-compliant. 
For more information about the BIDS standard, please visit [http://bids.neuroimaging.io](http://bids.neuroimaging.io). +All git-annex datasets should be BIDS-compliant. For more information about the BIDS standard, please visit [http://bids.neuroimaging.io](http://bids.neuroimaging.io). For some examples of BIDS datasets, visit [this page](https://github.com/bids-standard/bids-examples). A quick way to verify compliance with the convention is this [online BIDS validator](https://bids-standard.github.io/bids-validator/). -When you receive data from an external collaborator, you can save them under a temporary location: `duke/temp`. +When you receive raw data from an external collaborator, save them under a temporary location on one of NeuroPoly's servers, e.g.: `duke/temp`. Then, inspect the data and convert them to BIDS. It is recommended to write a script that does the conversion. The script should then be saved under the `code` folder of the final dataset. Some previous scripts can be found on [GitHub](https://github.com/neuropoly/data-management/tree/master/scripts) or under the `code` folder of already existing datasets. -Once the data are converted to BIDS and [uploaded](git-datasets.md#upload) to git-annex repository, delete the temporary folder to save space. +```{important} +Once the data are converted to BIDS and [uploaded](git-datasets.md#upload) to the git-annex repository, please delete the temporary folder. +``` ## Building the `raw` dataset -> [Brackets] are characterizing optional informations +The `raw` dataset corresponds to the core dataset that contains all the different acquisitions generated for one or several subjects. **NO** postprocessing steps should be applied to these acquisitions. -The `raw` dataset corresponds to the core dataset that contains all the different acquisition generated for one or several subjects. **NO** postprocessing steps should be applied to these acquisitions. 
+Subjects folders in the `raw` dataset are structured as follows for MRI, with folders corresponding to subjects, [sessions] and MRI modalities: -### Folders structure and filenames +### Raw structure -Subjects folders in the `raw` dataset are structured as follows for MRI, with folders corresponding to subjects, [sessions] and MRI modalities: +Useful BIDS specifications are: +- [File naming conventions](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#filesystem-structure), +- [Modality-agnostic conventions](https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#code), +- [MRI-specific conventions](https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html), +- [Microscopy-specific conventions](https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/10-microscopy.html) -#### Raw structure +The example below applies to MRI data: ``` -sub-