General Usage Guidelines
Before we implement a CuBIDS workflow, let’s define the terminology and take a look at some of the commands available in the software.
More definitions
Key Group
A Key Group is a unique set of BIDS key-value pairs, excluding identifiers such as subject and session. For example, the files:
bids-root/sub-1/ses-1/func/sub-1_ses-1_acq-mb_dir-PA_task-rest_bold.nii.gz
bids-root/sub-1/ses-2/func/sub-1_ses-2_acq-mb_dir-PA_task-rest_bold.nii.gz
bids-root/sub-2/ses-1/func/sub-2_ses-1_acq-mb_dir-PA_task-rest_bold.nii.gz
would all share the same Key Group. If these scans were all acquired as part of the same study, on the same scanner, with exactly the same acquisition parameters, this naming convention would suffice.
However, in large multi-scanner, multi-site, or longitudinal studies where acquisition parameters change over time, it’s possible that the same Key Group could contain scans that differ in important ways.
CuBIDS examines all acquisitions within a Key Group to see if there are any images that differ in a set of important acquisition parameters. Each subset of images within a Key Group that shares a consistent set of acquisition parameters is called a Parameter Group.
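To make the Key Group definition concrete, here is a minimal Python sketch that derives a Key Group label from a BIDS path by dropping the subject and session identifiers. It is an illustration only, not the CuBIDS implementation: the function name `key_group`, the `LONG_NAMES` mapping, and the exact label format (modeled on labels such as `datatype-dwi_run-1_suffix-dwi` seen in the summary file) are assumptions.

```python
from pathlib import PurePosixPath

# Entities that identify who was scanned and when; excluded from a Key Group.
IDENTIFIERS = {"sub", "ses"}

# Hypothetical mapping from short BIDS entity keys to longer names
# (only the keys needed for this example; not taken from CuBIDS itself).
LONG_NAMES = {"acq": "acquisition", "dir": "direction"}


def key_group(path: str) -> str:
    """Build an illustrative Key Group label from a BIDS file path."""
    p = PurePosixPath(path)
    datatype = p.parent.name               # e.g. "func"
    stem = p.name.split(".", 1)[0]         # drop ".nii.gz"
    *entities, suffix = stem.split("_")    # the last chunk is the suffix
    pairs = {"datatype": datatype, "suffix": suffix}
    for entity in entities:
        key, value = entity.split("-", 1)
        if key not in IDENTIFIERS:
            pairs[LONG_NAMES.get(key, key)] = value
    # Sort by key so the label does not depend on entity order in the filename.
    return "_".join(f"{k}-{v}" for k, v in sorted(pairs.items()))


files = [
    "bids-root/sub-1/ses-1/func/sub-1_ses-1_acq-mb_dir-PA_task-rest_bold.nii.gz",
    "bids-root/sub-1/ses-2/func/sub-1_ses-2_acq-mb_dir-PA_task-rest_bold.nii.gz",
    "bids-root/sub-2/ses-1/func/sub-2_ses-1_acq-mb_dir-PA_task-rest_bold.nii.gz",
]
groups = {key_group(f) for f in files}
print(groups)  # all three files collapse to a single label
```

Because subject and session are discarded, the three example files map to one label, which is what it means for them to share a Key Group.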
Parameter Group
A Parameter Group is a subset of a Key Group that contains images with the same acquisition parameters.
Even though two images may belong to the same Key Group and be valid BIDS, they may have different acquisition parameters. There is nothing fundamentally wrong with this: the bids-validator will often simply flag these differences with a Warning, but not necessarily suggest changes. That being said, there can be detrimental consequences downstream if the different parameters cause the same preprocessing pipeline to configure itself differently for images of the same Key Group.
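The idea of splitting a Key Group into Parameter Groups can be sketched as grouping scans by the tuple of their acquisition parameter values. The scans, the choice of parameters in `PARAMS`, and the function name `parameter_groups` below are all hypothetical; CuBIDS considers a larger parameter set and may group differently.

```python
from collections import defaultdict

# Hypothetical scans from a single Key Group; metadata values are illustrative.
scans = [
    {"file": "sub-1_ses-1_run-1_dwi.nii.gz", "RepetitionTime": 8.1, "EchoTime": 0.082},
    {"file": "sub-2_ses-1_run-1_dwi.nii.gz", "RepetitionTime": 8.1, "EchoTime": 0.082},
    {"file": "sub-3_ses-1_run-1_dwi.nii.gz", "RepetitionTime": 9.0, "EchoTime": 0.082},
]
PARAMS = ("RepetitionTime", "EchoTime")  # parameters that define the grouping


def parameter_groups(scans, params=PARAMS):
    """Map each distinct tuple of parameter values to the files that share it."""
    groups = defaultdict(list)
    for scan in scans:
        signature = tuple(scan[p] for p in params)
        groups[signature].append(scan["file"])
    # Number groups from 1, largest first, so group 1 is the most common set.
    ordered = sorted(groups.items(), key=lambda item: -len(item[1]))
    return {
        number: {"params": signature, "files": members}
        for number, (signature, members) in enumerate(ordered, start=1)
    }


pg = parameter_groups(scans)
for number, group in pg.items():
    print(number, group["params"], len(group["files"]))
```

Here the two scans with a repetition time of 8.1 form the largest group and the single 9.0 scan forms a second one, even though all three files share a Key Group.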
Acquisition Group
We define an Acquisition Group as a collection of sessions across participants that contain the exact same set of Key and Parameter Groups. Since Key Groups are based on the BIDS filenames (and are therefore specific to both MRI image type and acquisition), each BIDS session directory contains images that belong to a set of Parameter Groups. CuBIDS assigns each session, or set of Parameter Groups, to an Acquisition Group such that all sessions in an Acquisition Group possess an identical set of scan acquisitions and metadata parameters across all image modalities present in the dataset. We find Acquisition Groups to be a particularly useful categorization of BIDS data, as they identify homogeneous sets of sessions (not individual scans) in a large dataset. They are also useful for expediting the testing of pipelines: if a BIDS App runs successfully on a single subject from each Acquisition Group, one can be confident that it will handle all combinations of scanning parameters in the entire dataset.
The Acquisition Groups that subjects belong to are listed in _AcqGrouping.tsv, while the Key Groups and Parameter Groups that define each Acquisition Group are noted in _AcqGroupInfo.txt.
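As a rough illustration of the Acquisition Group definition, the sketch below groups sessions whose sets of (Key Group, Parameter Group) pairs are identical. The session contents and the function name `acquisition_groups` are made up for the example, and the numbering scheme (largest group first) is an assumption rather than documented CuBIDS behavior.

```python
from collections import defaultdict

# Hypothetical sessions, each described by its set of
# (Key Group, Parameter Group number) pairs.
sessions = {
    ("sub-1", "ses-1"): {("datatype-dwi_run-1_suffix-dwi", 1), ("datatype-anat_suffix-T1w", 1)},
    ("sub-2", "ses-1"): {("datatype-dwi_run-1_suffix-dwi", 1), ("datatype-anat_suffix-T1w", 1)},
    ("sub-3", "ses-1"): {("datatype-dwi_run-1_suffix-dwi", 3), ("datatype-anat_suffix-T1w", 1)},
}


def acquisition_groups(sessions):
    """Group sessions that contain exactly the same Key/Parameter Group pairs."""
    by_content = defaultdict(list)
    for session, content in sessions.items():
        # frozenset makes the set of pairs hashable and order-independent.
        by_content[frozenset(content)].append(session)
    ordered = sorted(by_content.values(), key=len, reverse=True)
    return {number: members for number, members in enumerate(ordered, start=1)}


acq = acquisition_groups(sessions)
for number, members in acq.items():
    print(number, members)
```

The third session differs only in the Parameter Group of its DWI scan, which is enough to place it in its own Acquisition Group.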
The _summary.tsv file
This file contains all the detected Key Groups and Parameter Groups. It provides an opportunity to evaluate your data and decide how to handle heterogeneity.
Below is an example _summary.tsv of the run-1 DWI Key Group in the PNC [1]. This reflects the original data that has been converted to BIDS using a heuristic. It is similar to what you will see when you first use this functionality:
| Notes | ManualCheck | MergeInto | RenameKeyGroup | KeyParamGroup | KeyGroup | ParamGroup | Counts | Dim1Size | Dim2Size | Dim3Size | EchoTime | EffectiveEchoSpacing | FlipAngle | HasFieldmap | KeyGroupCount | Modality | NSliceTimes | NumVolumes | Obliquity | ParallelReductionFactorInPlane | PartialFourier | PhaseEncodingDirection | RepetitionTime | TotalReadoutTime | UsedAsFieldmap | VoxelSizeDim1 | VoxelSizeDim2 | VoxelSizeDim3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | acquisition-VARIANTNoFmap_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__2 | datatype-dwi_run-1_suffix-dwi | 2 | 25 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | FALSE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__3 | datatype-dwi_run-1_suffix-dwi | 3 | 6 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.0 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__4 | datatype-dwi_run-1_suffix-dwi | 4 | 3 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.8 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTDim3SizeVoxelSizeDim3_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__5 | datatype-dwi_run-1_suffix-dwi | 5 | 2 | 128 | 128 | 46 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 46 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 3.0 |
| | | | acquisition-VARIANTEchoTimeEffectiveEchoSpacingRepetitionTimeTotalReadoutTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__6 | datatype-dwi_run-1_suffix-dwi | 6 | 1 | 128 | 128 | 70 | 0.102 | 0.0008 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 12.3 | 0.102 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__7 | datatype-dwi_run-1_suffix-dwi | 7 | 1 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | TRUE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
The _files.tsv file
This file contains one row per imaging file in the BIDS directory. You won’t need to edit this file directly, but it keeps track of every file’s assignment to Key and Parameter Groups.
Modifying Key and Parameter Group Assignments
Sometimes we see that there are important differences in acquisition parameters within a Key Group. If these differences impact how a pipeline will process the data, it makes sense to assign the scans in that Parameter Group to a different Key Group (i.e., assign them a different BIDS name). This can be accomplished by editing the empty columns in the _summary.tsv file produced by cubids group. Once the columns have been edited, you can apply the changes to BIDS data using:
$ cubids apply /bids/dir keyparam_edited new_keyparam_prefix
The changes in keyparam_edited_summary.tsv will be applied to the BIDS data in /bids/dir and the new Key and Parameter Groups will be saved to tsv files starting with new_keyparam_prefix.
Note: fieldmap Key Groups with variant parameters will be identified but not renamed.
The _AcqGrouping.tsv file
The _AcqGrouping.tsv file organizes the dataset by session and tags each one with its Acquisition Group number.
The _AcqGroupInfo.txt file
The _AcqGroupInfo.txt file lists all Key Groups that belong to a given Acquisition Group along with the number of sessions each group possesses.
Visualizing and summarizing metadata heterogeneity
Use cubids group to generate your dataset’s Key Groups and Parameter Groups:
$ cubids group FULL/PATH/TO/BIDS/DIR FULL/PATH/TO/v0
This will output four files, including the summary and files tsvs described above, prefixed by the second argument v0.
Applying changes
The cubids apply program provides an easy way for users to manipulate their datasets. Specifically, cubids apply can rename files according to the users’ specification in a tracked and organized way. Here, the summary.tsv functions as an interface for modifications; users can mark Parameter Groups they want to rename (or delete) in a dedicated column of the summary.tsv and pass that edited tsv as an argument to cubids apply.
Detecting Variant Groups
Additionally, cubids apply can automatically rename files in Variant Groups based on the scanning parameters that vary from those in their Key Groups’ Dominant Parameter Groups. Renaming is automatically suggested when the summary.tsv is generated from a cubids group run, with the suggested new name listed in the tsv’s RenameKeyGroup column. CuBIDS populates this column for all Variant Groups (i.e., every Parameter Group except the Dominant one).
Specifically, CuBIDS will suggest renaming all non-dominant Parameter Groups to include VARIANT* in their acquisition field, where * is the reason the Parameter Group varies from the Dominant Group. For example, when CuBIDS encounters a Parameter Group with a repetition time that varies from the one present in the Dominant Group, it will automatically suggest renaming all scans in that Variant Group to include acquisition-VARIANTRepetitionTime in their filenames.
When the user runs cubids apply, filenames will get renamed according to the auto-generated names in the RenameKeyGroup column of the summary.tsv.
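The naming pattern described above can be approximated by listing the parameters that differ from the Dominant Group and concatenating their names after VARIANT. The sketch below is a simplified assumption about that pattern, checked only against the labels in the example summary table; the function name `variant_label` is hypothetical, and special cases such as the NoFmap label are not handled.

```python
# Parameter values of the Dominant Group, taken from the example summary
# table (only the parameters that vary in that example are listed).
dominant = {
    "Dim3Size": 70,
    "EchoTime": 0.082,
    "EffectiveEchoSpacing": 0.000267,
    "RepetitionTime": 8.1,
    "TotalReadoutTime": 0.034,
    "VoxelSizeDim3": 2.0,
}


def variant_label(params, dominant):
    """Return 'VARIANT' plus the sorted names of parameters that differ."""
    differing = sorted(
        name for name in dominant
        if params.get(name, dominant[name]) != dominant[name]
    )
    return "VARIANT" + "".join(differing) if differing else ""


# Three Variant Groups from the example table.
long_tr = {**dominant, "RepetitionTime": 9.0}
thick_slices = {**dominant, "Dim3Size": 46, "VoxelSizeDim3": 3.0}
odd_scan = {**dominant, "EchoTime": 0.102, "EffectiveEchoSpacing": 0.0008,
            "RepetitionTime": 12.3, "TotalReadoutTime": 0.102}

for params in (long_tr, thick_slices, odd_scan):
    print("acquisition-" + variant_label(params, dominant))
```

A group identical to the Dominant Group yields an empty label, which corresponds to the Dominant Group keeping its original name.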
Deleting a mistake
To remove files in a Parameter Group from your BIDS data, simply set the MergeInto value to 0. We see in our data that there is a strange scan that has a RepetitionTime of 12.3 seconds and is also variant with respect to EffectiveEchoSpacing, EchoTime, and TotalReadoutTime. We elect to remove this scan from our dataset because we do not want these parameters to affect our analyses. To remove these files from your BIDS data, add a 0 to MergeInto and save the new tsv as v0_edited_summary.tsv:
| Notes | ManualCheck | MergeInto | RenameKeyGroup | KeyParamGroup | KeyGroup | ParamGroup | Counts | Dim1Size | Dim2Size | Dim3Size | EchoTime | EffectiveEchoSpacing | FlipAngle | HasFieldmap | KeyGroupCount | Modality | NSliceTimes | NumVolumes | Obliquity | ParallelReductionFactorInPlane | PartialFourier | PhaseEncodingDirection | RepetitionTime | TotalReadoutTime | UsedAsFieldmap | VoxelSizeDim1 | VoxelSizeDim2 | VoxelSizeDim3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | acquisition-VARIANTNoFmap_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__2 | datatype-dwi_run-1_suffix-dwi | 2 | 25 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | FALSE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__3 | datatype-dwi_run-1_suffix-dwi | 3 | 6 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.0 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__4 | datatype-dwi_run-1_suffix-dwi | 4 | 3 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.8 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTDim3SizeVoxelSizeDim3_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__5 | datatype-dwi_run-1_suffix-dwi | 5 | 2 | 128 | 128 | 46 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 46 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 3.0 |
| | | 0 | acquisition-VARIANTEchoTimeEffectiveEchoSpacingRepetitionTimeTotalReadoutTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__6 | datatype-dwi_run-1_suffix-dwi | 6 | 1 | 128 | 128 | 70 | 0.102 | 0.0008 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 12.3 | 0.102 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__7 | datatype-dwi_run-1_suffix-dwi | 7 | 1 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | TRUE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
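For large summaries, the MergeInto edit described above could also be scripted rather than made by hand. The following is a minimal sketch using only Python's standard `csv` module on a cut-down, hypothetical summary held in memory; the helper name `mark_for_deletion` is ours, and a real workflow would read the summary tsv from disk and write the result to v0_edited_summary.tsv.

```python
import csv
import io

# A cut-down, hypothetical summary containing only the columns needed here.
summary_tsv = (
    "MergeInto\tKeyParamGroup\tRepetitionTime\n"
    "\tdatatype-dwi_run-1_suffix-dwi__5\t8.1\n"
    "\tdatatype-dwi_run-1_suffix-dwi__6\t12.3\n"
)


def mark_for_deletion(tsv_text, key_param_group):
    """Set MergeInto to 0 for the row whose KeyParamGroup matches."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    for row in rows:
        if row["KeyParamGroup"] == key_param_group:
            row["MergeInto"] = "0"
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


edited = mark_for_deletion(summary_tsv, "datatype-dwi_run-1_suffix-dwi__6")
print(edited)
```

Matching on KeyParamGroup rather than on a parameter value keeps the edit tied to one specific Parameter Group, so other rows are written back unchanged.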
In this example, users can apply the changes to BIDS data using the following command:
$ cubids apply FULL/PATH/TO/BIDS/DIR FULL/PATH/TO/v0_edited_summary.tsv FULL/PATH/TO/v0_files.tsv FULL/PATH/TO/v1
The changes in v0_edited_summary.tsv will be applied to the BIDS data and the new Key and Parameter Groups will be saved to tsv files starting with v1.
Applying these changes we would see:
| Notes | ManualCheck | MergeInto | RenameKeyGroup | KeyParamGroup | KeyGroup | ParamGroup | Counts | Dim1Size | Dim2Size | Dim3Size | EchoTime | EffectiveEchoSpacing | FlipAngle | HasFieldmap | KeyGroupCount | Modality | NSliceTimes | NumVolumes | Obliquity | ParallelReductionFactorInPlane | PartialFourier | PhaseEncodingDirection | RepetitionTime | TotalReadoutTime | UsedAsFieldmap | VoxelSizeDim1 | VoxelSizeDim2 | VoxelSizeDim3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | datatype-dwi_run-1_suffix-dwi__1 | datatype-dwi_run-1_suffix-dwi | 1 | 1388 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1388 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | | acquisition-VARIANTNoFmap_datatype-dwi_run-1_suffix-dwi__1 | acquisition-VARIANTNoFmap_datatype-dwi_run-1_suffix-dwi | 1 | 25 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | FALSE | 25 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi__1 | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | 1 | 6 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 9 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.0 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi__2 | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | 2 | 3 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 9 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.8 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | | acquisition-VARIANTDim3SizeVoxelSizeDim3_datatype-dwi_run-1_suffix-dwi__1 | acquisition-VARIANTDim3SizeVoxelSizeDim3_datatype-dwi_run-1_suffix-dwi | 1 | 2 | 128 | 128 | 46 | 0.082 | 0.000267 | 90 | TRUE | 2 | dwi | 46 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 3.0 |
| | | | | acquisition-VARIANTEchoTimeEffectiveEchoSpacingRepetitionTimeTotalReadoutTime_datatype-dwi_run-1_suffix-dwi__1 | acquisition-VARIANTEchoTimeEffectiveEchoSpacingRepetitionTimeTotalReadoutTime_datatype-dwi_run-1_suffix-dwi | 1 | 1 | 128 | 128 | 70 | 0.102 | 0.0008 | 90 | TRUE | 1 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 12.3 | 0.102 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | | acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi__1 | acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi | 1 | 1 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1 | dwi | 70 | 35.0 | TRUE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
Customizable configuration
CuBIDS also features an optional, customizable, MRI image type-specific configuration file. This file can be passed as an argument to cubids group and cubids apply using the --config flag and allows users to customize grouping settings based on MRI image type and parameter. Each Key Group is associated with one (and only one) MRI image type, as BIDS filenames include MRI image type-specific values as their suffixes.
This easy-to-modify configuration file provides several benefits to curation. First, it allows users to add and remove metadata parameters from the set that determines groupings. This can be very useful if a user deems a specific metadata parameter irrelevant and wishes to collapse variation based on that parameter into a single Parameter Group. Second, the configuration file allows users to apply tolerances for parameters with numerical values. This prevents very small differences in scanning parameters (e.g., a TR of 3.0s vs 3.0001s) from being split into different Parameter Groups. Third, the configuration file allows users to determine which scanning parameters are listed in the acquisition field when auto-renaming is applied to Variant Groups.
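To illustrate what a numerical tolerance does to grouping, the sketch below clusters sorted values and starts a new cluster only when the gap to the previous value exceeds the tolerance. This simple gap-based rule and the function name `group_with_tolerance` are assumptions for illustration; the clustering CuBIDS applies internally may differ.

```python
def group_with_tolerance(values, tolerance):
    """Cluster sorted values; a new cluster starts when the gap exceeds tolerance."""
    clusters = []
    for value in sorted(values):
        if clusters and value - clusters[-1][-1] <= tolerance:
            clusters[-1].append(value)   # close enough to the previous value
        else:
            clusters.append([value])     # gap too large: start a new cluster
    return clusters


# Hypothetical repetition times (seconds) from scans in one Key Group.
trs = [3.0, 3.0001, 3.0, 2.5, 3.0001]

exact = group_with_tolerance(trs, tolerance=0.0)
relaxed = group_with_tolerance(trs, tolerance=0.001)
print(len(exact), "groups with exact matching")
print(len(relaxed), "groups with a 0.001 s tolerance")
```

With exact matching, the 3.0 and 3.0001 scans land in separate groups; with a small tolerance they collapse into one, while the 2.5 scans remain distinct.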
Exemplar testing
In addition to facilitating curation of large, heterogeneous BIDS datasets, CuBIDS also prepares datasets for testing BIDS Apps. This portion of the CuBIDS workflow relies on the concept of the Acquisition Group: a set of sessions that have identical scan types and metadata across all imaging modalities present in the session set. Specifically, cubids copy-exemplars copies one subject from each Acquisition Group into a separate directory, which we call an Exemplar Dataset. Since the Exemplar Dataset contains one randomly selected subject from each unique Acquisition Group in the dataset, it will be a valid BIDS dataset that spans the entire metadata parameter space of the full study.
If users run cubids copy-exemplars with the --use-datalad flag, the program will ensure that the Exemplar Dataset is tracked and saved in DataLad. If the user chooses to forgo this flag, the Exemplar Dataset will be a standard directory located on the filesystem.
Once the Exemplar Dataset has been created, a user can test it with a BIDS App (e.g., fMRIPrep or QSIPrep) to ensure that each unique set of scanning parameters will pass through the pipelines successfully. Because BIDS Apps auto-configure workflows based on the metadata encountered, they will process all scans in each Acquisition Group in the same way. By first verifying that BIDS Apps perform as intended on the small sub-sample of participants present in the Exemplar Dataset (which spans the full variation of the metadata), users can confidently move forward processing the data of the complete BIDS dataset.
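The selection step behind an Exemplar Dataset can be pictured as drawing one subject at random from each Acquisition Group. The sketch below shows only that selection logic on made-up group assignments; the function name `pick_exemplars` and the `seed` argument are hypothetical, and the actual copying of files is left to cubids copy-exemplars.

```python
import random

# Hypothetical Acquisition Group assignments: group number -> subjects.
acq_groups = {
    1: ["sub-1", "sub-2", "sub-4"],
    2: ["sub-3"],
    3: ["sub-5", "sub-6"],
}


def pick_exemplars(acq_groups, seed=None):
    """Choose one subject at random from every Acquisition Group."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return {
        group: rng.choice(sorted(subjects))
        for group, subjects in acq_groups.items()
    }


exemplars = pick_exemplars(acq_groups, seed=0)
for group, subject in exemplars.items():
    print(f"Acquisition Group {group}: {subject}")
```

However the draw falls, the result has exactly one subject per Acquisition Group, so a pipeline tested on these subjects sees every combination of Key and Parameter Groups in the dataset.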
In the next section, we’ll introduce DataLad and walk through a real example.
Footnotes