General Usage Guidelines

Before we implement a CuBIDS workflow, let’s define the terminology and take a look at some of the commands available in the software.

More definitions

Key Group

A Key Group is a unique set of BIDS key-value pairs, excluding identifiers such as subject and session. For example, the files:

bids-root/sub-1/ses-1/func/sub-1_ses-1_acq-mb_dir-PA_task-rest_bold.nii.gz
bids-root/sub-1/ses-2/func/sub-1_ses-2_acq-mb_dir-PA_task-rest_bold.nii.gz
bids-root/sub-2/ses-1/func/sub-2_ses-1_acq-mb_dir-PA_task-rest_bold.nii.gz

would all share the same Key Group. If these scans were all acquired as part of the same study, on the same scanner, with exactly the same acquisition parameters, this naming convention would suffice.
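To make the idea concrete, here is a minimal Python sketch that derives a Key Group-like label from a BIDS path by dropping the sub and ses identifiers and keeping everything else. The helper `key_group` is hypothetical and is not part of CuBIDS. It keeps the short BIDS entity keys (acq, dir, task) and adds the datatype and suffix, whereas CuBIDS spells entities out in full (e.g., acquisition-, datatype-), as the summary tables in this section show.

```python
from pathlib import PurePosixPath

# Entities that say who/when a scan was acquired; a Key Group ignores them.
IDENTIFIER_ENTITIES = {"sub", "ses"}


def key_group(path: str) -> str:
    """Hypothetical helper: build a Key Group-like label from a BIDS file path."""
    p = PurePosixPath(path)
    datatype = p.parent.name            # e.g. "func", "dwi", "anat"
    stem = p.name.split(".", 1)[0]      # drop ".nii.gz", ".json", ...
    *entity_parts, suffix = stem.split("_")
    entities = dict(part.split("-", 1) for part in entity_parts)
    kept = {k: v for k, v in entities.items() if k not in IDENTIFIER_ENTITIES}
    kept["datatype"] = datatype
    kept["suffix"] = suffix
    # Sort so the label does not depend on entity order in the filename.
    return "_".join(f"{k}-{v}" for k, v in sorted(kept.items()))


files = [
    "bids-root/sub-1/ses-1/func/sub-1_ses-1_acq-mb_dir-PA_task-rest_bold.nii.gz",
    "bids-root/sub-1/ses-2/func/sub-1_ses-2_acq-mb_dir-PA_task-rest_bold.nii.gz",
    "bids-root/sub-2/ses-1/func/sub-2_ses-1_acq-mb_dir-PA_task-rest_bold.nii.gz",
]
print({key_group(f) for f in files})
# {'acq-mb_datatype-func_dir-PA_suffix-bold_task-rest'}
```

All three paths collapse to one label, which is what "sharing a Key Group" means. A scan with any other non-identifier entity value, such as acq-sb, would get a different label.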

However, in large multi-scanner, multi-site, or longitudinal studies where acquisition parameters change over time, it’s possible that the same Key Group could contain scans that differ in important ways.

CuBIDS examines all acquisitions within a Key Group to see whether any images differ in a set of important acquisition parameters. Each subset of images within a Key Group that shares a consistent set of acquisition parameters is called a Parameter Group.

Parameter Group

A Parameter Group is a subset of a Key Group that contains images with the same acquisition parameters.

Even though two images may belong to the same Key Group and be valid BIDS, they may still have different acquisition parameters. There is nothing fundamentally wrong with this: the bids-validator will often simply flag these differences with a Warning, without necessarily suggesting changes. That being said, there can be detrimental consequences downstream if the different parameters cause the same preprocessing pipeline to configure itself differently for images of the same Key Group.
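The splitting of a Key Group into Parameter Groups can be sketched as "group scans by their Key Group plus a tuple of acquisition parameters." This is a minimal illustration, not CuBIDS's implementation. The three-parameter list `GROUPING_PARAMS` and the helper `parameter_groups` are hypothetical. Numbering the groups by descending file count, so that 1 is the most common set, matches the ParamGroup ordering visible in the example tables but is an assumption here.

```python
from collections import Counter

# Hypothetical, abbreviated parameter set; CuBIDS compares more fields and
# lets users configure which ones matter.
GROUPING_PARAMS = ("RepetitionTime", "EchoTime", "Dim3Size")


def parameter_groups(scans):
    """Number the distinct parameter sets found within each Key Group.

    `scans` is a list of dicts with a "key_group" entry plus the parameters
    named in GROUPING_PARAMS. Returns {(key_group, params): (number, count)},
    where number 1 is the most common parameter set in that Key Group.
    """
    counts = Counter(
        (s["key_group"], tuple(s[p] for p in GROUPING_PARAMS)) for s in scans
    )
    groups = {}
    next_number = Counter()
    # Visit Key Groups in name order and, within each, larger groups first.
    for (kg, params), n in sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1])):
        next_number[kg] += 1
        groups[(kg, params)] = (next_number[kg], n)
    return groups


kg = "datatype-dwi_run-1_suffix-dwi"
scans = (
    [{"key_group": kg, "RepetitionTime": 8.1, "EchoTime": 0.082, "Dim3Size": 70}] * 3
    + [{"key_group": kg, "RepetitionTime": 9.0, "EchoTime": 0.082, "Dim3Size": 70}]
)
for (key, params), (number, n) in parameter_groups(scans).items():
    print(f"{key}__{number}", params, n)
# datatype-dwi_run-1_suffix-dwi__1 (8.1, 0.082, 70) 3
# datatype-dwi_run-1_suffix-dwi__2 (9.0, 0.082, 70) 1
```

The printed `key__number` labels mirror the KeyParamGroup naming used in the _summary.tsv examples.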

Acquisition Group

We define an Acquisition Group as a collection of sessions across participants that contain the exact same set of Key and Parameter Groups. Key Groups are based on the BIDS filenames, so they are specific to both MRI image type and acquisition. As a result, each BIDS session directory contains images that belong to a set of Parameter Groups. CuBIDS assigns each session, or set of Parameter Groups, to an Acquisition Group such that all sessions in an Acquisition Group possess an identical set of scan acquisitions and metadata parameters across all image modalities present in the dataset. We find Acquisition Groups to be a particularly useful categorization of BIDS data, as they identify homogeneous sets of sessions (not individual scans) in a large dataset. They are also useful for expediting the testing of pipelines: if a BIDS App runs successfully on a single subject from each Acquisition Group, one can be confident that it will handle all combinations of scanning parameters in the entire dataset.
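A minimal sketch of the grouping rule: treat each session's set of (Key Group, Parameter Group) pairs as a signature, and put sessions with equal signatures into the same Acquisition Group. The helper `acquisition_groups`, the example Key Group names, and the choice to number the largest group 1 are assumptions for illustration, not CuBIDS's code.

```python
from collections import defaultdict


def acquisition_groups(session_params):
    """Group sessions that contain exactly the same (KeyGroup, ParamGroup) pairs.

    `session_params` maps (subject, session) -> iterable of (key_group, param_group).
    Returns {(subject, session): acquisition_group_number}, numbering the largest
    group 1 (an ordering assumption made for this sketch).
    """
    by_signature = defaultdict(list)
    for sess, pairs in session_params.items():
        # frozenset: order does not matter, and duplicates collapse.
        by_signature[frozenset(pairs)].append(sess)
    ordered = sorted(by_signature.values(), key=len, reverse=True)
    return {
        sess: number
        for number, sessions in enumerate(ordered, start=1)
        for sess in sessions
    }


sessions = {
    ("sub-1", "ses-1"): [("datatype-dwi_run-1_suffix-dwi", 1), ("datatype-anat_suffix-T1w", 1)],
    ("sub-2", "ses-1"): [("datatype-anat_suffix-T1w", 1), ("datatype-dwi_run-1_suffix-dwi", 1)],
    ("sub-3", "ses-1"): [("datatype-dwi_run-1_suffix-dwi", 3), ("datatype-anat_suffix-T1w", 1)],
}
print(acquisition_groups(sessions))
# {('sub-1', 'ses-1'): 1, ('sub-2', 'ses-1'): 1, ('sub-3', 'ses-1'): 2}
```

Using a frozenset makes the comparison independent of file order. sub-3 lands in its own group because its DWI scan belongs to a different Parameter Group.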

The Acquisition Groups that sessions belong to are listed in _AcqGrouping.tsv, while the Key Groups and Parameter Groups that define each Acquisition Group are noted in _AcqGroupInfo.txt.

The _summary.tsv File

This file contains all the detected Key Groups and Parameter Groups. It provides an opportunity to evaluate your data and decide how to handle heterogeneity.

Below is an example _summary.tsv of the run-1 DWI Key Group in the PNC [1]. This reflects the original data that has been converted to BIDS using a heuristic. It is similar to what you will see when you first use this functionality:

Pre Apply Groupings

| Notes | ManualCheck | MergeInto | RenameKeyGroup | KeyParamGroup | KeyGroup | ParamGroup | Counts | Dim1Size | Dim2Size | Dim3Size | EchoTime | EffectiveEchoSpacing | FlipAngle | HasFieldmap | KeyGroupCount | Modality | NSliceTimes | NumVolumes | Obliquity | ParallelReductionFactorInPlane | PartialFourier | PhaseEncodingDirection | RepetitionTime | TotalReadoutTime | UsedAsFieldmap | VoxelSizeDim1 | VoxelSizeDim2 | VoxelSizeDim3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | acquisition-VARIANTNoFmap_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__2 | datatype-dwi_run-1_suffix-dwi | 2 | 25 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | FALSE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__3 | datatype-dwi_run-1_suffix-dwi | 3 | 6 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.0 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__4 | datatype-dwi_run-1_suffix-dwi | 4 | 3 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.8 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTDim3SizeVoxelSizeDim3_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__5 | datatype-dwi_run-1_suffix-dwi | 5 | 2 | 128 | 128 | 46 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 46 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 3.0 |
| | | | acquisition-VARIANTEchoTimeEffectiveEchoSpacingRepetitionTimeTotalReadoutTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__6 | datatype-dwi_run-1_suffix-dwi | 6 | 1 | 128 | 128 | 70 | 0.102 | 0.0008 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 12.3 | 0.102 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__7 | datatype-dwi_run-1_suffix-dwi | 7 | 1 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | TRUE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
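Because _summary.tsv is an ordinary tab-separated file, it can also be inspected with Python's standard csv module. The following is a minimal sketch under two assumptions: the column names are those shown in the table above, and the hypothetical helper `rename_suggestions` treats a non-empty RenameKeyGroup cell as a suggested rename. The in-memory excerpt is a hypothetical three-column sample, not actual CuBIDS output.

```python
import csv
import io


def rename_suggestions(stream):
    """List (KeyParamGroup, Counts, RenameKeyGroup) for rows with a suggested rename.

    `stream` is any text stream over a _summary.tsv,
    e.g. open("v0_summary.tsv", newline="").
    """
    reader = csv.DictReader(stream, delimiter="\t")
    return [
        (row["KeyParamGroup"], int(row["Counts"]), row["RenameKeyGroup"])
        for row in reader
        if row["RenameKeyGroup"]  # left empty for the Dominant Group
    ]


# Hypothetical excerpt using three of the columns shown above.
example = io.StringIO(
    "RenameKeyGroup\tKeyParamGroup\tCounts\n"
    "\tdatatype-dwi_run-1_suffix-dwi__1\t1388\n"
    "acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi\tdatatype-dwi_run-1_suffix-dwi__7\t1\n"
)
for kpg, n, new_name in rename_suggestions(example):
    print(f"{kpg} ({n} files) -> {new_name}")
# datatype-dwi_run-1_suffix-dwi__7 (1 files) -> acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi
```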

The _files.tsv file

This file contains one row per imaging file in the BIDS directory. You won’t need to edit this file directly, but it keeps track of every file’s assignment to Key and Parameter Groups.

Modifying Key and Parameter Group Assignments

Sometimes there are important differences in acquisition parameters within a Key Group. If these differences affect how a pipeline will process the data, it makes sense to assign the scans in that Parameter Group to a different Key Group (i.e., give them a different BIDS name). You can do this by editing the empty columns in the _summary.tsv file produced by cubids group.

Once the columns have been edited, you can apply the changes to the BIDS data using:

$ cubids apply /bids/dir keyparam_edited new_keyparam_prefix

The changes in keyparam_edited_summary.tsv will be applied to the BIDS data in /bids/dir, and the new Key and Parameter Groups will be saved to tsv files starting with new_keyparam_prefix. Note: fieldmap Key Groups with variant parameters will be identified but not renamed.
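Editing the summary in a spreadsheet works fine. If you prefer to script the edits, here is a minimal sketch using Python's csv module. The helpers `set_cell` and `edit_summary` are hypothetical, and the sketch assumes the KeyParamGroup, MergeInto, and RenameKeyGroup column names shown in the tables in this section. For a real file you would read, for example, v0_summary.tsv with `open(..., newline="")` and write the edited copy under a new name.

```python
import csv
import io


def set_cell(rows, key_param_group, column, value):
    """Set `column` to `value` on the one row whose KeyParamGroup matches."""
    matches = [r for r in rows if r["KeyParamGroup"] == key_param_group]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one row for {key_param_group!r}, found {len(matches)}")
    matches[0][column] = value


def edit_summary(src, dst, edits):
    """Copy a _summary.tsv from `src` to `dst` (text streams), applying `edits`.

    `edits` is a list of (KeyParamGroup, column, value) tuples.
    """
    reader = csv.DictReader(src, delimiter="\t")
    rows = list(reader)
    for kpg, column, value in edits:
        set_cell(rows, kpg, column, value)
    # Write the same columns back in the same order.
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Example with an in-memory summary.
src = io.StringIO(
    "KeyParamGroup\tMergeInto\tRenameKeyGroup\n"
    "datatype-dwi_run-1_suffix-dwi__1\t\t\n"
    "datatype-dwi_run-1_suffix-dwi__7\t\t\n"
)
dst = io.StringIO()
edit_summary(src, dst, [("datatype-dwi_run-1_suffix-dwi__7", "RenameKeyGroup",
                         "acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi")])
print(dst.getvalue())
```

Failing loudly when a KeyParamGroup is missing or duplicated guards against silently editing the wrong row before cubids apply touches the data.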

The _AcqGrouping.tsv file

The _AcqGrouping.tsv file organizes the dataset by session and tags each one with its Acquisition Group number.

The _AcqGroupInfo.txt file

The _AcqGroupInfo.txt file lists, for each Acquisition Group, the Key Groups that belong to it along with the number of sessions in that group.

Visualizing and summarizing metadata heterogeneity

Use cubids group to generate your dataset’s Key Groups and Parameter Groups:

$ cubids group FULL/PATH/TO/BIDS/DIR FULL/PATH/TO/v0

This outputs four files, each prefixed by the second argument (v0): the _summary.tsv and _files.tsv files described above, plus the _AcqGrouping.tsv and _AcqGroupInfo.txt files.

Applying changes

The cubids apply program provides an easy way for users to manipulate their datasets. Specifically, cubids apply can rename files according to the user's specification in a tracked and organized way. Here, the summary.tsv functions as an interface for modifications: users mark the Parameter Groups they want to rename (or delete) in the dedicated columns of the summary.tsv and pass the edited tsv as an argument to cubids apply.

Detecting Variant Groups

Additionally, cubids apply can automatically rename files in Variant Groups. A Variant Group is a Parameter Group whose scanning parameters differ from those of its Key Group's Dominant Parameter Group, which is the Parameter Group containing the most files in that Key Group. Renaming is suggested automatically when cubids group generates the summary.tsv, and the suggested new name is listed in the tsv's RenameKeyGroup column. CuBIDS populates this column for all Variant Groups (i.e., every Parameter Group except the Dominant one). Specifically, CuBIDS suggests renaming every non-dominant Parameter Group to include VARIANT* in its acquisition field, where * indicates why the Parameter Group varies from the Dominant Group. For example, when CuBIDS encounters a Parameter Group whose repetition time differs from that of the Dominant Group, it suggests renaming all scans in that Variant Group to include acquisition-VARIANTRepetitionTime in their filenames. When the user runs cubids apply, files are renamed according to the auto-generated names in the RenameKeyGroup column of the summary.tsv.
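The naming rule can be illustrated with a small sketch that compares a Variant Group's parameters with the Dominant Group's and writes the differing parameter names into the acquisition entity. This is not CuBIDS's code. `NAMED_PARAMS`, `suggest_variant_name`, the alphabetical ordering, and the "NoFmap" handling for HasFieldmap are all assumptions inferred from the example names in the tables (e.g., VARIANTNoFmap, VARIANTDim3SizeVoxelSizeDim3). The parameter values are taken from the tables.

```python
# Hypothetical subset of parameters that may appear in a VARIANT name; in CuBIDS
# this choice is controlled by the configuration file.
NAMED_PARAMS = ("Dim3Size", "EchoTime", "EffectiveEchoSpacing", "Obliquity",
                "RepetitionTime", "TotalReadoutTime", "VoxelSizeDim3")


def suggest_variant_name(key_group, dominant, variant):
    """Sketch of a VARIANT* rename: list differing parameters in the acquisition entity."""
    reasons = sorted(p for p in NAMED_PARAMS if dominant[p] != variant[p])
    if dominant.get("HasFieldmap") and not variant.get("HasFieldmap"):
        reasons.append("NoFmap")
    if not reasons:
        return key_group  # identical on every named parameter: nothing to suggest
    entities = dict(part.split("-", 1) for part in key_group.split("_"))
    entities["acquisition"] = entities.get("acquisition", "") + "VARIANT" + "".join(reasons)
    return "_".join(f"{k}-{v}" for k, v in sorted(entities.items()))


kg = "datatype-dwi_run-1_suffix-dwi"
dominant = {"Dim3Size": 70, "EchoTime": 0.082, "EffectiveEchoSpacing": 0.000267,
            "Obliquity": False, "RepetitionTime": 8.1, "TotalReadoutTime": 0.034,
            "VoxelSizeDim3": 2.0, "HasFieldmap": True}
slow_tr = dict(dominant, RepetitionTime=9.0)
print(suggest_variant_name(kg, dominant, slow_tr))
# acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi
```

The comparison here is exact equality. Tolerances for numeric parameters, which the configuration file supports, are not modeled.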

Deleting a mistake

To remove the files in a Parameter Group from your BIDS data, set that group's MergeInto value to 0. In our data, there is a strange scan with a RepetitionTime of 12.3 seconds that also varies with respect to EchoTime, EffectiveEchoSpacing, and TotalReadoutTime. We elect to remove this scan from our dataset because we do not want these parameters to affect our analyses. To do so, we add a 0 to its MergeInto cell and save the new tsv as v0_edited_summary.tsv.

Pre Apply Groupings with Deletion Requested

| Notes | ManualCheck | MergeInto | RenameKeyGroup | KeyParamGroup | KeyGroup | ParamGroup | Counts | Dim1Size | Dim2Size | Dim3Size | EchoTime | EffectiveEchoSpacing | FlipAngle | HasFieldmap | KeyGroupCount | Modality | NSliceTimes | NumVolumes | Obliquity | ParallelReductionFactorInPlane | PartialFourier | PhaseEncodingDirection | RepetitionTime | TotalReadoutTime | UsedAsFieldmap | VoxelSizeDim1 | VoxelSizeDim2 | VoxelSizeDim3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | acquisition-VARIANTNoFmap_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__2 | datatype-dwi_run-1_suffix-dwi | 2 | 25 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | FALSE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__3 | datatype-dwi_run-1_suffix-dwi | 3 | 6 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.0 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__4 | datatype-dwi_run-1_suffix-dwi | 4 | 3 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.8 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTDim3SizeVoxelSizeDim3_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__5 | datatype-dwi_run-1_suffix-dwi | 5 | 2 | 128 | 128 | 46 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 46 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 3.0 |
| | | 0 | acquisition-VARIANTEchoTimeEffectiveEchoSpacingRepetitionTimeTotalReadoutTime_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__6 | datatype-dwi_run-1_suffix-dwi | 6 | 1 | 128 | 128 | 70 | 0.102 | 0.0008 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 12.3 | 0.102 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi | datatype-dwi_run-1_suffix-dwi__7 | datatype-dwi_run-1_suffix-dwi | 7 | 1 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1426 | dwi | 70 | 35.0 | TRUE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
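The same deletion request can be made from a script instead of by hand. This is a minimal sketch, assuming the MergeInto and RepetitionTime column names shown in the table above. The helper `mark_for_deletion` and the "RepetitionTime above 12 s" rule are hypothetical illustrations of selecting the strange scan described above. For real files you would read v0_summary.tsv and write v0_edited_summary.tsv, opening both with `newline=""`.

```python
import csv
import io


def mark_for_deletion(src, dst, should_delete):
    """Copy a _summary.tsv, writing MergeInto = 0 on every row where should_delete(row) is true."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if should_delete(row):
            row["MergeInto"] = "0"
        writer.writerow(row)


# Hypothetical excerpt; the rule flags any Parameter Group with RepetitionTime above 12 s.
summary = io.StringIO(
    "MergeInto\tKeyParamGroup\tRepetitionTime\n"
    "\tdatatype-dwi_run-1_suffix-dwi__5\t8.1\n"
    "\tdatatype-dwi_run-1_suffix-dwi__6\t12.3\n"
)
edited = io.StringIO()
mark_for_deletion(summary, edited, lambda row: float(row["RepetitionTime"]) > 12)
print(edited.getvalue())
```

Expressing the rule as a predicate keeps the selection criterion explicit and re-runnable if the summary is regenerated.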

In this example, users can apply the changes to BIDS data using the following command:

$ cubids apply FULL/PATH/TO/BIDS/DIR FULL/PATH/TO/v0_edited_summary.tsv FULL/PATH/TO/v0_files.tsv FULL/PATH/TO/v1

The changes in v0_edited_summary.tsv will be applied to the BIDS data and the new Key and Parameter Groups will be saved to tsv files starting with v1.

Applying these changes we would see:

Post Apply Groupings

| Notes | ManualCheck | MergeInto | RenameKeyGroup | KeyParamGroup | KeyGroup | ParamGroup | Counts | Dim1Size | Dim2Size | Dim3Size | EchoTime | EffectiveEchoSpacing | FlipAngle | HasFieldmap | KeyGroupCount | Modality | NSliceTimes | NumVolumes | Obliquity | ParallelReductionFactorInPlane | PartialFourier | PhaseEncodingDirection | RepetitionTime | TotalReadoutTime | UsedAsFieldmap | VoxelSizeDim1 | VoxelSizeDim2 | VoxelSizeDim3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | datatype-dwi_run-1_suffix-dwi__1 | datatype-dwi_run-1_suffix-dwi | 1 | 1388 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1388 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | | acquisition-VARIANTNoFmap_datatype-dwi_run-1_suffix-dwi__1 | acquisition-VARIANTNoFmap_datatype-dwi_run-1_suffix-dwi | 1 | 25 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | FALSE | 25 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi__1 | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | 1 | 6 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 9 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.0 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi__2 | acquisition-VARIANTRepetitionTime_datatype-dwi_run-1_suffix-dwi | 2 | 3 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 9 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 9.8 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | | acquisition-VARIANTDim3SizeVoxelSizeDim3_datatype-dwi_run-1_suffix-dwi__1 | acquisition-VARIANTDim3SizeVoxelSizeDim3_datatype-dwi_run-1_suffix-dwi | 1 | 2 | 128 | 128 | 46 | 0.082 | 0.000267 | 90 | TRUE | 2 | dwi | 46 | 35.0 | FALSE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 3.0 |
| | | | | acquisition-VARIANTEchoTimeEffectiveEchoSpacingRepetitionTimeTotalReadoutTime_datatype-dwi_run-1_suffix-dwi__1 | acquisition-VARIANTEchoTimeEffectiveEchoSpacingRepetitionTimeTotalReadoutTime_datatype-dwi_run-1_suffix-dwi | 1 | 1 | 128 | 128 | 70 | 0.102 | 0.0008 | 90 | TRUE | 1 | dwi | 70 | 35.0 | FALSE | 3.0 | 0.75 | j- | 12.3 | 0.102 | FALSE | 1.875 | 1.875 | 2.0 |
| | | | | acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi__1 | acquisition-VARIANTObliquity_datatype-dwi_run-1_suffix-dwi | 1 | 1 | 128 | 128 | 70 | 0.082 | 0.000267 | 90 | TRUE | 1 | dwi | 70 | 35.0 | TRUE | 3.0 | 0.75 | j- | 8.1 | 0.034 | FALSE | 1.875 | 1.875 | 2.0 |

Customizable configuration

CuBIDS also features an optional, customizable, MRI image type-specific configuration file. This file can be passed as an argument to cubids group and cubids apply using the --config flag and allows users to customize grouping settings based on MRI image type and parameter. Each Key Group is associated with one (and only one) MRI image type, as BIDS filenames include MRI image type-specific values as their suffixes.

This easy-to-modify configuration file provides several benefits to curation. First, it allows users to add metadata parameters to, or remove them from, the set that determines groupings. This can be very useful if a user deems a specific metadata parameter irrelevant and wishes to collapse variation in that parameter into a single Parameter Group. Second, the configuration file allows users to apply tolerances to parameters with numerical values, so that scans with very small differences in scanning parameters (e.g., a TR of 3.0 s vs 3.0001 s) are not split into different Parameter Groups. Third, the configuration file allows users to determine which scanning parameters are listed in the acquisition field when auto-renaming is applied to Variant Groups.
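To see why a tolerance matters, here is a minimal sketch of how a numeric tolerance could collapse near-identical values into one group. The absolute tolerance and the greedy one-dimensional clustering in the hypothetical helper `cluster_with_tolerance` are illustrative assumptions, not necessarily how CuBIDS applies the tolerances from its configuration file.

```python
def cluster_with_tolerance(values, tol):
    """Label values so that those within `tol` of a cluster's smallest member share a label.

    Greedy 1-D clustering over the sorted values; labels start at 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    label, anchor = 0, None
    for i in order:
        if anchor is None or values[i] - anchor > tol:
            label += 1
            anchor = values[i]  # smallest value of the new cluster
        labels[i] = label
    return labels


trs = [3.0, 3.0001, 2.5]
print(cluster_with_tolerance(trs, tol=0.0))    # [2, 3, 1]: exact matching splits 3.0 and 3.0001
print(cluster_with_tolerance(trs, tol=0.001))  # [2, 2, 1]: 3.0 and 3.0001 now share a group
```

Anchoring each cluster on its smallest member keeps a long run of slightly increasing values from chaining into one huge group, at the cost of depending on where clusters start.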

Exemplar testing

In addition to facilitating curation of large, heterogeneous BIDS datasets, CuBIDS also prepares datasets for testing BIDS Apps. This portion of the CuBIDS workflow relies on the concept of the Acquisition Group: a set of sessions that have identical scan types and metadata across all imaging modalities present in the session set. Specifically, cubids copy-exemplars copies one subject from each Acquisition Group into a separate directory, which we call an Exemplar Dataset. Since the Exemplar Dataset contains one randomly selected subject from each unique Acquisition Group in the dataset, it will be a valid BIDS dataset that spans the entire metadata parameter space of the full study. If users run cubids copy-exemplars with the --use-datalad flag, the program will ensure that the Exemplar Dataset is tracked and saved in DataLad. If the user chooses to forgo this flag, the Exemplar Dataset will be a standard directory located on the filesystem. Once the Exemplar Dataset has been created, a user can test it with a BIDS App (e.g., fMRIPrep or QSIPrep) to ensure that each unique set of scanning parameters will pass through the pipelines successfully. Because BIDS Apps auto-configure workflows based on the metadata encountered, they will process all scans in each Acquisition Group in the same way. By first verifying that BIDS Apps perform as intended on the small sub-sample of participants present in the Exemplar Dataset (that spans the full variation of the metadata), users can confidently move forward processing the data of the complete BIDS dataset.
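The selection step behind an Exemplar Dataset boils down to "one random subject per Acquisition Group." Here is a minimal sketch of just that step, with the hypothetical helper `choose_exemplars`. Copying the files and the --use-datalad behavior are not modeled, and the input format (a mapping from (subject, session) to Acquisition Group number) is an assumption.

```python
import random
from collections import defaultdict


def choose_exemplars(acq_groups, seed=None):
    """Pick one random subject per Acquisition Group.

    `acq_groups` maps (subject, session) -> Acquisition Group number.
    Returns {group_number: subject}.
    """
    members = defaultdict(set)
    for (subject, _session), group in acq_groups.items():
        members[group].add(subject)
    rng = random.Random(seed)  # pass a seed to make the selection reproducible
    # sorted() gives rng.choice a deterministic sequence to choose from.
    return {group: rng.choice(sorted(subs)) for group, subs in sorted(members.items())}


groups = {("sub-1", "ses-1"): 1, ("sub-2", "ses-1"): 1, ("sub-3", "ses-1"): 2}
print(choose_exemplars(groups, seed=0))
```

Because every group contributes exactly one subject, the exemplar set grows with the number of distinct acquisition configurations rather than with the number of participants.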

In the next section, we’ll introduce DataLad and walk through a real example.

Footnotes