Metadata Image Parameter Proof of Concept

In this proof of concept we will read & write JSON sidecar files in a Jupyter notebook. The goals are to:

  1. display the data in the sidecar

  2. edit this data

  3. check that the sidecar will write valid JSON files.

[108]:
# import the json module to be able to read & write JSON files
import json
import pandas as pd
from glob import glob
from pathlib import Path
  1. The first part displays the data in the sidecar by reading the JSON files

  2. We then use json.load to turn each file into a Python object

  3. The data includes an array of values under SliceTiming, so we will expand SliceTiming into individual columns in our dataframe named SliceTime000, SliceTime001, etc. (one per value of SliceTiming).

[180]:
# testing the code with a single JSON file
file_test = open('/Users/bjaber/Projects/CuBIDS-use_cases/cubids/testdata/complete/sub-01/ses-phdiff/dwi/sub-01_ses-phdiff_acq-HASC55AP_dwi.json')
sample_data = json.load(file_test)
file_test.close()
sample_data.keys()
SliceTime = sample_data.get('SliceTiming')  # the way you can snatch things out of a dictionary:
# if the dict doesn't have the key, .get() returns None instead of raising an error

if SliceTime:
    sample_data.update({"SliceTime%03d" % SliceNum: time for SliceNum, time in enumerate(SliceTime)})
    del sample_data['SliceTiming']

array_data = pd.DataFrame.from_dict(sample_data, orient='index', columns=['1'])
array_data
[180]:
1
ProcedureStepDescription MR_HEAD_WO_IV_CONTRAST
DeviceSerialNumber 167024
EffectiveEchoSpacing 0.000689998
TotalReadoutTime 0.0717598
ManufacturersModelName Prisma_fit
... ...
SliceTime031 3.61667
SliceTime032 3.73333
SliceTime033 3.85
SliceTime034 3.96667
SliceTime035 4.08333

64 rows × 1 columns

[156]:
#{"SliceTime%03d"%SliceNum : time for SliceNum, time in enumerate(SliceTime)}

The next file might not have slice timing, but its row can still be concatenated: when a file lacks SliceTiming, the corresponding columns are filled with NaN.

Use rglob to collect every JSON file in the BIDS tree, then load each one with json.load.

Next steps

  1. Turn SliceTiming into columns, where each column holds a single float

  2. Create as many columns as the maximum number of slice times, filled out where available
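A minimal sketch of that NaN-filling behaviour, using two made-up metadata rows (the values below are illustrative, not taken from the dataset):

```python
import pandas as pd

# hypothetical rows: only the first sidecar has slice-timing values
row_a = pd.DataFrame([{"EchoTime": 0.089, "SliceTime000": 0.0, "SliceTime001": 0.12}])
row_b = pd.DataFrame([{"EchoTime": 0.00646}])

# rows with different columns are aligned by name; missing cells become NaN
combined = pd.concat([row_a, row_b], ignore_index=True)
print(combined)
```
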

  1. The following part edits the JSON file data.

To do so, call the JSON object that was created with the json.load command (in this case json_data), refer to the value you want to change, and edit it.

Note that this code is commented out because it will differ once we are working with a Pandas DataFrame; it was written while working with a single .json file.

[36]:
#Here we change the value for AcquisitionNumber from 1 to 2.
#json_data["AcquisitionNumber"] = 2
[37]:
#Uncomment below to view edited data
#json_data
[38]:
#Reverting back to original data
#json_data["AcquisitionNumber"] = 1
  1. Checking that the sidecar will write valid JSON files

To do this, we use the json.dumps function: it turns the Python object into a JSON string, so the sidecar will always write valid JSON.

Note: as with the previous chunk of code, this was written for a single .json file and is therefore commented out.

[19]:
#json_string = json.dumps(json_data)
[38]:
#Uncomment below to view the python object as a JSON string
#json_string
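For illustration, a round trip through json.dumps and json.loads on a small stand-in dict shows that the string it produces is always parseable JSON:

```python
import json

sidecar = {"AcquisitionNumber": 1, "SliceTiming": [0.0, 0.12, 0.24]}
json_string = json.dumps(sidecar, indent=2)  # serialise the Python object to JSON text
round_trip = json.loads(json_string)         # parsing succeeds: the string is valid JSON
```
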
[158]:
#notes from Matt

# have a function that does the reading and creates 1 row; then you loop, and the DataFrame grows through concatenation
# pandas.concat
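A sketch of that pattern, with a hypothetical one-row helper and pandas.concat growing the frame (the metadata dicts here are made up):

```python
import pandas as pd

def sidecar_to_row(metadata):
    # turn one sidecar's metadata dict into a single-row DataFrame
    return pd.DataFrame([metadata])

rows = [sidecar_to_row(m) for m in (
    {"EchoTime": 0.089, "FlipAngle": 90.0},
    {"EchoTime": 0.00646, "FlipAngle": 60.0},
)]
meta_df = pd.concat(rows, ignore_index=True)  # one row per file
```
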

The next section is a for loop that extracts, opens, and turns into a DataFrame each JSON file in the “complete” directory!

[205]:
IMAGING_PARAMS = set(["ParallelReductionFactorInPlane", "ParallelAcquisitionTechnique",
    "PartialFourier", "PhaseEncodingDirection",
    "EffectiveEchoSpacing", "TotalReadoutTime", "EchoTime", "SliceEncodingDirection",
    "DwellTime", "FlipAngle", "MultibandAccelerationFactor", "RepetitionTime",
    "VolumeTiming", "NumberOfVolumesDiscardedByScanner", "NumberOfVolumesDiscardedByUser"])

dfs = []  # an empty list to store one metadata dict per file

for path in Path('/Users/bjaber/Projects/CuBIDS/cubids/testdata/complete').rglob('*.json'):
    with open(path) as file_tree:
        example_data = json.load(file_tree)
    # grab SliceTiming before filtering, since it is not in IMAGING_PARAMS
    SliceTime = example_data.get('SliceTiming')  # .get() returns None instead of raising if the key is missing
    wanted_keys = example_data.keys() & IMAGING_PARAMS
    example_data = {key: example_data[key] for key in wanted_keys}
    if SliceTime:
        example_data.update({"SliceTime%03d" % SliceNum: time for SliceNum, time in enumerate(SliceTime)})
    #if ShimSetting:

    dfs.append(example_data)

df = pd.DataFrame(dfs)
#df.drop_duplicates()
df.head()

#create dataframe of unique rows
#bids entities filter in the cubids class to filter through the files
#loop over the files, get metadata, and put it into the dataframe

#earlier attempt, kept for reference:
#for file in example_data:
    #data = pd.DataFrame.from_dict(example_data, orient='index')  # read data frame from json file
    #dfs.append(data)  # append the data frame to the list
    #temp = pd.concat(dfs, ignore_index=True)  # concatenate all the data frames in the list

#NOTE: error when trying to put the data into a pandas DataFrame. This error happens regardless of how SliceTiming is set up.
# print(example_data) was used to make sure that array-valued fields such as SliceTiming are being separated into independent SliceTime00x values that should feed into the dataframe.
# it does that across all JSON files loaded from the directory
[205]:
EchoTime EffectiveEchoSpacing TotalReadoutTime FlipAngle RepetitionTime PhaseEncodingDirection PartialFourier
0 NaN NaN NaN NaN NaN NaN NaN
1 0.08900 0.00069 0.07176 90.0 4.2 j NaN
2 NaN NaN NaN 60.0 1.5 j- 0.75
3 0.00646 NaN NaN 60.0 1.5 j- 0.75
4 0.08900 0.00069 0.07176 90.0 4.2 j- NaN

These are just documented attempts at the above for loop!

attempt at directory stuff #1

import os, json
import pandas as pd

path_to_json = '/Users/bjaber/Projects/CuBIDS-use_cases/cubids/testdata/complete/sub-01/ses-phdiff/anat'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
print(json_files)

attempt #2

for filename in glob('/Users/bjaber/Projects/CuBIDS-use_cases/cubids/testdata/complete/*.json'):
    print(filename)

attempt # 3

for name in files:
    f = open(name, 'r')
    print(f)
    content = f.readlines()
    print('Content of %s:\n%s' % (name, content))
    f.close()

[ ]: