Vanessa

From Psych 221 Image Systems Engineering

Derivation of Functional Networks with Independent Component Analysis

A data driven approach to differentiate functional networks for disease diagnosis

Comparing functional brain networks between a healthy and a disease population would allow for the identification of biomarkers to classify the disease, and would further contribute to our understanding of normal and aberrant brain connectivity. I am interested in data-driven approaches to studying the brain, so my first aim for this project was to compare functional networks between a clinical and a normal group using independent component analysis (ICA). My second aim was to incorporate machine learning into the project. Together, these aims serve the primary goal of building a classifier that takes an ICA run as input and outputs a diagnosis.

Application

Attention deficit hyperactivity disorder (ADHD) is a controversial developmental disorder characterized by hyperactivity, attention problems, and impulsivity. It affects 3-5% of school-aged children, carries a high societal cost ($36 to $52 billion annually), and is increasing in incidence (a 5.5% increase annually) (1)(2). Progress in understanding ADHD through structural and functional imaging is hampered by the range in severity and type of symptoms, and by the inability to connect functional and structural findings on a large scale. Functional deficits in ADHD have been narrowed down to problems with sustained attention, cognitive control, and inhibition, with evidence of this symptomatology lasting into adulthood. Structural deficits in ADHD include both cortical thinning and reduced volume of fronto-striatal regions, cerebellum, basal ganglia, and parietotemporal regions. Given the heterogeneous nature of this disorder and the lack of data-driven methods for diagnosis, it is a prime application for this project.

Question

Are there significant differences in the default mode network (DMN) between ADHD and control?

Data

Publicly available datasets are growing in popularity, and I wanted to use data of this type. The ADHD200 dataset consists of data for normal controls and ADHD (combined, inattentive, and hyperactive subtypes) across 8 different sites. I decided to use a subset of this larger dataset: the data from NYU, one of the sites. My goal was then to derive functional networks for the NYU dataset and find differences in them between ADHD and control. My subject population originally included 222 individuals: healthy controls (99), ADHD combined (77), ADHD inattentive (44), and ADHD hyperactive (2). Due to the small sample of the hyperactive subtype, these data were not included in the analysis. After quality analysis and visual inspection of the data, the final group for further analysis included 162 individuals (69 TD Control, 62 ADHD combined, 31 ADHD inattentive) with mean ages of 12.77, 10.75, and 12.41, respectively. I split ADHD into sub-diagnoses because we have a poor understanding not only of what differentiates healthy controls from disease, but also of what differentiates disease subtypes, and I thought I could look at both kinds of difference in my analysis. Functional resting BOLD and an anatomical T1 were both collected on a Siemens Magnetom Allegra at the New York Child Study Center (scan parameters are detailed below, and full details can be found in the supplementary section).

Image Collection Parameters


METHODS

Preprocessing

I chose to use FSL for its MELODIC program (to perform ICA) and its command line utilities. I looked at the source code to break apart the pre-determined GUI pipeline, and wrote my own set of Python and bash scripts to accomplish similar functionality. Anatomical data were brain extracted with FSL's BET tool at a threshold of 0.225. Functional data were brain extracted with a threshold of 0.3, motion corrected with FSL's MCFLIRT, and smoothed with a 6 mm kernel (2 times the voxel size). I chose to smooth because I am interested in larger-scale patterns of brain activation across a large group, as opposed to detailed activation patterns in a small number of individuals or voxels. Functional data were bandpass filtered between 0.008 and 0.1 Hz, and noise reduction was performed with FSL's SUSAN. Linear registration was performed with FSL's FLIRT tool by registering the functional data to the MNI 152 standard template, with the subject's native anatomical image as an intermediate.
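The first steps of this chain could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the scripts used for the project: the file names are hypothetical, the repetition time `tr_s` is a placeholder because it is not given in this section, the 6 mm kernel is assumed to be a full width at half maximum (FWHM), and the bandpass cutoffs are converted to `fslmaths -bptf` sigmas (in volumes) with the commonly used approximation sigma = 1 / (2 * f * TR). The SUSAN and registration steps are left out, and the commands are only assembled as argument lists, not executed.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian FWHM in mm to the sigma in mm that `fslmaths -s` takes."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def cutoff_to_sigma_volumes(freq_hz, tr_s):
    """Approximate `-bptf` sigma in volumes for a cutoff frequency (assumed convention)."""
    return 1.0 / (2.0 * freq_hz * tr_s)


def build_commands(anat, func, tr_s, fwhm_mm=6.0, low_hz=0.008, high_hz=0.1):
    """Return FSL command lines (as argument lists) for one subject."""
    sigma_mm = fwhm_to_sigma(fwhm_mm)
    hp_sigma = cutoff_to_sigma_volumes(low_hz, tr_s)   # highpass: drops drift below low_hz
    lp_sigma = cutoff_to_sigma_volumes(high_hz, tr_s)  # lowpass: drops signal above high_hz
    return [
        # Anatomical brain extraction at the threshold quoted in the text.
        ["bet", anat, "anat_brain", "-f", "0.225"],
        # -F runs BET on 4D functional data (it uses a 0.3 threshold).
        ["bet", func, "func_brain", "-F"],
        ["mcflirt", "-in", "func_brain", "-out", "func_mc"],
        ["fslmaths", "func_mc", "-s", f"{sigma_mm:.3f}", "func_smooth"],
        ["fslmaths", "func_smooth", "-bptf", f"{hp_sigma:.2f}", f"{lp_sigma:.2f}", "func_filt"],
    ]


# tr_s=2.0 is a placeholder value, not the actual scan parameter.
commands = build_commands("anat.nii.gz", "rest.nii.gz", tr_s=2.0)
for cmd in commands:
    print(" ".join(cmd))
```

Keeping the commands as lists makes it easy to hand each one to `subprocess.run` once the paths and the repetition time have been filled in from the scan parameters.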

Independent Component Analysis

I am interested in data-driven approaches, so I knew from the outset that I wanted to use independent component analysis (ICA) to break a 4D dataset (xyz coordinates over time) into spatial and temporal components. The fMRI signal is a combination of different sources of variability, including noise, machine artifact, physiological signal, motion, and finally the BOLD signal. When we analyze such data we usually have a specific hypothesis about BOLD activity at each voxel, and we might fit a regression or a general linear model (GLM). However, in the case of resting BOLD data, where there is no task of interest to model, we can use ICA.

ICA is usually thought of as a more "exploratory" data analysis technique for finding statistically independent spatial patterns in the data. The observed data are modeled as linear combinations of these independent source signals. If we think of a dataset with n voxels at p timepoints as a p x n matrix, which we will call X, then we want to decompose it so that:

X = A S

where S is optimized to contain statistically independent spatial maps in its rows, and A is the square mixing matrix, each column of which is the time course for the corresponding spatial map in S.

To perform this on a single resting BOLD dataset, we can think of that dataset as a 2D matrix with time along one axis (the rows of X) and space along the other (the columns of X). Performing ICA on an individual dataset yields components that are maximally statistically independent of one another; unlike principal components, they are not required to be orthogonal and have no intrinsic order, although MELODIC reports them sorted by the percentage of variance they explain. This technique will be used to derive individual functional networks, and a similar technique called multisession temporal concatenation (stacking many datasets together before performing ICA) will be used to derive group networks.
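Written out with explicit dimensions, the single-subject model and its temporally concatenated group version look roughly as follows. The symbols m (number of subjects), k (number of retained components), and the subscripted X_i are introduced here for illustration and do not appear in the text above. The group form also assumes that every subject has p timepoints, has been registered to the same n voxels, and that the stacked data are reduced to k components before unmixing.

```latex
X = A\,S, \qquad
X \in \mathbb{R}^{p \times n},\;
A \in \mathbb{R}^{p \times p},\;
S \in \mathbb{R}^{p \times n}

X_{\text{group}} =
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{bmatrix}
\in \mathbb{R}^{mp \times n},
\qquad
X_{\text{group}} \approx A_{\text{group}}\, S_{\text{group}},
\quad
A_{\text{group}} \in \mathbb{R}^{mp \times k},\;
S_{\text{group}} \in \mathbb{R}^{k \times n}
```

Under these assumptions, each row of S_group is a group-level spatial map, and the matching column of A_group stacks that map's time courses for all m subjects.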

Identification of Default Mode Network

The default mode network (DMN) was manually identified from the group networks (below, left) based on visual matching to verified ventral and dorsal DMN maps produced by the Greicius lab at Stanford (below, right). The network was also verified by a postdoc in the Stanford Cognitive and Systems Neuroscience Laboratory.

Group derived DMN component (left), as compared to templates from the Greicius lab (right)

This group network was used in a template matching procedure to identify the DMN for all individual subjects. The algorithm calculates the average activation per voxel shared between the candidate image and the template, and subtracts the average activation per voxel not shared, resulting in a “difference score” in which higher scores indicate better matches. The algorithm was implemented in a custom Python script that uses the nibabel module to read NIfTI files. The top matches for each subject were manually viewed to select the correct component for the default mode network.
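The scoring step can be sketched as below. This is one possible reading of the description, not the original script: "shared" voxels are taken to be those inside a binary template mask, "not shared" voxels are those outside it, and images are represented as flat lists of voxel values (in practice they would be read from NIfTI files and flattened). The function names are hypothetical.

```python
def difference_score(component, template_mask):
    """Mean component value inside the template minus mean value outside it."""
    inside = [v for v, m in zip(component, template_mask) if m]
    outside = [v for v, m in zip(component, template_mask) if not m]
    mean_in = sum(inside) / len(inside) if inside else 0.0
    mean_out = sum(outside) / len(outside) if outside else 0.0
    return mean_in - mean_out


def rank_components(components, template_mask):
    """Return (component index, score) pairs sorted from best to worst match."""
    scores = [(i, difference_score(c, template_mask)) for i, c in enumerate(components)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy example: 6 "voxels", with the template covering the first three.
template_mask = [1, 1, 1, 0, 0, 0]
components = [
    [0.0, 0.5, 0.0, 3.0, 2.5, 3.5],  # active mostly outside the template
    [3.0, 2.0, 4.0, 0.0, 0.5, 0.0],  # active mostly inside the template
]
ranking = rank_components(components, template_mask)
print(ranking)
```

Returning the full ranking rather than a single winner mirrors the manual step in the text, where the top matches for each subject are inspected by eye.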

Statistical Analysis

One-sample t-tests for each of the three groups (healthy control, ADHD inattentive, and ADHD combined) were carried out to confirm the correct selection of the default mode network. We would expect to see significant overlap between subjects in areas that are part of the network, which would be apparent in the one-sample t-test. Two-sample t-tests were carried out to compare default mode networks between groups.
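For a single voxel, the two statistics can be written as in the sketch below. The text does not say which tool or variance model was used, so the pooled-variance form of the two-sample test is an assumption, and the input values are made-up toy numbers rather than project data.

```python
import math
from statistics import mean, variance


def one_sample_t(values, popmean=0.0):
    """t statistic testing whether the group mean differs from popmean."""
    n = len(values)
    return (mean(values) - popmean) / math.sqrt(variance(values) / n)


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic (assumes equal group variances)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))


# Toy Z values at one voxel for two small groups (illustrative only).
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
print(one_sample_t(group_a))
print(two_sample_t(group_a, group_b))
```

Applying these functions at every voxel gives one statistical map per test, which is then thresholded.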


RESULTS

Statistical Analysis

The one-sample t-tests confirmed that the default mode network was correctly selected for the three groups: control, combined, and inattentive (below). It should be noted that the inattentive subtype did not have as robust a network; however, going back to check the selected components did not reveal any incorrectly selected networks. It is likely that the ventral and dorsal DMN were split into separate components, and this result should be considered a limitation of the analysis.

Control, Combined, and Inattentive 1 Sample T-tests verify correct identification of DMN

Two-sample t-tests to assess differences between groups revealed only sub-significant findings (p < .001, uncorrected) for Inattentive > Control in right putamen, superior frontal gyrus, and middle temporal gyrus; Control > Combined in middle frontal gyrus and temporal pole; Combined > Control in precuneus; and Inattentive > Combined in middle temporal gyrus and orbitofrontal cortex (OFC). The putamen is thought to be involved in motor learning and control, driven predominantly by the neurotransmitter dopamine. Given the attentional and cognitive deficits associated with ADHD, it makes sense that we might see differences between ADHD and control focused on this region. The precuneus is involved in visuospatial processing, consciousness, introspection, and episodic memory, functions that are arguably also aberrant in ADHD (3)(4).

Two sample T-tests for inattentive > control, control > combined, combined > control, and inattentive > combined

CLASSIFICATION

To perform classification of disease based on the default mode network, a one-sample t-test was done across all participants, and the resulting image was saved, masked to gray matter, and binarized to represent a DMN template for the entire group. This mask was used to extract Z score values from the individual DMN maps, resulting in an n x p matrix with n rows of subjects and p columns of features. The data were split into a test set (n = 100) and a training set (n = 89) with an equal distribution of ADHD and Control. A support vector machine (SVM) trained on these features performed with only 44% accuracy, worse than chance.
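The feature extraction and the balanced split can be sketched as follows. This is a minimal illustration with toy data and hypothetical function names, assuming each subject's DMN map and the group mask are flattened to equal-length lists. The SVM itself is not shown; it would come from a machine learning library.

```python
import random


def extract_features(subject_maps, group_mask):
    """Keep only in-mask voxels: n rows (subjects) by p columns (voxels)."""
    keep = [i for i, m in enumerate(group_mask) if m]
    return [[z_map[i] for i in keep] for z_map in subject_maps]


def stratified_split(labels, test_fraction, seed=0):
    """Split subject indices so each label has a similar share in both sets."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data: 8 subjects, 4 voxels each, 3 of which fall inside the group mask.
group_mask = [1, 0, 1, 1]
subject_maps = [[float(s + v) for v in range(4)] for s in range(8)]
labels = ["ADHD"] * 4 + ["Control"] * 4

features = extract_features(subject_maps, group_mask)
train_idx, test_idx = stratified_split(labels, test_fraction=0.5)
print(len(features), len(features[0]), train_idx, test_idx)
```

Splitting within each label keeps the class balance the same in the training and test sets, so the chance level is the same for both.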


Structure of features data matrix


The poor performance of the classifier called for further exploration of whether it is possible to separate ADHD and Control. The data were transformed with quadratic, absolute value, logarithmic, sigmoid, and exponential functions to explore differences between the distributions of ADHD and Control. Each transformation was further decomposed with singular value decomposition (SVD). The transformed data were used to train and test the classifier, and the results were not better than chance. An example exploration of a logarithmic transformation is shown below. The plots on the far left show the distributions for ADHD (Group 1, top) and Control (Group 2, bottom), the middle plots show the features (voxel values) for each subject (on the x axis), and the final plots show the total and first six components produced by singular value decomposition. For each visualization, the goal of the exploration was to find a transformation that makes the two groups (ADHD vs Control) different enough to be separable with an SVM.
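The element-wise transformations can be expressed compactly, as in the sketch below. The exact definitions used in the project are not given, so these are assumptions. In particular, the logarithm is taken here as a sign-preserving log(1 + |z|) so that it is defined for zero and negative Z scores. The SVD step is not shown because it needs a linear algebra library.

```python
import math


def signed_log(z):
    """log(1 + |z|) with the sign of z kept (an assumed form of the log transform)."""
    return math.copysign(math.log1p(abs(z)), z)


# Name-to-function table for the five transformations listed in the text.
TRANSFORMS = {
    "quadratic": lambda z: z * z,
    "absolute": abs,
    "logarithmic": signed_log,
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "exponential": math.exp,
}


def transform_features(features, name):
    """Apply one named transformation to every value of an n x p feature matrix."""
    func = TRANSFORMS[name]
    return [[func(z) for z in row] for row in features]


features = [[-2.0, 0.0, 2.0]]
for name in TRANSFORMS:
    print(name, transform_features(features, name))
```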


Comparing ADHD (top) vs Control (bottom) for 1) distribution of values, 2) features, and 3) SVD components


While there were slight differences in the plots for the quadratic and logarithmic transformations, using these new distributions to train and test the classifier improved performance only marginally (from 44% accuracy for the original Z scores to 46% accuracy for the transformations). To further explore whether the data are different enough to be separable with an SVM, correlation coefficients were calculated for every pairing of subjects (ADHD with ADHD, ADHD with Control, and Control with Control) for each of the distributions. A true difference between ADHD and Control would appear as a dark region in the square in the top right (comparing ADHD and Control) relative to the triangles showing within-group comparisons (top left and bottom right); low values indicate a low correlation coefficient, which means that the feature vectors of the two subjects are more different. As an example, the logarithmic correlation matrix is shown below, demonstrating no visual difference between ADHD and Control. This pattern held for all transformations, leading to the conclusion that these particular features are not different enough for an SVM to use for classification.
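The same comparison can be summarized numerically instead of visually. The sketch below is an illustration with toy feature vectors, not the original analysis code. It computes the Pearson correlation for every pair of subjects and averages it separately for within-group and between-group pairs; it assumes no subject has a constant feature vector, which would make the correlation undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length feature vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_correlations(features, labels):
    """Average pairwise correlation within groups and between groups."""
    sums = {"within": 0.0, "between": 0.0}
    counts = {"within": 0, "between": 0}
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            kind = "within" if labels[i] == labels[j] else "between"
            sums[kind] += pearson(features[i], features[j])
            counts[kind] += 1
    return {k: sums[k] / counts[k] for k in sums if counts[k]}


# Toy example in which the groups are perfectly separable by correlation.
features = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0], [6.0, 4.0, 2.0]]
labels = ["ADHD", "ADHD", "Control", "Control"]
summary = mean_correlations(features, labels)
print(summary)
```

If the groups really differed, the "between" average would sit clearly below the "within" average, as it does in this toy case; similar values correspond to the visually uniform matrix described above.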


Matrix of correlation coefficients demonstrates no visual difference between ADHD vs Control (box in top right) as compared to ADHD vs ADHD (triangle top left) and Control vs Control (triangle bottom right)


Discussion and Future Work

This analysis has shown results approaching significance for differences in the strength of connectivity of the default mode network between a set of individuals with ADHD and healthy controls. Attempts to build an elementary classifier with support vector machines based on these DMN maps were not successful; however, given the importance of the choice of features, it cannot be concluded that classification is impossible.

Statistical Z values from the thresholded component maps were used to represent the strength of contribution of any particular voxel to a resting state network. A suggested alternative is to use the filtered data without performing independent component analysis, masked by the binarized DMN map (produced from the ICA analysis in the same manner). It would also be interesting to incorporate spatial information, perhaps using K-means clustering of voxel coordinates or a k-nearest neighbor approach to identify spatially similar voxels, and then using the Z score as an additional term in the distance metric to account for related voxels (adding 1/Z score, for example, would rate voxels with a higher Z score as “more similar,” as represented by smaller values).
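One way to read the 1/Z idea is sketched below. The idea is only outlined in the text, so the details are assumptions: the distance is the Euclidean distance between voxel coordinates plus a 1/Z term for each voxel, the `weight` parameter is hypothetical, and Z scores are assumed to be positive (as in thresholded maps).

```python
import math


def z_weighted_distance(coord_a, z_a, coord_b, z_b, weight=1.0):
    """Euclidean distance between two voxels plus a 1/Z penalty for each voxel."""
    if z_a <= 0 or z_b <= 0:
        raise ValueError("Z scores are assumed to be positive (thresholded maps)")
    spatial = math.dist(coord_a, coord_b)
    return spatial + weight * (1.0 / z_a + 1.0 / z_b)


# Same pair of coordinates, with weaker and then stronger Z scores.
weak = z_weighted_distance((0, 0, 0), 2.0, (3, 4, 0), 4.0)
strong = z_weighted_distance((0, 0, 0), 10.0, (3, 4, 0), 10.0)
print(weak, strong)
```

With this form, two voxels the same physical distance apart are rated as closer when both contribute strongly to the network, which is the behavior described above.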

The DMN was originally selected because it has been shown to be aberrant across many disorders, and because it is a relatively easy network to identify among the many components. It is worth noting, however, that the same analysis done with an attention network might be more relevant to ADHD. I did not have time to repeat the full analysis with a different network, but it is something that I am going to pursue.

Lastly, singular value decomposition of different transformations of the data might be pursued further to create specific subsets of features for the classifier. The long-term goal would be to develop a data-driven analysis pipeline whose output is used to create many different classifiers (for different disorder groups versus healthy control) to classify a new dataset. This goal fits well with the growing trend of labs and medical institutions releasing large, publicly available datasets. Such a pipeline would be used to build classifiers to identify biomarkers of disease. While one lone classifier is probably not good enough to classify a new ICA run, the technique of “boosting” could be used to combine multiple classifiers into an informed diagnosis. This technique can be thought of as taking a weighted “majority vote.” A further extension of this method would be to extract relevant features from these maps to build a database with supporting infrastructure to query semantically accessible results; however, this is more of a problem for bioinformatics than neuroscience.
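The voting step can be sketched as below, with labels coded as +1 (for example ADHD) and -1 (for example control). Only the vote is shown: a full boosting method such as AdaBoost would also train the classifiers one after another and derive each weight from that classifier's error, which is outside this sketch. The weights used here are made-up values, and ties are arbitrarily resolved to -1.

```python
def majority_vote(predictions):
    """Unweighted vote over +1 / -1 predictions from several classifiers."""
    return 1 if sum(predictions) > 0 else -1


def weighted_vote(predictions, weights):
    """Boosting-style vote: each prediction is scaled by its classifier's weight."""
    total = sum(w * p for p, w in zip(predictions, weights))
    return 1 if total > 0 else -1


# Three classifiers disagree; the first is trusted more than the other two.
predictions = [1, -1, -1]
weights = [2.0, 0.5, 0.5]  # hypothetical classifier weights
print(majority_vote(predictions), weighted_vote(predictions, weights))
```

The two functions give different answers on this example, which illustrates why the weighting matters when the individual classifiers differ in reliability.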

Appendix

Presentation

Additional Formats

Script

REFERENCES

(1) Cherkasova, M. V., & Hechtman, L. (2009). Neuroimaging in attention-deficit hyperactivity disorder: beyond the frontostriatal circuitry. Canadian Journal of Psychiatry, 54(10), 651-64.

(2) Cubillo, A., & Rubia, K. (2010). Structural and functional brain imaging in adult attention-deficit/hyperactivity disorder. Expert Review of Neurotherapeutics, 10(4), 603-20. doi:10.1586/ern.10.4

(3) Kelly, A. M. C., Margulies, D. S., & Castellanos, F. X. (2007). Recent advances in structural and functional brain imaging studies of attention-deficit/hyperactivity disorder. Current Psychiatry Reports, 9(5), 401-7.

(4) Seidman, L. J., Valera, E. M., & Makris, N. (2005). Structural brain imaging of attention-deficit/hyperactivity disorder. Biological Psychiatry, 57(11), 1263-72. doi:10.1016/j.biopsych.2004.11.019