VARIATION IN SIGNIFICANT CLUSTER NUMBER AND DISTRIBUTION DUE TO PARAMETER SELECTION DURING CORRECTION FOR MULTIPLE COMPARISONS USING THE MONTE CARLO METHOD IN FUNCTIONAL MAGNETIC RESONANCE IMAGING DATA

Background

When conducting statistical analysis it is important to be aware of the problem of multiple comparisons. The problem of multiple comparisons arises when more than one statistical test is considered simultaneously. When more than one statistical test is conducted, there is an increased likelihood that any single test will reach significance for rejection of the null hypothesis compared to if a single test were ran. Importantly, criterion for null hypothesis rejection can be altered to help accommodate for multiple statistical tests. One example of multiple comparison correction is the Bonferroni method, which requires the single-test p-value rejection threshold to be divided by the number of tests conducted. For datasets in which many statistical tests are necessary to make inferences, methods like the Bonferroni correction may be too severe, resulting in strongly reduce power.

The problem of multiple comparisons often occurs in statistical analysis of brain imaging data, where analysis can often be conducted on data from more than one channel, voxel, dipole source, temporal bin or spatial location. For example, in functional magnetic resonance imaging (fMRI), there are often more than 10,000 voxels from which data is statistically analyzed. If 10,000 signals were created using a random number generator, simply by chance alone many signals would likely appear to be significant, that is, if not corrected for multiple comparisons. Many methods have been developed to deem whether a particular fMRI signal is significant. One such method uses Monte Carlo simulations.

The Monte Carlo method includes four major steps, first a domain needs to be designated (a space for values), second values need to be assigned to the spaces identified in the first step in a random manner, third and lastly, a pre-specified calculation needs to be ran on these values to assess probability of outcomes. The values should be truly random and many. In fMRI applications, Monte Carlo simulations can be used to assess statistical power of a cluster of voxels identified at a given voxel-wise p-threshold. Such a Monte Carlo simulation may use a volume of voxels as the domain, a random number generator for input assignment (n(0,1) per voxel), and a calculation of the size of clusters at given threshold to quantify probabilities of observed outcomes. Such a Monte Carlo simulation can be repeated multiple times to achieve increasingly precise and stabile estimates, and the resulting cluster sizes can be averaged across iterations.

In the current study, I used a previously collected dataset (detailed below) to explore the effects of parameter selection (voxel-wise p-threshold) in Monte Carlo simulation on the characteristics of significant voxel clusters. The variation in cluster size and distribution is subsequently discussed. I hypothesize that at more stringent voxel-wise p-thresholds the clusters will be smaller and more statistically significant on average, per voxel.

Methods

Overview of dataset

Subjects

Nine adults diagnosed with major depressive disorder (MDD). Exclusion criteria included: history of brain injury, primary psychotic ideation, panic disorder, mania, social phobia, post-traumatic stress disorder, substance abuse in past six months, generalized anxiety disorder, left-handedness, hypertension, neurological illness, metal implants, skull fracture history, head trauma with loss of consciousness greater than 5 minutes. Six of the nine subjects were on antidepressants. Depressive symptom severity was assessed using the Beck Depression Inventory (BDI-II; Beck et al. 1979) at the time of scanning and on average 18.7 months later. BDI change scores (time 2 – time 1) were used in voxel-wise correlation analysis with neural activations (described below).

Task

During fMRI scanning, subjects viewed images from the International Affective Picture System (IAPS; Lang & Greenwald 1993) rated negative, neutral, and positive. Images were presented for 2 seconds, after which subjects were to rate the image and fixate. One trial was 14 seconds long (12 seconds rating/fixating). 210 images were viewed, 70 each for negative, neutral and positive. These images were presented in a random order for each subject across five 588-second runs.

One week after MRI data collection, subjects completed an incidental recognition memory task. The task required viewing and rating of 420 IAPS images, 210 of which had been seen during scanning (210 which had not, i.e. foils). Subjects indicated if the image had been previously unseen, was merely familiar or remembered.

MR acquisition

A 1.5T General Electric Signa MR scanner was used to collect functional whole brain data. Scout scans were used to localize brain, with two high-order shims applied thereafter. A single-channel whole head coil was used to acquire blood-oxygen level-dependent (BOLD) data with a spiral pulse sequence (Glover and Law 2001) [24 axial slices, repetition time (TR) = 83 ms/slice, echo time (TE) 40 ms, flip angle = 70°, field of view (FOV) = 24 cm, acquisition time = 2000 ms per frame, number of frames = 299 per run]. Slices had 3.75mm2 in-plane and 4mm through-plane resolution (1mm between-slice distance). A structural scan was then collected for co-registration of the functional data (115 slices, 1 mm2 in-plane and 1.5 mm through-plane resolution, TE = min, flip angle = 15°, FOV = 22cm). A bite-bar formed for each subject was used for head-movement minimization.

MR analysis

Analysis was conducted using the Analysis of Functional NeuroImages (AFNI; afni.nimh.nih.gov) software suite.

Pre-processing and fMRI analysis

Using a Fourier interpolation algorithm BOLD functional images were motion corrected. A despiking algorithm was used to further correct images with 1-3mm movements by replacing data from individual high motion acquisitions with outlier insensitive images. Functional data was then spatially smooth using a full width at half maximum of 4mm Gaussian kernel. Data was high-pass filtered with frequency criterion of one cycle per minute, and subsequently converted to percent signal change. Data were then warped to the Talairach template volume (Talairach & Tournoux 1988).

Response differences for subsequently remembered negative versus subsequently remembered neutral images were calculated as follows. Delta functions were separately created for each subject using ratings from the recognition task. Stimuli events that were rated as being remembered were convolved with a gamma function for creating valence and memory covariates for fitting the signal timecourses at each voxel. AFNI’s 3dDeconvolve was used for least-squares data-fitting on these covariates (after accounting for nuisance covariates). Contrast coefficients from negative versus neutral conditions were obtained using 3dDeconvolve for each participant, resulting in voxel-wise estimates of responses to negative compared to neutral remembered stimuli.

Relating depressive symptomatology to BOLD response

After a pRF model was solved for each subject, the model was trasnformed into MNI template space. This was done by first aligning the high resolution t1-weighted anatomical scan from each subject to an MNI template. Since the pRF model was coregistered to the t1-anatomical scan, the same alignment matrix could then be applied to the pRF model.
Once each pRF model was aligned to MNI space, 4 model parameters - x, y, sigma, and r^2 - were averaged across each of the 6 subjects in each voxel.

Statistical methods

Monte Carlo simulations were conducted using AFNI’s AlphaSim (See Fig. 1 for a schematic of the steps of AlphaSim).

Step 1: AlphaSim fills a nx-by-ny-by-nz image with independent random numbers between 0 and 1. The dimensions of this image are determined by the native space of the fMRI data being assessed (in the current study 54x64x50).

Step 2: A Gaussian filter then applied to the spatially uncorrelated map to simulate voxel correlation. More specifically, AlphaSim accomplishes this filtering by taking the 3D fast Fourier transform (FFT) of the random image, multiplying this by the transform of the Gaussian function, and finally applying inverse FFT to the result. This works because convolution in the space domain is computationally identical to multiplication in the frequency domain. The filter’s parameters should be set to match those used in preprocessing of the test fMRI data, in the current study 4mm FWHM.

Step 3: The resulting image is scaled and thresholded based on the p-threshold parameter values, resulting in a binary image. Scaling is achieved to identify the voxel-wise probability threshold, pthr. From the samples mean and standard deviation, zthr can be identified, such that pthr x nx x ny x nz of the voxels have intensity larger than zthr. Now all the voxels greater than zthr are set to ‘1’, and all voxels less than zthr are set to ‘0’, resulting in a binary image.

Step 4: The binary image of the entire voxel space is then masked with the region of interest. In the current study, a whole brain mask was used. Smaller anatomically or functionally defined volumes can also be used. In theory, reducing the number of voxels from which clusters can be identified from should reduce the cluster size required to reach significance. Step 5: Clusters are then identified from the masked binary image. Clusters are composed of non-zero voxels with adjacent faces. At a given voxel-wise p-threshold, each (activated or non-zero) voxel can only be a member of 1 cluster. All clusters are then identified and recorded in a frequency table.

Steps 1 through 5 are then repeated and the frequency table is updated. Repeating the process allows for increased precision and stability of cluster size estimates. To assess significance, alpha values are then calculated by identifying what percentage of the images have clusters of a given size. The current study explored cluster sizes, frequency and location at voxel-wise p-thresholds of 0.10, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, and 0.001. 10000 Monte Carlo simulations were initiated.

Results

Of 172,800 voxels, 66,123 voxels (38.27%) were contained in the mask, and subsequently included in cluster identification. See Table 1 for significant clusters sizes at alpha = 0.05, cluster sizes rounded up to the nearest integer. Voxel-wise p-threshold of 0.10, two clusters were identified, one at posterior cingulate cortex (PCC), the other in medial frontal cortex (mPFC; see Figure 2). The clusters identified at voxel-wise p-threshold 0.05 and 0.04 were also in posterior cingulate cortex cluster. At voxel-wise p-threshold of 0.03 a PCC and inferior frontal gyrus (IFG) region were identified. At voxel-wise p-threshold of 0.02, PCC, IFG and subgenual prefrontal cortical region (sgPFC) regions were identified (see Figure 3). At voxel-wise p-threshold of 0.01, sgPFC and PCC were identified. At voxel-wise p-threshold of 0.005, no clusters were identified, and at 0.0001, one cluster was identified in sgPFC. See Table 2 for characteristics of identified clusters. Clusters were found in a total of four brain regions, PCC, MPFC, IFG, and sgPFC (summary of findings in Table 3). The average t-value (4.316 per cluster) was largest for the sgPFC region (7.420) identified at voxel-wise p-thresh of 0.001, and smallest for MPFC region (2.819) at voxel-wise p-treshold of 0.10 (see Table 2). Of note, as the voxel-wise p-threshold became more conservative, the number of voxels in a cluster needed for significance decreased.

Below is another example of a reinotopic map in a different subject.

File:Example2.jpg

Table 2

Discussion

The findings of the current study indicate that parameter specification during correction of multiple comparisons using Monte Carlo simulations in fMRI data can lead to disparate outcomes. More specifically, varying the voxel-wise p-threshold resulted in differences in the numbers of voxels needed per cluster for significance and the distribution of significant clusters across the brain.

The size of a given voxel cluster to reach significance varied from as large as 173 voxels at a voxel-wise p-threshold of 0.10 and as small as 6 voxels at a voxel-wise p-threshold of 0.001. Less stringent voxel-wise p-thresholds (e.g. 0.10) therefore lead may to the identification of regions that are comparatively large, and not as statistically compelling per voxel compared to regions identified at more stringent voxel-wise p-threshold, which may be smaller and more statistically compelling per voxel. This effect can be seen in the average voxel-wise t-value in Table 2, as the largest the largest cluster identified had an average voxel-wise t-value of 2.935 (PCC at 0.10 voxel-wise p-threshold) compared to 7.420 (sgPFC at 0.001 voxel-wise p-threshold). This suggests that conservative voxel-wise p-thresholds will preferentially select regions that are comparatively small, with high average voxel-wise t-values, while more liberal voxel-wise p-thresholds will select for larger clusters with reduced average voxel-wise t-values.

Altering the voxel-wise p-threshold from values ranging from 0.10 to 0.001 resulted in one region that appeared only with the most liberal threshold tested (MPFC), another that appeared in all but the most conservative thresholds tested (PCC), a region that appeared only with the most conservative thresholds tested (sgPFC), and lastly a region that appeared only at intermediate thresholds tested (IFG). The variation in distribution of significant clusters could lead to different interpretations of the data. For instance, an interpretation of the dataset above based on MPFC may lead to a discussion of the influence of self-conception as a result of the analysis at a voxel-wise p-threshold of 0.10, whereas an interpretation based on sgPFC may focus on cognitive regulation.

Parameter specification can lead to disparate outcomes during correction for multiple comparisons using the Monte Carlo simulations in fMRI data. Depending on the goals of the given study, this characteristic of parameter selection may be optimized based on the a-priori regions of interest. For example, in a study of amygdala functionality, a more stringent voxel-wise p-threshold may be more appropriate than if cingulate cortex or another more distributed brain region were of interest. In effect, parameter selection should be carefully considered when using the Monte Carlo method for correction of multiple comparisons in fMRI data, depending on the a priori region(s) of interest.

References - Resources and related work

Beck AT, Rush AJ, Shaw BF, Emery G (1979) Cognitive Therapy of Depression. New York: The Guilford Press.

Glover GH, Law CS (2001) Spiral-in/out BOLD fMRI for increased SNR and reduced susceptibility artifacts. Magn Reson Med 46:515-522.

Lang PJ, Greenwald MK (1993) International affective picture system standardization procedure and results for affective judgments: Technical repots 1A-1C. Gainesville, Florida: Center for Research in Psychophysiology, University of Florida.

Talairach J, Tournoux P (1988) Co-planar stereotaxic atlas of the human brain. Stuttgart, Germany, Thieme.

Ward BD (2000) Simultaneous inference for fMRI data. Biophysics Research Institute, Medical College of Wisconsin. http://afni.nimh.nih.gov/pub/dist/doc/manual/AlphaSim.pdf

Matt

Contents

Background

Methods

Overview of dataset

Subjects

Task

MR acquisition

MR analysis

Pre-processing and fMRI analysis

Relating depressive symptomatology to BOLD response

Statistical methods

Results

Discussion

References - Resources and related work

Navigation menu

Matt

Background

Methods

Overview of dataset

Subjects

Task

MR acquisition

MR analysis

Pre-processing and fMRI analysis

Relating depressive symptomatology to BOLD response

Statistical methods

Results

Discussion

References - Resources and related work

Navigation menu

Search