Using ggseg to plot multivariate brain data

Cortical Thickness

Code

Analysis

Multivariate

PLS (partial-least-squares)

Published

October 17, 2022

Using ggseg, developed by Mowinckel and Vidal-Piñeiro (2019) is currently one of my favorite things about doing neuroimaging analysis in R. Most R users are familiar with using ggplot to iteratively build up their plots, and ggseg uses this grammar of graphics to produce stunning visualizations.

This approach is freeing since it encourages you to think about models and parameters - you aren’t constrained by a package’s defaults, if you want to plot median gray matter thickness or the variability of the BOLD signal, it becomes trivial to build a model that gives you those parameters which you can then visualize with ggseg.

Setting up our Example: Sleep and Cortical Thickness

Consider the following example:

Let’s consider a hierarchical approach to examining the relationship between cortical thickness and sleep variables. We’ll use partial least squares (PLS) to examine the covariance between (self-reported) sleep and thickness variables. Note that there’s one person with missing data 179548 - we’ll drop this participant before doing any further analyses…

From the Pittsburgh Sleep Inventory Questionnaire (PSIQ)

Component 1: Subjective sleep quality—question 9 Component 2: Sleep latency—questions 2 and 5a Component 3: Sleep duration—question 4 Component 4: Sleep efficiency—questions 1, 3, and 4Component 5: Sleep disturbance—questions 5b-5j Component 6: Use of sleep medication—question 6 Component 7: Daytime dysfunction—questions 7 and 8

library(ggseg)
library(brms)
library(readr)
library(tidyverse)
library(here)
library(ggplot2)

#first read in the behavioural data (sleep data)


behav  <- read_csv("./Brain_data/unrestricted_johnaeanderson_10_18_2022_9_53_32.csv")%>%
  dplyr::select(Subject, contains("PSQI_COMP")) %>%
  dplyr::filter(Subject !=179548)
#now read in the cortical thickness data
thickness <- read_csv("./Brain_data/unrestricted_johnaeanderson_10_18_2022_9_54_21.csv")%>%
  dplyr::select(Subject, contains("Thck"))%>%
  dplyr::filter(Subject !=179548)

head(thickness)

# A tibble: 6 × 69
  Subject FS_L_Banksst…¹ FS_L_…² FS_L_…³ FS_L_…⁴ FS_L_…⁵ FS_L_…⁶ FS_L_…⁷ FS_L_…⁸
    <dbl>          <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1  103818           2.78    3.04    3.04    2.15    3.86    3.07    2.78    3.12
2  105923           2.62    2.67    2.65    1.91    3.26    2.82    2.54    2.89
3  111312           2.37    2.47    2.57    1.92    3.01    2.72    2.34    2.79
4  114823           2.49    2.82    2.66    2.00    3.12    2.75    2.60    2.87
5  115320           2.53    2.49    2.74    1.94    3.39    2.90    2.61    2.96
6  122317           2.67    2.48    2.79    2.32    3.56    2.88    2.66    3.04
# … with 60 more variables: FS_L_Isthmuscingulate_Thck <dbl>,
#   FS_L_Lateraloccipital_Thck <dbl>, FS_L_Lateralorbitofrontal_Thck <dbl>,
#   FS_L_Lingual_Thck <dbl>, FS_L_Medialorbitofrontal_Thck <dbl>,
#   FS_L_Middletemporal_Thck <dbl>, FS_L_Parahippocampal_Thck <dbl>,
#   FS_L_Paracentral_Thck <dbl>, FS_L_Parsopercularis_Thck <dbl>,
#   FS_L_Parsorbitalis_Thck <dbl>, FS_L_Parstriangularis_Thck <dbl>,
#   FS_L_Pericalcarine_Thck <dbl>, FS_L_Postcentral_Thck <dbl>, …

Tidying up the variable names

Note that the names from the dk atlas in our dataset are not the same as are used in the ggseg package - we’ll need to do some basic tidying up

The steps we’ll need to follow are:

Remove the prefix FS_
Remove the suffix _Thck
Make everything lowercase
Replace r_ with rh_
And replace l_ with lh_

thickness <-  thickness %>% 
    rename_at(vars(starts_with("FS_L_")), funs(str_replace(., "FS_L_", "lh_")))%>% 
    rename_at(vars(starts_with("FS_R_")), funs(str_replace(., "FS_R_", "rh_")))%>%
    rename_at(vars(ends_with("Thck")), funs(str_replace(., "_Thck", "")))

Warning: `funs()` was deprecated in dplyr 0.8.0.
Please use a list of either functions or lambdas: 

  # Simple named list: 
  list(mean = mean, median = median)

  # Auto named with `tibble::lst()`: 
  tibble::lst(mean, median)

  # Using lambdas
  list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))

names(thickness) <- tolower(names(thickness))

Plot the distributions of the Sleep Variables

behav  %>% pivot_longer(!c(Subject, PSQI_Compl), names_to = "Component", values_to = "score") %>%
ggplot(., aes(x=score, color=Component)) +
  geom_density()+
             scale_color_grey() + 
             theme_classic()+
             facet_wrap(~Component)

Figure 1: So, maybe if we were doing this for real, we’d be concerned about the non-normality of some of these factors (components 3,4, 6, and 7, I’m looking at you)

Ye bog-standard correlation analyses…

Ok - now to run the PLS analysis

library(corrplot)

# Compute the covariance matrix
heat <- cor(dplyr::select(behav, -c("Subject", "PSQI_Compl")),dplyr::select(thickness, -"subject"))

# Plot it with corrplot
corrplot(heat, method = "color")

The heatmap above is pretty useless… Can we do any better? Well, we could look at just one of the relationships and plot that using ggseg. Here, for example, is the relationship between thickness and component 1 (subjective sleep quality)

toplot1 <- tibble::rownames_to_column(as.data.frame(heat), "components")
average_thickness <- tibble::rownames_to_column(as.data.frame(colMeans(thickness[-1])), "label")
colnames(average_thickness) <- c("label","thickness")

toplot1 <- toplot1 %>%
  pivot_longer(!components, names_to = "label", values_to = "correlation")

(plot1 <- toplot1 %>%
  group_by(components) %>%
  ggplot() +
  geom_brain(atlas = dk, 
             position = position_brain(hemi ~ side),
             aes(fill = correlation), color="black")+
             facet_wrap(~components))

(plot2 <- average_thickness %>%
  ggplot() +
  geom_brain(atlas = dk, 
             position = position_brain(hemi ~ side),
             aes(fill = thickness), color="black")
)

Nice!

Of course, takes each variable separately. This is a lot of separate analyses, and doesn’t speak to how all these variables fit together. Also, note that we have plotted the raw effect sizes & aren’t correcting for multiple comparisons (we’ve just run a whole heck of a lot of correlations).

Nevertheless, if pressed, I would point out that there appears to be a preliminary relationship between sleep quality and cortical thickness in the anterior cingulate cortex.

Note - the above ggplot can be customized in the normal way with ggplot syntax

library(easystats)

Warning: package 'easystats' was built under R version 4.1.3

# Attaching packages: easystats 0.4.3 (red = needs update)
✖ insight     0.18.0    ✖ datawizard  0.4.1  
✖ bayestestR  0.12.1    ✖ performance 0.9.1  
✖ parameters  0.18.1    ✖ effectsize  0.7.0  
✖ modelbased  0.8.1     ✖ correlation 0.8.1  
✖ see         0.7.1     ✖ report      0.5.1.1

Restart the R-Session and update packages in red with 'easystats::easystats_update()'.

plot1+ theme_void()

merging atlas and data by 'label'

Figure 2: Univariate correlations with the theme_void applied

Onward, to the great multivariate beyond!

Great, now let’s consider a multivariate perspective. We’ll use the TExPosition package and peripherals (get them from Hervé Abdi’s Github page). Note that since we’re using resampling to determine significance and reliability (permutations and boostrapping), we should set a seed (starting with the same random value) so that we can replicate our results.

library(TExPosition) #the pls algorithms
library(data4PCCAR) #permutations
library(PTCA4CATA) #scree plot
set.seed(1234)
pls <- tepPLS(dplyr::select(behav, -c("Subject", "PSQI_Compl")),dplyr::select(thickness, -"subject"), graphs = FALSE)

# permuation tests byColumns
nIter <- 10000 #set the number of iterations to run the permutation test 

perm.bycol <- perm4PLSC(dplyr::select(behav, -c("Subject", "PSQI_Compl")),
                        dplyr::select(thickness, -"subject"), 
                        permType = 'byColumns', # the default type is byMat which permute by labels of observations
                                                # byColumns option permutes all cols of each data matrix independently
                        nIter = nIter)

scree      <- PlotScree(ev = pls$TExPosition.Data$eigs,
                        p.ev = perm.bycol$pEigenvalues,
                        title = "Explained Variance per Dimension + Permutation Tests",
                        plotKaiser = TRUE)

From this, we could interpret that there are two components we might want to investigate. We could check out the p values for each eigenvalue as well using the perm.bycol$pEigenvalues[1] which gives a marginally significant (I know) value of 0.063 for the first LV. Nothing else is really worth investigating.

We’ll also want to interpret the salience scores (contributions) using bootstrap ratios so that we examine only the reliable entries.

resBoot4PLSC <- Boot4PLSC(dplyr::select(behav, -c("Subject", "PSQI_Compl")),
                        dplyr::select(thickness, -"subject"), # Second Data matrix
                          nIter = 10000, # How many iterations
                      Fi = pls$TExPosition.Data$fi,
                      Fj = pls$TExPosition.Data$fj,
                      nf2keep = 3,
                      critical.value = 2,
                      # To be implemented later
                      # has no effect currently
                      alphaLevel = .05)

BR.I <- resBoot4PLSC$bootRatios.i
BR.J <- resBoot4PLSC$bootRatios.j

#calculate the quantile scores for the confidence intervals
#a <- as.data.frame(resBoot4PLSC$bootstrapBrick.i) #flattens bootstrap brick into a dataframe #
#b <- t(a) #transposes it 
#b_1 <-  tibble::rownames_to_column(as.data.frame(b), "Dimensions")#converts row labels to separate column
#b_1 <- b_1 %>%
#    separate(Dimensions, c("Dimension", "Iteration"), ".Iteration ")
#apply the function
#alist <- split(b_1, b_1$Dimension)
#str(alist)
#
#lapply(alist$`Dimension 1`, function(x) quantile(x,  c(0.0275,.975)))

Plotting the bootstrap ratios…

library(viridis)
cols=c("gray","black")

Bootstrap_ratios <- tibble::rownames_to_column(as.data.frame(BR.J), "label")%>%
  pivot_longer(!label, names_to = "LVs", values_to = "BSR")

Bootstrap_ratios$reliable <- abs(Bootstrap_ratios$BSR)>2


plot2 <- Bootstrap_ratios %>%
  group_by(LVs) %>%
  ggplot() +
  geom_brain(atlas = dk, 
             size = 0.5,
             position = position_brain(hemi ~ side),
             aes(fill = BSR, alpha=reliable==TRUE, colour=factor(reliable)))+
             facet_wrap(~LVs,ncol = 3)+
             scale_fill_gradient2(low = "darkblue", high= "red", midpoint = 0,mid = "white",na.value="#f2f2f2")+
             scale_colour_manual(values=cols)+
             theme_void()+
             guides(color = FALSE)+
             guides(shape = FALSE)+ 
             guides(alpha = FALSE)+ 
             theme(legend.position="bottom")


#plotting the brain-scores
Brain_Scores <- tibble::rownames_to_column(as.data.frame(BR.I), "label")%>%
  pivot_longer(!label, names_to = "LVs", values_to = "BS")

plot3 <- ggplot(Brain_Scores, 
                aes(x = label, y = BS, fill=LVs))+
                geom_col(stat="identity")+
                facet_wrap(~LVs)+
                geom_hline(yintercept = 2, linetype=2)+
                geom_hline(yintercept = -2, linetype=2)+
                theme_modern()+
                theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
                guides(fill = FALSE)

ggpubr::ggarrange(plot2, plot3, nrow = 2, heights = c(.75, 1))

Figure 3: Partial Least Squares Analysis showing the relationship between sleep variables from the PSIQ and cortical thickness

Sample write up:

We ran a multivariate partial least squares correlation (PLS-C) analysis McIntosh and Lobaugh (2004) relating seven self-reported sleep variables from the PSIQ to cortical thickness. The first LV explained 35.17% of the cross-block covariance, and was marginally significant p = 0.063 (see Figure 3). For this LV, higher values of the first four sleep variables (Subjective sleep quality, Sleep latency, Sleep duration, and Sleep efficiency) were associated with higher cortical thickness in bilateral occipital lobe, anterior cingulate cortex, left motor cortex, and bilateral parahippocampal cortex. There were no regions where higher sleep quality was associated with lower cortical thickness values. This pattern appears to be most strongly associated with sleep latency and sleep efficiency, and notably, the use of sleep medication and daytime dysfunction load in the opposite direction - though are not as strongly predictive.

References

“Ggseg: Plotting Tool for Brain Atlases, in Ggplot.” n.d. Github. https://github.com/ggseg/ggseg.

Krishnan, Anjali, Lynne J Williams, Anthony Randal McIntosh, and Hervé Abdi. 2011. “Partial Least Squares (PLS) Methods for Neuroimaging: A Tutorial and Review.” Neuroimage 56 (2): 455–75.

McIntosh, Anthony Randal, and Nancy J Lobaugh. 2004. “Partial Least Squares Analysis of Neuroimaging Data: Applications and Advances.” Neuroimage 23: S250–63.

Mowinckel, Athanasia M, and Didac Vidal-Piñeiro. 2019. “Visualisation of Brain Statistics with R-Packages Ggseg and Ggseg3d.” arXiv [Stat. OT]. http://arxiv.org/abs/1912.08200.

CANAL Lab Blog

Search This Blog