Applications

This vignettes covers some of the various applications of chromVAR’s bias corrected deviations and Z-scores for motifs or other annotations. First, we will compute those deviations:

library(chromVAR)
library(motifmatchr)
library(SummarizedExperiment)
library(Matrix)
library(ggplot2)
library(BiocParallel)
library(BSgenome.Hsapiens.UCSC.hg19)
register(SerialParam())
data(example_counts, package = "chromVAR")
set.seed(2017)
example_counts <- addGCBias(example_counts, genome = BSgenome.Hsapiens.UCSC.hg19)
counts_filtered <- filterSamples(example_counts, min_depth = 1500, 
                                 min_in_peaks = 0.15, shiny = FALSE)
counts_filtered <- filterPeaks(counts_filtered)
motifs <- getJasparMotifs()
motif_ix <- matchMotifs(motifs, counts_filtered, genome = BSgenome.Hsapiens.UCSC.hg19)
dev <- computeDeviations(object = counts_filtered, 
                                 annotations = motif_ix)

Variability

The first application is simply to compute the variability of each motif or annotation across the cells or samples of interest. The function plotVariability

variability <- computeVariability(dev)
plotVariability(variability, use_plotly = FALSE)

Clustering

We can also use the bias corrected deviations to cluster the samples. The function getSampleCorrelation first removes highly correlated annotations and low variability annotations and then computes the correlation between the cells for the remaining annotations.

sample_cor <- getSampleCorrelation(dev)

library(pheatmap)
pheatmap(as.dist(sample_cor), 
         annotation_row = colData(dev), 
         clustering_distance_rows = as.dist(1-sample_cor), 
         clustering_distance_cols = as.dist(1-sample_cor))

Cell / sample similarity

We can also use tSNE for looking at cell similarity. The function deviationsTsne performs tsne and results a data.frame with the results. If running in an interactive session, shiny can be set to TRUE to load up a shiny gadget for exploring parameters. In general, with 100’s of cells a perplexity of around 30-50 might make sense. In this example, we have few cells so we are bounded in how high the perplexity can be set– we will use 10 here.

tsne_results <- deviationsTsne(dev, threshold = 1.5, perplexity = 10, 
                               shiny = FALSE)

To plot the results, plotDeviationsTsne can be used. If running in an interactive session or an interactive Rmarkdown document, shiny can be set to TRUE to generate a shiny widget. Here we will show static results.

tsne_plots <- plotDeviationsTsne(dev, tsne_results, annotation = "TEAD3", 
                                   sample_column = "Cell_Type", shiny = FALSE)
tsne_plots[[1]]

tsne_plots[[2]]

Differential accessibility and variability

The differentialDeviations function determines whether there is a significant difference between the bias corrected deviations for a given annotation between different groups. The groups can be specified by giving the column name of a column in the colData of the dev object or a vector of group assignments.

diff_acc <- differentialDeviations(dev, "Cell_Type")

head(diff_acc)
##                           p_value p_value_adjusted
## MA0025.1_NFIL3       6.667785e-02     1.175006e-01
## MA0030.1_FOXF2       1.802723e-02     4.166773e-02
## MA0031.1_FOXD1       3.825026e-03     1.077708e-02
## MA0051.1_IRF2        1.562799e-12     6.032403e-11
## MA0056.1_MZF1        5.858065e-01     6.709831e-01
## MA0057.1_MZF1(var.2) 1.686300e-04     7.396724e-04

The differentialVariability function determines whether there is a significant difference between the variability of any of the annotations between different groups.

diff_var <- differentialVariability(dev, "Cell_Type")

head(diff_var)
##                        p_value p_value_adjusted
## MA0025.1_NFIL3       0.4330854        0.9135025
## MA0030.1_FOXF2       0.7011659        0.9652404
## MA0031.1_FOXD1       0.1308742        0.6512322
## MA0051.1_IRF2        0.4142727        0.9007595
## MA0056.1_MZF1        0.4509686        0.9141348
## MA0057.1_MZF1(var.2) 0.9073926        0.9652404

Motif / kmer similarity

We can also perform tsne for motif similarity rather than cell similarity, by specifying what = "annotations" to the deviationsTsne function.

inv_tsne_results <- deviationsTsne(dev, threshold = 1.5, perplexity = 8, 
                                    what = "annotations", shiny = FALSE)

ggplot(inv_tsne_results, aes(x = Dim1, y = Dim2)) + geom_point() + 
  chromVAR_theme()

Kmers and sequence specificity of variation

By using kmers as the annnotation, we can use the kmers to identify the precise nucleotides necessary for chromatin accessibility variability. The deviationsCovariability function returns a normalized covariance between the bias corrected deviations of any pair of annotations. We will use 6mers here in the interest of computational time, but in general 7mers yield higher variability and are better starting points for assembling de novo motifs (see next section).

kmer_ix <- matchKmers(6, counts_filtered, genome = BSgenome.Hsapiens.UCSC.hg19)
kmer_dev <- computeDeviations(counts_filtered, kmer_ix)

kmer_cov <- deviationsCovariability(kmer_dev)

plotKmerMismatch("CATTCC",kmer_cov)

De novo kmer assembly

We can use the assembleKmers function to build de novo motifs using the kmer deviation results. The function goes iteratively through the most variable kmers and uses the normalized covariance of the bias corrected deviations for closely related kmers to weight each nucleotide in the motif.

de_novos <- assembleKmers(kmer_dev, progress = FALSE) #no progress bar
de_novos
## PWMatrixList of length 23
## names(23): denovo_1 denovo_2 denovo_3 ... denovo_21 denovo_22 denovo_23

We can use the pwmDistance function to see how close our de novo motifs match known motifs. pwmDistance returns a list with three matrices- ‘dist’ has the distance between each pair of motifs, ‘strand’ the strand of the motif for the match, and ‘offset’ the offset between the motifs.

dist_to_known <- pwmDistance(de_novos, motifs)

closest_match1 <- which.min(dist_to_known$dist[1,])
dist_to_known$strand[1,closest_match1]
## [1] "-"
library(ggmotif) # Package on github at AliciaSchep/ggmotif. Can use seqLogo alternatively
library(TFBSTools)
## 
## Attaching package: 'TFBSTools'
## The following object is masked from 'package:Matrix':
## 
##     Matrix
## The following object is masked from 'package:DelayedArray':
## 
##     matrixClass
# De novo motif
ggmotif_plot(de_novos[[1]])

# Closest matching known
ggmotif_plot(toPWM(reverseComplement(motifs[[closest_match1]]),type = "prob"))

In this case, it looks like the last five base pairs of the de novo motif match well with the first five of the known motif. To get a better sense of whether these might represent the same motif, one can compute the deviations for the de novo motif and see if they are correlated with the known.

Synergy between motifs

chromVAR includes a function for computing the “synergy” between pairs of annotations/motifs, where synergy is defined as the excess variability of chromatin accessibility for peaks sharing both motifs compared to a random sub-sample of the same size of peaks with one motif (the one with greater variability).

Function for computing synergy

getAnnotationSynergy(counts_filtered, motif_ix[,c(83,24,20)])
##                      MA0139.1_CTCF MA0107.1_RELA MA0140.2_GATA1::TAL1
## MA0139.1_CTCF            0.0000000     -2.001218            0.3338846
## MA0107.1_RELA           -2.0012182      0.000000            0.7990940
## MA0140.2_GATA1::TAL1     0.3338846      0.799094            0.0000000

The result is a matrix with Z-scores for the variability synergy of each possible pairing. Note that this function is pretty slow and should only be performed with a limited selection of motifs and not the full collection!

Correlation

An extreme synergy score suggests the possibility of cooperative or competitive binding between the factors that bind the motif. However, if the two factors simply tend to be co-expressed or oppositely expressed, then the synergy score could also be extreme. It can thus be helpful to look at the correlation between the deviations for the two motifs, in particular the correlation between the deviations for the peak sets that only have one or only the other motif. There is a function to compute this correlation:

getAnnotationCorrelation(counts_filtered, motif_ix[,c(83,24,20)])
##                      MA0139.1_CTCF MA0107.1_RELA MA0140.2_GATA1::TAL1
## MA0139.1_CTCF            1.0000000   -0.41950962          -0.27827092
## MA0107.1_RELA           -0.4195096    1.00000000          -0.08943439
## MA0140.2_GATA1::TAL1    -0.2782709   -0.08943439           1.00000000

Session Info

Sys.Date()
## [1] "2017-11-10"
sessionInfo()
## R Under development (unstable) (2017-11-10 r73707)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## Matrix products: default
## BLAS: /home/travis/R-bin/lib/R/lib/libRblas.so
## LAPACK: /home/travis/R-bin/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    methods   stats     graphics  grDevices utils    
## [8] datasets  base     
## 
## other attached packages:
##  [1] TFBSTools_1.16.0                  ggmotif_0.0.0                    
##  [3] pheatmap_1.0.8                    BSgenome.Hsapiens.UCSC.hg19_1.4.0
##  [5] BSgenome_1.46.0                   rtracklayer_1.38.0               
##  [7] Biostrings_2.46.0                 XVector_0.18.0                   
##  [9] BiocParallel_1.12.0               ggplot2_2.2.1                    
## [11] Matrix_1.2-11                     SummarizedExperiment_1.8.0       
## [13] DelayedArray_0.4.1                matrixStats_0.52.2               
## [15] Biobase_2.38.0                    GenomicRanges_1.30.0             
## [17] GenomeInfoDb_1.14.0               IRanges_2.12.0                   
## [19] S4Vectors_0.16.0                  BiocGenerics_0.24.0              
## [21] motifmatchr_1.1.1                 chromVAR_1.1.1                   
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6                DirichletMultinomial_1.20.0
##  [3] bit64_0.9-7                 RColorBrewer_1.1-2         
##  [5] httr_1.3.1                  rprojroot_1.2              
##  [7] tools_3.5.0                 backports_1.1.1            
##  [9] R6_2.2.2                    DT_0.2                     
## [11] seqLogo_1.44.0              DBI_0.7                    
## [13] lazyeval_0.2.0              colorspace_1.3-2           
## [15] bit_1.1-12                  compiler_3.5.0             
## [17] plotly_4.7.1                labeling_0.3               
## [19] caTools_1.17.1              scales_0.5.0               
## [21] readr_1.1.1                 nabor_0.4.7                
## [23] stringr_1.2.0               digest_0.6.12              
## [25] Rsamtools_1.30.0            rmarkdown_1.7              
## [27] R.utils_2.6.0               pkgconfig_2.0.1            
## [29] htmltools_0.3.6             htmlwidgets_0.9            
## [31] rlang_0.1.2.9000            RSQLite_2.0                
## [33] VGAM_1.0-4                  shiny_1.0.5                
## [35] bindr_0.1                   jsonlite_1.5               
## [37] gtools_3.5.0                dplyr_0.7.3                
## [39] R.oo_1.21.0                 RCurl_1.95-4.8             
## [41] magrittr_1.5                GO.db_3.5.0                
## [43] GenomeInfoDbData_0.99.1     Rcpp_0.12.13               
## [45] munsell_0.4.3               R.methodsS3_1.7.1          
## [47] stringi_1.1.5               yaml_2.1.14                
## [49] zlibbioc_1.24.0             Rtsne_0.13                 
## [51] plyr_1.8.4                  grid_3.5.0                 
## [53] blob_1.1.0                  miniUI_0.1.1               
## [55] CNEr_1.14.0                 lattice_0.20-35            
## [57] splines_3.5.0               annotate_1.56.0            
## [59] hms_0.3                     KEGGREST_1.18.0            
## [61] knitr_1.17                  JASPAR2016_1.6.0           
## [63] codetools_0.2-15            reshape2_1.4.2             
## [65] TFMPvalue_0.0.6             XML_3.98-1.9               
## [67] glue_1.1.1                  evaluate_0.10.1            
## [69] data.table_1.10.4           png_0.1-7                  
## [71] httpuv_1.3.5                tidyr_0.7.1                
## [73] purrr_0.2.3                 gtable_0.2.0               
## [75] poweRlaw_0.70.1             assertthat_0.2.0           
## [77] mime_0.5                    xtable_1.8-2               
## [79] viridisLite_0.2.0           tibble_1.3.4               
## [81] GenomicAlignments_1.14.0    AnnotationDbi_1.40.0       
## [83] memoise_1.1.0               bindrcpp_0.2