Deviations

This vignette covers the main function of chromVAR, computeDeviations.

Inputs

The computeDeviations has two required inputs, object and annotations. The first argument should be a RangedSummarizedExperiment with a counts assay storing the fragment counts per peak (rows) per cell or sample (columns). The second argument (annnotations) argument should be a RangedSummarizedExperiment with a motif_matches or annotation_matches assay storing whether each peak (rows) contains each annotation (columns). For more information on these two primary inputs, see the “Counts” and “Annotations” vignettes.

# Getting inputs ready -- See Inputs vignette for more info
library(chromVAR)
library(motifmatchr)
library(SummarizedExperiment)
library(Matrix)
library(BSgenome.Hsapiens.UCSC.hg19)
library(BiocParallel)
set.seed(2017)
register(SerialParam())
data(example_counts, package = "chromVAR")
example_counts <- addGCBias(example_counts, genome = BSgenome.Hsapiens.UCSC.hg19)
counts_filtered <- filterSamples(example_counts, min_depth = 1500, min_in_peaks = 0.15, shiny = FALSE)
counts_filtered <- filterPeaks(counts_filtered)
motifs <- getJasparMotifs()
motif_ix <- matchMotifs(motifs, counts_filtered, genome = BSgenome.Hsapiens.UCSC.hg19)

# computing deviations
dev <- computeDeviations(object = counts_filtered, 
                                 annotations = motif_ix)

Optionally, the annotations can be omitted and then deviations will be computed per peak. That approach should only be use with bulk, deeply sequenced data.

Outputs

The output from the computeDeviations function is a chromVARDeviations object. This object inherits from the RangedSummarizedExperiment object, so anything you can do with that object can be done with this one.

print(dev)
## class: chromVARDeviations 
## dim: 386 36 
## metadata(0):
## assays(2): deviations z
## rownames(386): MA0025.1_NFIL3 MA0030.1_FOXF2 ... MA0909.1_HOXD13
##   MA0914.1_ISL2
## rowData names(3): name fractionMatches fractionBackgroundOverlap
## colnames(36): singles-GM12878-140905-1 singles-GM12878-140905-2
##   ... singles-H1ESC-140820-24 singles-H1ESC-140820-25
## colData names(2): Cell_Type depth

The object stores two main ‘assays’, which can be accessed using the deviations and deviationScores functions. The deviations are the bias corrected deviations in accessibility. For each motif or annotation (rows), there is a value for each cell or sample (columns) representing how different the accessibility for peaks with that motif or annotation is from the expectation based on all cells being equal, corrected for biases.

deviations(dev)[1:3,1:2]
##                singles-GM12878-140905-1 singles-GM12878-140905-2
## MA0025.1_NFIL3              -0.07486682               0.09869513
## MA0030.1_FOXF2              -0.08766814              -0.09944391
## MA0031.1_FOXD1              -0.08325694              -0.19791937

The deviationScores are the Z-scores for each bias corrected deviations.

deviationScores(dev)[1:3,1:2]
##                singles-GM12878-140905-1 singles-GM12878-140905-2
## MA0025.1_NFIL3               -0.5767749                 0.820191
## MA0030.1_FOXF2               -0.7841626                -1.099900
## MA0031.1_FOXD1               -0.6978056                -2.146026

The absolute value of the deviation scores will be correlated with the read depth as with more reads, there can be more confidence that the difference in accessibility from the expecation is greater than would occur by chance.

Metadata

Any column metadata stored as “colData” in a SummarizedExperiment object used as the counts input for computeDeviations will be propagated as “colData” in the output. Any column metadata stored as “colData” in a SummarizedExperiment object used as the annotations input will be propagated as the “rowData” in the output. Additionally, two columns in the rowData will be created: “fractionMatches”, which gives the fraction of peaks that had a match for that motif, and “fractionBackgroundOverlap”, which gives the average fraction of background peaks for the motif that contain a match for the motif.

Options

The computeDeviations function also has two optional arguments, background_peaks and expectation.

Background peaks

The function computeDeviations will use a set of background peaks for bias correcting the deviations. This computation is done internally by default and not returned – to have greater control over this step, a user can run the getBackgroundPeaks function themselves and pass the result to computeDeviations under the background_peaks parameter. If the fragment counts objects is a Matrix/matrix or a SummarizedExperiment without a “bias” column in the rowData, then it is mandatory to supply a background_peaks argument.

Background peaks are peaks that are similar to a peak in GC content and average accessibility.

bg <- getBackgroundPeaks(object = counts_filtered)

The result from getBackgroundPeaks is a matrix of indices, where each column represents the index of the peak that is a background peak. The number of columns represents the number of background iterations that will be used for computing the bias corrected deviations.

To use the background peaks computed, simply add those to the call to computeDeviations:

dev <- computeDeviations(object = counts_filtered, 
                                 annotations = motif_ix, 
                                 background_peaks = bg)

If you are using a counts matrix or a SummarizedExperiment without a “bias” column in the rowRanges or rowData, then a bias argument should be given as well.

Expectation

By default, chromVAR measures the difference in chromatin accessibility for an annotation relative to the expectation if all cells or samples have the same chromatin accessibility profile but differ simply in read depth. Optionally, this expectation can be adjusted. The function computeExpecations can calculate the expectation in alternate ways. The function can also be applied to only a subset of the cells or samples.

expected <- computeExpectations(counts_filtered) 

By default, this function will compute the expected fraction of reads per peak as the the total fragments per peak across all samples divided by total reads in peaks in all samples. Optionally, norm can be set to TRUE and then the expectation will be the average fraction of reads in a peak across the cells. This is not recommended for single cell applications as cells with very few reads will have a large impact.

# Not recommended for single cell data or other very sparse data!
expected <- computeExpectations(counts_filtered, norm = TRUE) 

Another option is to give a vector of groups, in which case the expectation will be the average fraction of reads per peak within each group. If a group vector is provided and norm is set to TRUE then within each group the fraction of reads per peak is the average fraction of reads per peak in each sample. Otherwise, the within group fraction of reads per peak is based on the reads per peak within the sample divided by the total reads within each sample.

expected <- computeExpectations(counts_filtered, 
                                group = colData(counts_filtered)$Cell_Type) 

To use your computed expectation vector, simply pass it to the expectation argument.

dev <- computeDeviations(object = counts_filtered, 
                          annotations = motif_ix, 
                          expectation = expected)

Session Info

Sys.Date()
## [1] "2017-11-10"
sessionInfo()
## R Under development (unstable) (2017-11-10 r73707)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## Matrix products: default
## BLAS: /home/travis/R-bin/lib/R/lib/libRblas.so
## LAPACK: /home/travis/R-bin/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    methods   stats     graphics  grDevices utils    
## [8] datasets  base     
## 
## other attached packages:
##  [1] BiocParallel_1.12.0               BSgenome.Hsapiens.UCSC.hg19_1.4.0
##  [3] BSgenome_1.46.0                   rtracklayer_1.38.0               
##  [5] Biostrings_2.46.0                 XVector_0.18.0                   
##  [7] Matrix_1.2-11                     SummarizedExperiment_1.8.0       
##  [9] DelayedArray_0.4.1                matrixStats_0.52.2               
## [11] Biobase_2.38.0                    GenomicRanges_1.30.0             
## [13] GenomeInfoDb_1.14.0               IRanges_2.12.0                   
## [15] S4Vectors_0.16.0                  BiocGenerics_0.24.0              
## [17] motifmatchr_1.1.1                 chromVAR_1.1.1                   
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6                DirichletMultinomial_1.20.0
##  [3] TFBSTools_1.16.0            bit64_0.9-7                
##  [5] httr_1.3.1                  rprojroot_1.2              
##  [7] tools_3.5.0                 backports_1.1.1            
##  [9] R6_2.2.2                    DT_0.2                     
## [11] seqLogo_1.44.0              DBI_0.7                    
## [13] lazyeval_0.2.0              colorspace_1.3-2           
## [15] bit_1.1-12                  compiler_3.5.0             
## [17] plotly_4.7.1                caTools_1.17.1             
## [19] scales_0.5.0                readr_1.1.1                
## [21] nabor_0.4.7                 stringr_1.2.0              
## [23] digest_0.6.12               Rsamtools_1.30.0           
## [25] rmarkdown_1.7               R.utils_2.6.0              
## [27] pkgconfig_2.0.1             htmltools_0.3.6            
## [29] htmlwidgets_0.9             rlang_0.1.2.9000           
## [31] RSQLite_2.0                 VGAM_1.0-4                 
## [33] shiny_1.0.5                 bindr_0.1                  
## [35] jsonlite_1.5                gtools_3.5.0               
## [37] dplyr_0.7.3                 R.oo_1.21.0                
## [39] RCurl_1.95-4.8              magrittr_1.5               
## [41] GO.db_3.5.0                 GenomeInfoDbData_0.99.1    
## [43] Rcpp_0.12.13                munsell_0.4.3              
## [45] R.methodsS3_1.7.1           stringi_1.1.5              
## [47] yaml_2.1.14                 zlibbioc_1.24.0            
## [49] plyr_1.8.4                  grid_3.5.0                 
## [51] blob_1.1.0                  miniUI_0.1.1               
## [53] CNEr_1.14.0                 lattice_0.20-35            
## [55] splines_3.5.0               annotate_1.56.0            
## [57] hms_0.3                     KEGGREST_1.18.0            
## [59] knitr_1.17                  JASPAR2016_1.6.0           
## [61] codetools_0.2-15            reshape2_1.4.2             
## [63] TFMPvalue_0.0.6             XML_3.98-1.9               
## [65] glue_1.1.1                  evaluate_0.10.1            
## [67] data.table_1.10.4           png_0.1-7                  
## [69] httpuv_1.3.5                tidyr_0.7.1                
## [71] purrr_0.2.3                 gtable_0.2.0               
## [73] poweRlaw_0.70.1             assertthat_0.2.0           
## [75] ggplot2_2.2.1               mime_0.5                   
## [77] xtable_1.8-2                viridisLite_0.2.0          
## [79] tibble_1.3.4                GenomicAlignments_1.14.0   
## [81] AnnotationDbi_1.40.0        memoise_1.1.0              
## [83] bindrcpp_0.2