Annotations

The main computeDeviations function from chromVAR requires an object storing what peaks overlap what motifs or other annotations. The package includes functions for creating such an object from a set of motifs or kmers, or for converting an existing matrix, data.frame, list, GenomicRangesList, or list of bed files of annotations into the appropriate object.

library(chromVAR)
library(motifmatchr)
library(SummarizedExperiment)
library(Matrix)
library(BiocParallel)
register(SerialParam())

Motifs or kmers

The most common type of annotations to use for chromVAR are motifs. chromVAR has a function to make it easy to read in motifs from the JASPAR database:

jaspar_motifs <- getJasparMotifs() # default species is human

Package with motifs

A companion package, chromVARmotifs, includes pwms from a couple different sources that can also be used with motifmatchr and chromVAR.

devtools::install_github("GreenleafLab/chromVARmotifs")
## Using GitHub PAT from envvar GITHUB_PAT
## Downloading GitHub repo GreenleafLab/chromVARmotifs@master
## from URL https://api.github.com/repos/GreenleafLab/chromVARmotifs/zipball/master
## Installing chromVARmotifs
## '/home/travis/R-bin/lib/R/bin/R' --no-site-file --no-environ --no-save  \
##   --no-restore --quiet CMD INSTALL  \
##   '/tmp/RtmpMNe5Ax/devtools411e75a49d3d/GreenleafLab-chromVARmotifs-75af948'  \
##   --library='/home/travis/R/Library' --install-tests
## 

We can load these collections using data.

library(chromVARmotifs)

data("human_pwms_v1") # human collection

data("mouse_pwms_v1") # mouse collection

data("homer_pwms")

data("encode_pwms")

If using your own motifs, they need to be formatted as either a PWMatrixList or a PFMatrixList from the TFBSTools package. See documentation from TFBStools for those two objects.

Finding motif matches

The package motifmatchr can be used to find motifs within peaks. The matchMotifs method takes as inputs the motif lists described above, or your own list of motifs, and returns a (Ranged)SummarizedExperiment with a matrix indicating what peaks (rows) contain what motifs (columns).

library(BSgenome.Hsapiens.UCSC.hg19)
## Loading required package: BSgenome
## Loading required package: Biostrings
## Loading required package: XVector
## 
## Attaching package: 'Biostrings'
## The following object is masked from 'package:DelayedArray':
## 
##     type
## The following object is masked from 'package:base':
## 
##     strsplit
## Loading required package: rtracklayer
# First get filtered counts
data(example_counts, package = "chromVAR")
example_counts <- addGCBias(example_counts, genome = BSgenome.Hsapiens.UCSC.hg19)
counts_filtered <- filterSamples(example_counts, min_depth = 1500, min_in_peaks = 0.15, shiny = FALSE)
counts_filtered <- filterPeaks(counts_filtered)

# get motif matches
motif_ix <- matchMotifs(jaspar_motifs, counts_filtered, genome = BSgenome.Hsapiens.UCSC.hg19)

matchMotifs can accept as the second argument either a RangedSummarizedExperiment, GenomicRanges, DNAStringSet, or character vector. If the argument is not a sequence object, then a genome argument is also required, which should be a BSgenome object – the default is BSgenome.Hsapiens.UCSC.hg19. For more information on motifmatchr see the vignette from that package.

Using kmers

The matchKmers function in chromVAR can be used to make an annotation matrix for all kmers of a given length:

kmer_ix <- matchKmers(6, counts_filtered, genome = BSgenome.Hsapiens.UCSC.hg19)

Alternatively, if you only want an annotation matrix for a set of pre-determined kmers, you can pass matchKmers a list of those kmers:

my_kmers_ix <- matchKmers(c("AGATAA","GATAAG"), counts_filtered, 
                          genome = BSgenome.Hsapiens.UCSC.hg19)

Cis groups

Instead of getting groups of peaks based on shared motif or kmer, chromVAR can also use peak groups defined based on location. The fuction getCisGroups will annotate peaks based on chromosomal location. With default parameters, the function will take the first 25 peaks in a chromosome and consider those a group, then move 10 peaks down and group the 25 peaks together, and so on, to create overlapping sets of 25 peaks. The group size and step size can both be adjusted through the paremeters grpsize and stepsize.

cis_ix <- getCisGroups(counts_filtered, grpsize = 25, stepsize = 10) 

Alternative annotations

Reading in annotations from bed files

If your have a set of genomic annotations (e.g. genomic motif matches, GWAS hits) in bed files that you want to use, then the function getAnnotations can be used to read those annotations into the appropriate SummarizedExperiment with matrix.

my_annotation_files <- c(system.file("extdata/test_anno1.bed", 
                                     package = "chromVAR"),
                         system.file("extdata/test_anno1.bed", 
                                     package = "chromVAR"))
anno_ix <- getAnnotations(my_annotation_files, 
                          rowRanges = rowRanges(counts_filtered))

Reading in annotations from single bed file with column indicating specific annotation.

Use the getAnnotations with a single file and the column argument to get an annotation matrix based on the groups indicated in that column (1-based indexing).

my_annotation_file <- system.file("extdata/test_anno3.bed", 
                                     package = "chromVAR")
anno_ix <- getAnnotations(my_annotation_file, 
                           rowRanges = rowRanges(counts_filtered), column = 4)

Other annotation formats

If you have already read in your annotations into R, you can also use the getAnnotations function to get them into the right format.

Annotations GenomicRangesList:

If your annotations are stored as a GenomicRangesList, with each element of the list a GenomicRanges corresponding to the locations of the annotation, then the getAnnotations method can determine what peaks overlap with what annotations and create the appropriate matrix stored within the annotationMatches assay of a SummarizedExperiment.

library(GenomicRanges)
my_annotation_granges <- GRangesList(GRanges("chr1", 
                                             ranges = IRanges(start = c(566763,805090), 
                                                              width = 8)),
                                        GRanges("chr1", 
                                                ranges = IRanges(start = c(566792,895798), 
                                                                 width = 8)))
anno_ix <- getAnnotations(my_annotation_granges, 
                          rowRanges = rowRanges(counts_filtered))

Annotations as a matrix, Matrix, data.frame, or DataFrame

If your annotations already exist as a matrix-like object indicating what peak contains what annotation, then the getAnnotations method is simply a wrapper around SummarizedExperiment that ensures that the annotation gets stored in the annotationMatches assay.

my_annotation_df <- data.frame(first100 = c(rep(TRUE, 100),rep(FALSE,nrow(counts_filtered) - 100)),
                             last100 = c(rep(FALSE,nrow(counts_filtered) - 100),rep(TRUE, 100)))

anno_ix <- getAnnotations(my_annotation_df, 
                          rowRanges = rowRanges(counts_filtered))

Annotations as a list of peak indices

If your annotations are stored as a list of peak indices, with each element of the list a vector of which peaks contain the annotation, then the getAnnotations function can be used to convert that list into a boolean matrix of peak and annotation overlaps stored within the annotationMatches assay of a SummarizedExperiment.

my_annotation_list <- list(first100 = rep(1, 100), 
                           last100 = c(rep(nrow(counts_filtered) - 100, 
                                           nrow(counts_filtered))))
anno_ix <- getAnnotations(my_annotation_list, 
                          rowRanges = rowRanges(counts_filtered))

Alternatively, the list of peak indices can also be used as input directly into computeDeviations.

Session Info

Sys.Date()
## [1] "2017-11-10"
sessionInfo()
## R Under development (unstable) (2017-11-10 r73707)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## Matrix products: default
## BLAS: /home/travis/R-bin/lib/R/lib/libRblas.so
## LAPACK: /home/travis/R-bin/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    methods   stats     graphics  grDevices utils    
## [8] datasets  base     
## 
## other attached packages:
##  [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.46.0                  
##  [3] rtracklayer_1.38.0                Biostrings_2.46.0                
##  [5] XVector_0.18.0                    chromVARmotifs_0.1.0             
##  [7] BiocParallel_1.12.0               Matrix_1.2-11                    
##  [9] SummarizedExperiment_1.8.0        DelayedArray_0.4.1               
## [11] matrixStats_0.52.2                Biobase_2.38.0                   
## [13] GenomicRanges_1.30.0              GenomeInfoDb_1.14.0              
## [15] IRanges_2.12.0                    S4Vectors_0.16.0                 
## [17] BiocGenerics_0.24.0               motifmatchr_1.1.1                
## [19] chromVAR_1.1.1                   
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6                DirichletMultinomial_1.20.0
##  [3] devtools_1.13.4             TFBSTools_1.16.0           
##  [5] bit64_0.9-7                 httr_1.3.1                 
##  [7] rprojroot_1.2               tools_3.5.0                
##  [9] backports_1.1.1             R6_2.2.2                   
## [11] DT_0.2                      seqLogo_1.44.0             
## [13] DBI_0.7                     lazyeval_0.2.0             
## [15] colorspace_1.3-2            withr_2.0.0                
## [17] curl_2.8.1                  git2r_0.19.0               
## [19] bit_1.1-12                  compiler_3.5.0             
## [21] plotly_4.7.1                caTools_1.17.1             
## [23] scales_0.5.0                readr_1.1.1                
## [25] stringr_1.2.0               digest_0.6.12              
## [27] Rsamtools_1.30.0            rmarkdown_1.7              
## [29] R.utils_2.6.0               pkgconfig_2.0.1            
## [31] htmltools_0.3.6             htmlwidgets_0.9            
## [33] rlang_0.1.2.9000            RSQLite_2.0                
## [35] VGAM_1.0-4                  BiocInstaller_1.29.1       
## [37] shiny_1.0.5                 bindr_0.1                  
## [39] jsonlite_1.5                gtools_3.5.0               
## [41] dplyr_0.7.3                 R.oo_1.21.0                
## [43] RCurl_1.95-4.8              magrittr_1.5               
## [45] GO.db_3.5.0                 GenomeInfoDbData_0.99.1    
## [47] Rcpp_0.12.13                munsell_0.4.3              
## [49] R.methodsS3_1.7.1           stringi_1.1.5              
## [51] yaml_2.1.14                 zlibbioc_1.24.0            
## [53] plyr_1.8.4                  grid_3.5.0                 
## [55] blob_1.1.0                  miniUI_0.1.1               
## [57] CNEr_1.14.0                 lattice_0.20-35            
## [59] splines_3.5.0               annotate_1.56.0            
## [61] hms_0.3                     KEGGREST_1.18.0            
## [63] knitr_1.17                  JASPAR2016_1.6.0           
## [65] reshape2_1.4.2              TFMPvalue_0.0.6            
## [67] XML_3.98-1.9                glue_1.1.1                 
## [69] evaluate_0.10.1             data.table_1.10.4          
## [71] png_0.1-7                   httpuv_1.3.5               
## [73] tidyr_0.7.1                 purrr_0.2.3                
## [75] gtable_0.2.0                poweRlaw_0.70.1            
## [77] assertthat_0.2.0            ggplot2_2.2.1              
## [79] mime_0.5                    xtable_1.8-2               
## [81] viridisLite_0.2.0           tibble_1.3.4               
## [83] GenomicAlignments_1.14.0    AnnotationDbi_1.40.0       
## [85] memoise_1.1.0               bindrcpp_0.2