The main computeDeviations
function from chromVAR requires an object storing what peaks overlap what motifs or other annotations. The package includes functions for creating such an object from a set of motifs or kmers, or for converting an existing matrix, data.frame, list, GenomicRangesList, or list of bed files of annotations into the appropriate object.
library(chromVAR)
library(motifmatchr)
library(SummarizedExperiment)
library(Matrix)
library(BiocParallel)
register(SerialParam())
The most common type of annotations to use for chromVAR are motifs. chromVAR has a function to make it easy to read in motifs from the JASPAR database:
jaspar_motifs <- getJasparMotifs() # default species is human
A companion package, chromVARmotifs, includes pwms from a couple different sources that can also be used with motifmatchr and chromVAR.
devtools::install_github("GreenleafLab/chromVARmotifs")
## Using GitHub PAT from envvar GITHUB_PAT
## Downloading GitHub repo GreenleafLab/chromVARmotifs@master
## from URL https://api.github.com/repos/GreenleafLab/chromVARmotifs/zipball/master
## Installing chromVARmotifs
## '/home/travis/R-bin/lib/R/bin/R' --no-site-file --no-environ --no-save \
## --no-restore --quiet CMD INSTALL \
## '/tmp/RtmpMNe5Ax/devtools411e75a49d3d/GreenleafLab-chromVARmotifs-75af948' \
## --library='/home/travis/R/Library' --install-tests
##
We can load these collections using data
.
library(chromVARmotifs)
data("human_pwms_v1") # human collection
data("mouse_pwms_v1") # mouse collection
data("homer_pwms")
data("encode_pwms")
If using your own motifs, they need to be formatted as either a PWMatrixList or a PFMatrixList from the TFBSTools package. See documentation from TFBStools for those two objects.
The package motifmatchr can be used to find motifs within peaks. The matchMotifs
method takes as inputs the motif lists described above, or your own list of motifs, and returns a (Ranged)SummarizedExperiment with a matrix indicating what peaks (rows) contain what motifs (columns).
library(BSgenome.Hsapiens.UCSC.hg19)
## Loading required package: BSgenome
## Loading required package: Biostrings
## Loading required package: XVector
##
## Attaching package: 'Biostrings'
## The following object is masked from 'package:DelayedArray':
##
## type
## The following object is masked from 'package:base':
##
## strsplit
## Loading required package: rtracklayer
# First get filtered counts
data(example_counts, package = "chromVAR")
example_counts <- addGCBias(example_counts, genome = BSgenome.Hsapiens.UCSC.hg19)
counts_filtered <- filterSamples(example_counts, min_depth = 1500, min_in_peaks = 0.15, shiny = FALSE)
counts_filtered <- filterPeaks(counts_filtered)
# get motif matches
motif_ix <- matchMotifs(jaspar_motifs, counts_filtered, genome = BSgenome.Hsapiens.UCSC.hg19)
matchMotifs
can accept as the second argument either a RangedSummarizedExperiment, GenomicRanges, DNAStringSet, or character vector. If the argument is not a sequence object, then a genome argument is also required, which should be a BSgenome object – the default is BSgenome.Hsapiens.UCSC.hg19. For more information on motifmatchr see the vignette from that package.
The matchKmers
function in chromVAR can be used to make an annotation matrix for all kmers of a given length:
kmer_ix <- matchKmers(6, counts_filtered, genome = BSgenome.Hsapiens.UCSC.hg19)
Alternatively, if you only want an annotation matrix for a set of pre-determined kmers, you can pass matchKmers
a list of those kmers:
my_kmers_ix <- matchKmers(c("AGATAA","GATAAG"), counts_filtered,
genome = BSgenome.Hsapiens.UCSC.hg19)
Instead of getting groups of peaks based on shared motif or kmer, chromVAR can also use peak groups defined based on location. The fuction getCisGroups will annotate peaks based on chromosomal location. With default parameters, the function will take the first 25 peaks in a chromosome and consider those a group, then move 10 peaks down and group the 25 peaks together, and so on, to create overlapping sets of 25 peaks. The group size and step size can both be adjusted through the paremeters grpsize and stepsize.
cis_ix <- getCisGroups(counts_filtered, grpsize = 25, stepsize = 10)
If your have a set of genomic annotations (e.g. genomic motif matches, GWAS hits) in bed files that you want to use, then the function getAnnotations
can be used to read those annotations into the appropriate SummarizedExperiment with matrix.
my_annotation_files <- c(system.file("extdata/test_anno1.bed",
package = "chromVAR"),
system.file("extdata/test_anno1.bed",
package = "chromVAR"))
anno_ix <- getAnnotations(my_annotation_files,
rowRanges = rowRanges(counts_filtered))
Use the getAnnotations
with a single file and the column argument to get an annotation matrix based on the groups indicated in that column (1-based indexing).
my_annotation_file <- system.file("extdata/test_anno3.bed",
package = "chromVAR")
anno_ix <- getAnnotations(my_annotation_file,
rowRanges = rowRanges(counts_filtered), column = 4)
If you have already read in your annotations into R, you can also use the getAnnotations
function to get them into the right format.
If your annotations are stored as a GenomicRangesList, with each element of the list a GenomicRanges corresponding to the locations of the annotation, then the getAnnotations
method can determine what peaks overlap with what annotations and create the appropriate matrix stored within the annotationMatches
assay of a SummarizedExperiment.
library(GenomicRanges)
my_annotation_granges <- GRangesList(GRanges("chr1",
ranges = IRanges(start = c(566763,805090),
width = 8)),
GRanges("chr1",
ranges = IRanges(start = c(566792,895798),
width = 8)))
anno_ix <- getAnnotations(my_annotation_granges,
rowRanges = rowRanges(counts_filtered))
If your annotations already exist as a matrix-like object indicating what peak contains what annotation, then the getAnnotations
method is simply a wrapper around SummarizedExperiment that ensures that the annotation gets stored in the annotationMatches
assay.
my_annotation_df <- data.frame(first100 = c(rep(TRUE, 100),rep(FALSE,nrow(counts_filtered) - 100)),
last100 = c(rep(FALSE,nrow(counts_filtered) - 100),rep(TRUE, 100)))
anno_ix <- getAnnotations(my_annotation_df,
rowRanges = rowRanges(counts_filtered))
If your annotations are stored as a list of peak indices, with each element of the list a vector of which peaks contain the annotation, then the getAnnotations
function can be used to convert that list into a boolean matrix of peak and annotation overlaps stored within the annotationMatches
assay of a SummarizedExperiment.
my_annotation_list <- list(first100 = rep(1, 100),
last100 = c(rep(nrow(counts_filtered) - 100,
nrow(counts_filtered))))
anno_ix <- getAnnotations(my_annotation_list,
rowRanges = rowRanges(counts_filtered))
Alternatively, the list of peak indices can also be used as input directly into computeDeviations
.
Sys.Date()
## [1] "2017-11-10"
sessionInfo()
## R Under development (unstable) (2017-11-10 r73707)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
##
## Matrix products: default
## BLAS: /home/travis/R-bin/lib/R/lib/libRblas.so
## LAPACK: /home/travis/R-bin/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 methods stats graphics grDevices utils
## [8] datasets base
##
## other attached packages:
## [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.46.0
## [3] rtracklayer_1.38.0 Biostrings_2.46.0
## [5] XVector_0.18.0 chromVARmotifs_0.1.0
## [7] BiocParallel_1.12.0 Matrix_1.2-11
## [9] SummarizedExperiment_1.8.0 DelayedArray_0.4.1
## [11] matrixStats_0.52.2 Biobase_2.38.0
## [13] GenomicRanges_1.30.0 GenomeInfoDb_1.14.0
## [15] IRanges_2.12.0 S4Vectors_0.16.0
## [17] BiocGenerics_0.24.0 motifmatchr_1.1.1
## [19] chromVAR_1.1.1
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-6 DirichletMultinomial_1.20.0
## [3] devtools_1.13.4 TFBSTools_1.16.0
## [5] bit64_0.9-7 httr_1.3.1
## [7] rprojroot_1.2 tools_3.5.0
## [9] backports_1.1.1 R6_2.2.2
## [11] DT_0.2 seqLogo_1.44.0
## [13] DBI_0.7 lazyeval_0.2.0
## [15] colorspace_1.3-2 withr_2.0.0
## [17] curl_2.8.1 git2r_0.19.0
## [19] bit_1.1-12 compiler_3.5.0
## [21] plotly_4.7.1 caTools_1.17.1
## [23] scales_0.5.0 readr_1.1.1
## [25] stringr_1.2.0 digest_0.6.12
## [27] Rsamtools_1.30.0 rmarkdown_1.7
## [29] R.utils_2.6.0 pkgconfig_2.0.1
## [31] htmltools_0.3.6 htmlwidgets_0.9
## [33] rlang_0.1.2.9000 RSQLite_2.0
## [35] VGAM_1.0-4 BiocInstaller_1.29.1
## [37] shiny_1.0.5 bindr_0.1
## [39] jsonlite_1.5 gtools_3.5.0
## [41] dplyr_0.7.3 R.oo_1.21.0
## [43] RCurl_1.95-4.8 magrittr_1.5
## [45] GO.db_3.5.0 GenomeInfoDbData_0.99.1
## [47] Rcpp_0.12.13 munsell_0.4.3
## [49] R.methodsS3_1.7.1 stringi_1.1.5
## [51] yaml_2.1.14 zlibbioc_1.24.0
## [53] plyr_1.8.4 grid_3.5.0
## [55] blob_1.1.0 miniUI_0.1.1
## [57] CNEr_1.14.0 lattice_0.20-35
## [59] splines_3.5.0 annotate_1.56.0
## [61] hms_0.3 KEGGREST_1.18.0
## [63] knitr_1.17 JASPAR2016_1.6.0
## [65] reshape2_1.4.2 TFMPvalue_0.0.6
## [67] XML_3.98-1.9 glue_1.1.1
## [69] evaluate_0.10.1 data.table_1.10.4
## [71] png_0.1-7 httpuv_1.3.5
## [73] tidyr_0.7.1 purrr_0.2.3
## [75] gtable_0.2.0 poweRlaw_0.70.1
## [77] assertthat_0.2.0 ggplot2_2.2.1
## [79] mime_0.5 xtable_1.8-2
## [81] viridisLite_0.2.0 tibble_1.3.4
## [83] GenomicAlignments_1.14.0 AnnotationDbi_1.40.0
## [85] memoise_1.1.0 bindrcpp_0.2