Perform iterative LSI clustering and dimension reduction as described in doi:10.1038/s41587-019-0332-7

# S4 method for DsATACsc
iterativeLSI(
  .object,
  it0regionType = "t5k",
  it0nMostAcc = 20000L,
  it0pcs = 1:25,
  it0clusterResolution = 0.8,
  it0clusterMinCells = 200L,
  it0nTopPeaksPerCluster = 2e+05,
  it1pcs = 1:50,
  it1clusterResolution = 0.8,
  it1mostVarPeaks = 50000L,
  it2pcs = 1:50,
  it2clusterResolution = 0.8,
  rmDepthCor = 0.5,
  normPcs = FALSE,
  umapParams = list(distMethod = "euclidean", min_dist = 0.5, n_neighbors = 25)
)

Arguments

.object

DsATACsc object

it0regionType

character string specifying the region type to start with

it0nMostAcc

the number of the most accessible regions to consider in iteration 0

it0pcs

the principal components to consider in iteration 0

it0clusterResolution

resolution parameter for Seurat's clustering (Seurat::FindClusters) in iteration 0

it0clusterMinCells

the minimum number of cells in a cluster in order for it to be considered in peak calling (iteration 0)

it0nTopPeaksPerCluster

the number of best peaks to be considered for each cluster in the merged peak set (iteration 0)

it1pcs

the principal components to consider in iteration 1

it1clusterResolution

resolution parameter for Seurat's clustering (Seurat::FindClusters) in iteration 1

it1mostVarPeaks

the number of the most variable peaks to consider after iteration 1

it2pcs

the principal components to consider in the final iteration (2)

it2clusterResolution

resolution parameter for Seurat's clustering (Seurat::FindClusters) in the final iteration (2)

rmDepthCor

correlation cutoff to be used to discard principal components associated with fragment depth (all iterations)

normPcs

flag indicating whether to apply z-score normalization to PCs for each cell (all iterations)

umapParams

parameters to compute UMAP coordinates (passed on to muRtools::getDimRedCoords.umap and further to uwot::umap)
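
The effect of rmDepthCor and normPcs can be illustrated on made-up data. The following is a minimal sketch, not the package's implementation: it assumes that the depth association is measured as the absolute Pearson correlation between each PC and the log10 fragment count per cell, and that the z-score normalization is applied across the retained PCs within each cell. The variable names (depth, pcaCoord, keepPcs, normed) and all simulated values are hypothetical.

```r
set.seed(1)
nCells <- 100

# Made-up per-cell fragment counts and PC coordinates (cells x PCs)
depth <- rlnorm(nCells, meanlog = 9, sdlog = 0.5)
pcaCoord <- matrix(rnorm(nCells * 5), nrow = nCells)
# Force PC1 to track sequencing depth so that the filter has something to remove
pcaCoord[, 1] <- log10(depth) + rnorm(nCells, sd = 0.05)

# Assumed analogue of rmDepthCor: drop PCs whose absolute correlation
# with log10 depth reaches the cutoff
rmDepthCor <- 0.5
depthCor <- abs(cor(pcaCoord, log10(depth)))[, 1]
keepPcs <- which(depthCor < rmDepthCor)

# Assumed analogue of normPcs = TRUE: z-score the retained PCs within each cell
# (scale() works on columns, so transpose to put cells in columns and back)
normed <- t(scale(t(pcaCoord[, keepPcs, drop = FALSE])))

print(round(depthCor, 2))
print(keepPcs)
```

In this sketch the depth-driven first component is excluded before any downstream clustering, and each row of normed has mean zero across the retained components.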

Value

an S3 object containing dimensionality reduction results, peak sets and clustering

Details

To obtain a low-dimensional representation of single-cell ATAC datasets in terms of principal components and UMAP coordinates, we recommend an iterative application of the Latent Semantic Indexing (LSI) approach [doi:10.1016/j.cell.2018.06.052] as described in [doi:10.1038/s41587-019-0332-7]. This approach also identifies cell clusters and a consensus peak set derived from the cluster-specific peaks in a given dataset. In brief, in an initial iteration, clusters are identified based on the most accessible regions (e.g. genomic tiling regions). Here, the counts are first normalized using the term frequency–inverse document frequency (TF-IDF) transformation, and singular values are computed based on these normalized counts in selected regions (i.e. the most accessible regions in the initial iteration). Clusters are identified based on the singular values using Louvain clustering (as implemented in the Seurat package). Peak calling is then performed on the aggregated insertion sites from all cells of each cluster (using MACS2), and a union/consensus set of uniform-length, non-overlapping peaks is selected. In a second iteration, the peak regions whose TF-IDF-normalized counts exhibit the most variability across the initial clusters provide the basis for a refined clustering using derived singular values. In the final iteration, the most variable peaks across the refined clusters are identified as the final peak set, and singular values are computed again. Based on these final singular values, UMAP coordinates are computed for low-dimensional projection.
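
The normalization and decomposition step of a single iteration can be sketched in base R as follows. This is an illustration under stated assumptions rather than the code used by the method: the helper tfidf is hypothetical and implements one common TF-IDF variant (binarized counts, per-cell term frequency, log-scaled inverse document frequency), the count matrix is simulated, and the clustering, peak calling and UMAP steps are omitted.

```r
# Toy region-by-cell count matrix (rows: regions, columns: cells); values are made up
set.seed(42)
counts <- matrix(rpois(200 * 30, lambda = 0.5), nrow = 200, ncol = 30)
counts <- counts[rowSums(counts) > 0, , drop = FALSE]  # drop regions without insertions

# Hypothetical helper: one common TF-IDF variant (an assumption, see text above)
tfidf <- function(m) {
  b <- (m > 0) * 1                      # binarize accessibility
  tf <- sweep(b, 2, colSums(b), "/")    # term frequency: each cell sums to 1
  idf <- log(1 + ncol(b) / rowSums(b))  # inverse document frequency per region
  tf * idf                              # idf is recycled along rows (one value per region)
}

# Truncated SVD of the normalized matrix; cells are embedded via the right
# singular vectors scaled by the singular values
nComp <- 10
sv <- svd(tfidf(counts), nu = 0, nv = nComp)
pcaCoord <- sv$v %*% diag(sv$d[seq_len(nComp)])  # cells x components

# Analogue of the it*pcs arguments: restrict to a subset of components
pcs <- 1:5
coords <- pcaCoord[, pcs, drop = FALSE]
print(dim(coords))
```

The matrix coords plays the role of the input to the clustering step in each iteration; in the procedure described above, the set of regions entering the TF-IDF transformation changes from iteration to iteration.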

The output object includes the final singular values/principal components (result$pcaCoord), the low-dimensional coordinates (result$umapCoord), the final cluster assignment of all cells (result$clustAss), the complete, unfiltered initial cluster peak set (result$clusterPeaks_unfiltered) as well as the final cluster-variable peak set (result$regionGr).
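
A short usage sketch for inspecting the result is given below. It assumes that the function is available from the ChrAccR package and that a DsATACsc object named dsa already exists in the session; both are assumptions, and the call is skipped otherwise. The helper summarizeLsiResult is hypothetical and only reads the fields listed above.

```r
# Hypothetical helper: summarize the documented fields of the result object
summarizeLsiResult <- function(res) {
  list(
    nCells = nrow(res$pcaCoord),         # cells in the final embedding
    nPcs = ncol(res$pcaCoord),           # retained components
    clusterSizes = table(res$clustAss),  # cells per final cluster
    nFinalPeaks = length(res$regionGr)   # size of the cluster-variable peak set
  )
}

# Only runs if the package is installed and a DsATACsc object `dsa` is present
# (both are assumptions for this sketch)
if (requireNamespace("ChrAccR", quietly = TRUE) && exists("dsa")) {
  res <- ChrAccR::iterativeLSI(dsa, it0regionType = "t5k", it0clusterMinCells = 200L)
  str(summarizeLsiResult(res))
  head(res$umapCoord)
}
```

Arguments not given in the call keep the defaults listed in the usage section.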

Author

Fabian Mueller