Perform iterative LSI clustering and dimension reduction as described in doi:10.1038/s41587-019-0332-7

# S4 method for DsATACsc
  it0regionType = "t5k",
  it0nMostAcc = 20000L,
  it0pcs = 1:25,
  it0clusterResolution = 0.8,
  it0clusterMinCells = 200L,
  it0nTopPeaksPerCluster = 2e+05,
  it1pcs = 1:50,
  it1clusterResolution = 0.8,
  it1mostVarPeaks = 50000L,
  it2pcs = 1:50,
  it2clusterResolution = 0.8,
  rmDepthCor = 0.5,
  normPcs = FALSE,
  umapParams = list(distMethod = "euclidean", min_dist = 0.5, n_neighbors = 25)



DsATACsc object


character string specifying the region type to start with


the number of the most accessible regions to consider in iteration 0


the principal components to consider in iteration 0


resolution parameter for Seurat's clustering (Seurat::FindClusters) in iteration 0


the minimum number of cells in a cluster in order for it to be considered in peak calling (iteration 0)


the number of best peaks to be considered for each cluster in the merged peak set (iteration 0)


the principal components to consider in iteration 1


resolution parameter for Seurat's clustering (Seurat::FindClusters) in iteration 1


the number of the most variable peaks to consider after iteration 1


the principal components to consider in the final iteration (2)


resolution parameter for Seurat's clustering (Seurat::FindClusters) in the final iteration (2)


correlation cutoff to be used to discard principal components associated with fragment depth (all iterations)


flag indicating whether to apply z-score normalization to PCs for each cell (all iterations)


parameters to compute UMAP coordinates (passed on to muRtools::getDimRedCoords.umap and further to uwot::umap)


an S3 object containing dimensionality reduction results, peak sets and clustering


To obtain a low-dimensional representation of single-cell ATAC datasets in terms of principal components and UMAP coordinates, we recommend an iterative application of the Latent Semantic Indexing approach [doi:10.1016/j.cell.2018.06.052], as described in [doi:10.1038/s41587-019-0332-7]. This approach also identifies cell clusters and a consensus peak set derived from the cluster-specific peaks in a given dataset. In brief, in an initial iteration, clusters are identified based on the most accessible regions (e.g. genomic tiling regions). Here, the counts are first normalized using the term frequency–inverse document frequency (TF-IDF) transformation, and singular values are computed based on these normalized counts in selected regions (i.e. the most accessible regions in the initial iteration). Clusters are identified based on the singular values using Louvain clustering (as implemented in the Seurat package). Peak calling is then performed on the aggregated insertion sites from all cells of each cluster (using MACS2), and a union/consensus set of uniform-length, non-overlapping peaks is selected. In a second iteration, the peak regions whose TF-IDF-normalized counts exhibit the most variability across the initial clusters provide the basis for a refined clustering using derived singular values. In the final iteration, the most variable peaks across the refined clusters are identified as the final peak set, and singular values are computed again. Based on these final singular values, UMAP coordinates are computed for low-dimensional projection.
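
The normalization and decomposition step of a single iteration can be sketched in base R as follows. This is a minimal illustration on simulated counts, not the package's implementation: the toy matrix, the variable names (counts, nMostAcc, nComp, cellCoord) and the particular TF-IDF variant (binarized counts, log-scaled inverse document frequency) are assumptions made for the example. Clustering, peak calling and UMAP are left out.

```r
# Toy data (hypothetical): 200 regions (rows) x 30 cells (columns)
set.seed(42)
counts <- matrix(rpois(200 * 30, lambda = 0.5), nrow = 200, ncol = 30)
counts <- counts[rowSums(counts) > 0, , drop = FALSE]  # drop regions without insertions

# Keep the most accessible regions (similar in spirit to it0nMostAcc)
nMostAcc <- 100
top <- order(rowSums(counts), decreasing = TRUE)[seq_len(min(nMostAcc, nrow(counts)))]
sel <- counts[top, , drop = FALSE]

# TF-IDF on the binarized matrix (one common variant, assumed here)
bin <- (sel > 0) * 1                       # 1 if the region is accessible in the cell
tf <- sweep(bin, 2, colSums(bin), "/")     # term frequency: scale each cell to sum 1
idf <- log(1 + ncol(bin) / rowSums(bin))   # inverse document frequency per region
tfidf <- sweep(tf, 1, idf, "*")

# Singular value decomposition; cells are represented by scaled right singular vectors
nComp <- 10
sv <- svd(tfidf, nu = 0, nv = nComp)
cellCoord <- sv$v %*% diag(sv$d[seq_len(nComp)])
dim(cellCoord)  # 30 cells x 10 components
```

In the iterative scheme, coordinates of this kind would feed the clustering step, and the region selection would be repeated with cluster-derived peaks instead of the most accessible regions.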

The output object includes the final singular values/principal components (result$pcaCoord), the low-dimensional coordinates (result$umapCoord), the final cluster assignment of all cells (result$clustAss), the complete, unfiltered initial cluster peak set (result$clusterPeaks_unfiltered) as well as the final cluster-variable peak set (result$regionGr).
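
The effect of the rmDepthCor and normPcs arguments can be illustrated with a small base R sketch. It assumes that a component is discarded when the absolute Pearson correlation between its coordinates and the per-cell fragment depth reaches the cutoff, and that z-score normalization is applied across the retained components of each cell. Both details, as well as the simulated data and variable names, are assumptions for illustration and may differ from what the package does internally.

```r
# Simulated inputs (hypothetical): per-cell depth and six components
set.seed(1)
nCells <- 200
depth <- rpois(nCells, lambda = 5000)   # fragment counts per cell
pcs <- cbind(
  PC1 = as.numeric(scale(depth)) + rnorm(nCells, sd = 0.1),  # depth-driven component
  matrix(rnorm(nCells * 5), nrow = nCells,
         dimnames = list(NULL, paste0("PC", 2:6)))           # depth-independent components
)

# Discard components associated with depth (assumed rule: |r| >= cutoff)
rmDepthCor <- 0.5
depthCor <- abs(cor(pcs, depth))[, 1]
keep <- depthCor < rmDepthCor
pcsFiltered <- pcs[, keep, drop = FALSE]

# Optional z-score normalization of the retained components for each cell
# (rows are cells, so scaling is done on the transposed matrix)
pcsNorm <- t(scale(t(pcsFiltered)))
```

Here PC1 is constructed to track depth and is therefore removed, while the remaining components are retained unless they happen to correlate with depth by chance.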


Fabian Mueller