`iterativeLSI-DsATACsc-method.Rd`

Perform iterative LSI clustering and dimension reduction as described in doi:10.1038/s41587-019-0332-7

```
# S4 method for DsATACsc
iterativeLSI(
  .object,
  it0regionType = "t5k",
  it0nMostAcc = 20000L,
  it0pcs = 1:25,
  it0clusterResolution = 0.8,
  it0clusterMinCells = 200L,
  it0nTopPeaksPerCluster = 2e+05,
  it1pcs = 1:50,
  it1clusterResolution = 0.8,
  it1mostVarPeaks = 50000L,
  it2pcs = 1:50,
  it2clusterResolution = 0.8,
  rmDepthCor = 0.5,
  normPcs = FALSE,
  umapParams = list(distMethod = "euclidean", min_dist = 0.5, n_neighbors = 25)
)
```

- .object
`DsATACsc` object

- it0regionType
character string specifying the region type to start with

- it0nMostAcc
the number of the most accessible regions to consider in iteration 0

- it0pcs
the principal components to consider in iteration 0

- it0clusterResolution
resolution parameter for Seurat's clustering (`Seurat::FindClusters`) in iteration 0

- it0clusterMinCells
the minimum number of cells a cluster must contain to be considered in peak calling (iteration 0)

- it0nTopPeaksPerCluster
the number of best peaks to be considered for each cluster in the merged peak set (iteration 0)

- it1pcs
the principal components to consider in iteration 1

- it1clusterResolution
resolution parameter for Seurat's clustering (`Seurat::FindClusters`) in iteration 1

- it1mostVarPeaks
the number of the most variable peaks to consider after iteration 1

- it2pcs
the principal components to consider in the final iteration (2)

- it2clusterResolution
resolution parameter for Seurat's clustering (`Seurat::FindClusters`) in the final iteration (2)

- rmDepthCor
correlation cutoff used to discard principal components associated with fragment depth (all iterations)

- normPcs
flag indicating whether to apply z-score normalization to PCs for each cell (all iterations)

- umapParams
parameters to compute UMAP coordinates (passed on to `muRtools::getDimRedCoords.umap` and further to `uwot::umap`)

an `S3` object containing dimensionality reduction results, peak sets and clustering

To obtain a low-dimensional representation of single-cell ATAC datasets in terms of principal components and UMAP coordinates, we recommend an iterative application of the Latent Semantic Indexing (LSI) approach [doi:10.1016/j.cell.2018.06.052] as described in [doi:10.1038/s41587-019-0332-7]. This approach also identifies cell clusters and a consensus peak set derived from the cluster-level peaks of a given dataset. In brief, in an initial iteration, clusters are identified based on the most accessible regions (e.g. genomic tiling regions). Here, the counts are first normalized using the term frequency–inverse document frequency (TF-IDF) transformation, and singular values are computed from these normalized counts in the selected regions (i.e. the most accessible regions in the initial iteration). Clusters are identified based on the singular values using Louvain clustering (as implemented in the `Seurat` package). Peak calling is then performed on the aggregated insertion sites from all cells of each cluster (using MACS2), and a union/consensus set of uniform-length, non-overlapping peaks is selected. In a second iteration, the peak regions whose TF-IDF-normalized counts exhibit the most variability across the initial clusters provide the basis for a refined clustering using derived singular values. In the final iteration, the most variable peaks across the refined clusters are identified as the final peak set, and singular values are computed again. Based on these final singular values, UMAP coordinates are computed for low-dimensional projection.
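The normalization and decomposition steps of a single iteration can be sketched in base R on a toy matrix. This is an illustration under stated assumptions, not the method's actual implementation: the TF-IDF variant shown (per-cell frequency times `log(1 + nCells / regionCount)`) is one common formulation, the simulated data and all variable names are made up for the example, and clustering, peak calling and UMAP are omitted.

```r
set.seed(42)

# Toy binary accessibility matrix: 200 regions (rows) x 30 cells (columns).
# Real inputs are far larger and usually stored as sparse matrices.
nRegions <- 200
nCells <- 30
counts <- matrix(rbinom(nRegions * nCells, size = 1, prob = 0.2),
                 nrow = nRegions, ncol = nCells)

# Keep only the most accessible regions (similar in spirit to it0nMostAcc).
nMostAcc <- 100
keep <- order(rowSums(counts), decreasing = TRUE)[seq_len(nMostAcc)]
counts <- counts[keep, , drop = FALSE]
stopifnot(all(rowSums(counts) > 0), all(colSums(counts) > 0))

# TF-IDF (assumed variant): term frequency per cell, inverse document
# frequency per region.
depth <- colSums(counts)                        # accessible regions per cell
tf <- sweep(counts, 2, depth, "/")              # divide each column by its depth
idf <- log(1 + ncol(counts) / rowSums(counts))  # one weight per region
tfidf <- tf * idf                               # idf is recycled along rows

# Truncated SVD: right singular vectors scaled by the singular values
# give one coordinate vector per cell.
nPcs <- 10
sv <- svd(tfidf, nu = 0, nv = nPcs)
pcaCoord <- sv$v %*% diag(sv$d[seq_len(nPcs)])  # cells x components

# Drop components strongly correlated with depth (the role of rmDepthCor).
rmDepthCor <- 0.5
depthCor <- abs(cor(pcaCoord, log10(depth)))[, 1]
usePcs <- which(depthCor < rmDepthCor)
reduced <- pcaCoord[, usePcs, drop = FALSE]
```

In the iterative scheme described above, coordinates such as `reduced` would then feed the clustering step, and the resulting clusters would determine the regions used in the next iteration.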

The output object includes the final singular values/principal components (`result$pcaCoord`), the low-dimensional coordinates (`result$umapCoord`), the final cluster assignment of all cells (`result$clustAss`), the complete, unfiltered initial cluster peak set (`result$clusterPeaks_unfiltered`) as well as the final cluster-variable peak set (`result$regionGr`).
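A small helper for inspecting the returned object might look like the sketch below. The function name `summarizeLsiResult` is hypothetical and not part of the package. It also assumes that `pcaCoord` and `umapCoord` are matrices with one row per cell and that `clustAss` holds one label per cell, which the description above suggests but does not state explicitly.

```r
# Hypothetical helper (not part of the package): summarize the fields of an
# iterativeLSI() result, assuming cells are rows in the coordinate matrices.
summarizeLsiResult <- function(result) {
  list(
    nCells = nrow(result$pcaCoord),         # number of cells
    nPcs = ncol(result$pcaCoord),           # retained components
    umapDims = ncol(result$umapCoord),      # dimensions of the UMAP embedding
    clusterSizes = table(result$clustAss)   # cells per final cluster
  )
}

# Usage (not run; `dsa` stands for an existing DsATACsc object):
# result <- iterativeLSI(dsa)
# summarizeLsiResult(result)
```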