Single Cell Analysis

Single-cell RNA sequencing enables cataloging and studying cellular identities at a scale and resolution unmatched by bulk sequencing.

Single-cell RNA sequencing (scRNA-seq) is one of the most rapidly advancing and diversifying technologies in molecular biology. The ability to study gene expression at the resolution of individual cells has been as transformative as the earlier advent of bulk RNA sequencing.

In addition to scRNA-seq, a number of other next-generation sequencing (NGS)-based assays have been adapted to single-cell protocols. These include genomic, proteomic and epigenetic assays, notably single-cell ATAC-seq, which is commonly performed in conjunction with scRNA-seq.


Quality control and preprocessing

As with any NGS data, the analysis of single-cell sequencing data starts with quality control and preprocessing.

Raw sequencing reads are first quality-checked, producing read-level metrics such as sequencing quality, accuracy and diversity. Reads are then aligned to an appropriate reference genome or transcriptome. Additional metrics are then plotted and inspected, such as the number of cells, reads per cell, genes per cell, sequencing saturation and the fraction of mitochondrial transcripts.

These QC metrics indicate the overall quality of the libraries and the usability of the samples, and they make it possible to identify and remove low-quality cells.
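As an illustration of the per-cell metrics above, the sketch below computes counts per cell, genes per cell and the percentage of mitochondrial counts from a cells × genes count matrix. It then flags cells that fail two thresholds. It assumes NumPy, a dense matrix and mitochondrial genes recognizable by an "MT-" symbol prefix. The function names and the threshold values are hypothetical placeholders, not recommendations; suitable cut-offs depend on the tissue and protocol.

```python
import numpy as np

def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC metrics from a cells x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Assumption: mitochondrial genes are identified by symbol prefix ("MT-" human, "mt-" mouse)
    is_mito = np.array([g.upper().startswith(mito_prefix.upper()) for g in gene_names])
    total_counts = counts.sum(axis=1)            # counts (e.g. UMIs) per cell
    genes_detected = (counts > 0).sum(axis=1)    # genes with at least one count per cell
    mito_counts = counts[:, is_mito].sum(axis=1)
    # Percentage of mitochondrial counts; empty barcodes get 0 instead of dividing by zero
    pct_mito = np.divide(100.0 * mito_counts, total_counts,
                         out=np.zeros_like(total_counts), where=total_counts > 0)
    return total_counts, genes_detected, pct_mito

def filter_cells(counts, gene_names, min_genes=200, max_pct_mito=20.0):
    """Boolean mask of cells passing illustrative (hypothetical) thresholds."""
    _, genes_detected, pct_mito = qc_metrics(counts, gene_names)
    return (genes_detected >= min_genes) & (pct_mito <= max_pct_mito)

# Toy example: 3 cells x 4 genes
genes = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
counts = [[50, 10, 5, 0],    # mostly mitochondrial reads
          [2, 30, 20, 10],   # plausible healthy cell
          [0, 0, 0, 0]]      # empty barcode
keep = filter_cells(counts, genes, min_genes=2)
print(keep)  # [False  True False]
```

In practice these metrics are usually plotted as distributions before any cut-offs are chosen.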

Exploratory analysis

Preprocessed scRNA-seq data is clustered to identify groups of similar cells. It is then visualized with correlation heatmaps and with non-linear dimensionality reduction methods such as t-SNE and UMAP, revealing general patterns of cellular heterogeneity.

These visualizations help answer technical questions such as:

  • Do the biological replicates resemble each other?
  • Are there outlier samples or cells?
  • Are the cell clusters distinct?

…and biological questions such as:

  • How heterogeneous are the underlying cell types/states?
  • Do distinct samples (e.g., different tissues, treatments or time points) form separate clusters?
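A minimal sketch of the clustering step follows. It assumes NumPy and a small dense count matrix. Each cell is library-size normalized and log-transformed, the cells are projected onto principal components, and k-means groups them. K-means is a simple stand-in here: widely used toolkits such as Seurat and Scanpy default to graph-based clustering (Louvain/Leiden). t-SNE or UMAP embeddings would normally be computed with a dedicated library. All function names are hypothetical.

```python
import numpy as np

def normalize_log(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then log1p-transform."""
    counts = np.asarray(counts, dtype=float)
    size = counts.sum(axis=1, keepdims=True)
    size[size == 0] = 1.0  # leave empty cells at zero rather than dividing by zero
    return np.log1p(counts / size * target_sum)

def pca(x, n_components=2):
    """Principal component scores via SVD of the gene-centered matrix."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy data: two populations of 30 cells with different highly expressed genes
rng = np.random.default_rng(1)
group_a = rng.poisson(lam=[20, 20, 1, 1, 5], size=(30, 5))
group_b = rng.poisson(lam=[1, 1, 20, 20, 5], size=(30, 5))
coords = pca(normalize_log(np.vstack([group_a, group_b])))
labels = kmeans(coords, k=2)
```

The two columns of `coords` could be scatter-plotted and colored by `labels` to check visually whether clusters, replicates or samples separate.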

Cell type identification

Identifying and characterizing cell types (and more refined cell states) is the central part of most single-cell projects.

It all starts with identifying features (e.g., genes, proteins, accessible regions) that are specific to each cell cluster. These markers are defined by differential expression (DE) analysis comparing each cluster against all remaining clusters, yielding DE statistics such as fold change and statistical significance.
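The one-vs-rest comparison described above can be sketched as follows. For every gene, the cells of one cluster are compared with all other cells using a log2 fold change of mean expression and a two-sided Wilcoxon rank-sum (Mann-Whitney U) test from SciPy. Bonferroni correction is applied across genes. The choice of test and correction, the pseudocount of 1 and the `find_markers` name are assumptions for illustration. Real toolkits offer several alternative tests.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def find_markers(expr, labels, cluster, gene_names, pseudocount=1.0):
    """One-vs-rest markers for `cluster` from log1p-normalized expression (cells x genes).

    Returns (gene, log2 fold change, Bonferroni-adjusted p-value), most significant first.
    """
    expr = np.asarray(expr, dtype=float)
    in_cluster = np.asarray(labels) == cluster
    n_genes = len(gene_names)
    results = []
    for j, gene in enumerate(gene_names):
        a, b = expr[in_cluster, j], expr[~in_cluster, j]
        # Fold change of mean expression on the linear scale (expm1 undoes log1p)
        lfc = np.log2((np.expm1(a).mean() + pseudocount) / (np.expm1(b).mean() + pseudocount))
        if np.ptp(expr[:, j]) == 0:
            p = 1.0  # constant gene: nothing to test
        else:
            p = mannwhitneyu(a, b, alternative="two-sided").pvalue
        results.append((gene, float(lfc), min(float(p) * n_genes, 1.0)))
    return sorted(results, key=lambda r: (r[2], -r[1]))

# Toy data: 40 cells in clusters A and B; gene G0 is up-regulated in A
rng = np.random.default_rng(0)
labels = np.array(["A"] * 20 + ["B"] * 20)
counts = rng.poisson(2.0, size=(40, 3)).astype(float)
counts[:20, 0] = rng.poisson(30.0, size=20)
markers = find_markers(np.log1p(counts), labels, "A", ["G0", "G1", "G2"])
print(markers[0])
```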

The cluster markers can be visualized using scatter plots, violin plots, and heatmaps.

Markers are further annotated with biologically meaningful terms, such as biological processes, signaling pathways or specific diseases. Such analyses typically rely on either over-representation analysis or gene set enrichment analysis, both of which yield a list of enriched gene sets with relevant statistics and annotations.
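Over-representation analysis can be illustrated with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The question is how likely it is to see at least the observed overlap between a cluster's markers and a gene set if the markers were a random draw from the gene universe. The sketch uses only the Python standard library. The gene sets and names are hypothetical, and a real analysis would also correct for testing many gene sets (e.g. Benjamini-Hochberg).

```python
from math import comb

def hypergeom_sf(k, total, n_set, n_drawn):
    """P(X >= k): chance of at least k set members among n_drawn genes drawn from total."""
    upper = min(n_set, n_drawn)
    hits = sum(comb(n_set, i) * comb(total - n_set, n_drawn - i) for i in range(k, upper + 1))
    return hits / comb(total, n_drawn)

def over_representation(markers, gene_sets, universe):
    """Return (set name, overlap size, p-value) for each gene set, smallest p first."""
    universe = set(universe)
    markers = set(markers) & universe
    results = []
    for name, genes in gene_sets.items():
        genes = set(genes) & universe
        overlap = markers & genes
        p = hypergeom_sf(len(overlap), len(universe), len(genes), len(markers))
        results.append((name, len(overlap), p))
    return sorted(results, key=lambda r: r[2])

# Toy example: 10-gene universe, two hypothetical gene sets
universe = [f"g{i}" for i in range(10)]
gene_sets = {"setA": ["g0", "g1", "g2"], "setB": ["g7", "g8", "g9"]}
enrichment = over_representation(["g0", "g1", "g2"], gene_sets, universe)
print(enrichment)
```

Gene set enrichment analysis differs in that it uses the full ranked gene list rather than a thresholded marker list.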

Single-cell datasets are also typically integrated with publicly available data to exploit cell-type information from previously annotated datasets or cell atlases. This enables cell labels to be transferred into the analyzed dataset.
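A deliberately simplified form of label transfer is sketched below. Each query cell receives the label of the reference cell-type profile it correlates with most strongly (Pearson correlation on log-normalized expression). It assumes NumPy and that the query and reference share the same genes in the same order. The function names and profiles are hypothetical. Established methods, such as anchor-based transfer in Seurat, also correct for batch effects between datasets.

```python
import numpy as np

def _zscore_rows(m):
    m = np.asarray(m, dtype=float)
    sd = m.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant rows would otherwise divide by zero
    return (m - m.mean(axis=1, keepdims=True)) / sd

def transfer_labels(query, reference_profiles, reference_labels):
    """Label each query cell with its most correlated reference profile.

    query: cells x genes; reference_profiles: cell types x genes (same genes, same order).
    """
    q, r = _zscore_rows(query), _zscore_rows(reference_profiles)
    corr = q @ r.T / q.shape[1]   # Pearson correlation of every cell with every profile
    best = corr.argmax(axis=1)
    return [reference_labels[i] for i in best], corr.max(axis=1)

# Toy reference with two cell types over four genes, and two query cells
reference = [[5, 5, 0, 0], [0, 0, 5, 5]]
ref_labels = ["T cell", "B cell"]
query = [[4, 6, 0, 1], [0, 1, 6, 4]]
predicted, scores = transfer_labels(query, reference, ref_labels)
print(predicted)  # ['T cell', 'B cell']
```

The returned correlation scores can help flag query cells that match no reference type well.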

Trajectory analysis

In addition to characterizing distinct cellular identities, single-cell data lends itself to identifying continuums of gradual change in cell state, or trajectories. Uncovering such continuums is also called pseudotime analysis. Although all cells are sampled at the same time point, individual cells may represent different stages of a temporal process such as differentiation.

De novo reconstruction of lineage differentiation and cell maturation trajectories makes it possible to explore cellular dynamics, delineate developmental lineages and characterize transitions between cell states along a latent pseudotime dimension.

An ensemble of trajectory inference algorithms may be used to robustly identify root and terminal cell states, branching points and lineages. Single cells are then ordered along deterministic or probabilistic lineages, and each cell's position indicates its progression through the dynamic process of interest.
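The sketch below shows how cells can be ordered along a trajectory. It builds a k-nearest-neighbor graph in a reduced expression space (e.g. PCA coordinates) and takes each cell's shortest-path distance from a chosen root cell, scaled to [0, 1], as its pseudotime. This is a single, simple method using NumPy and the standard library's `heapq`. Published tools (e.g. Monocle, Slingshot, diffusion pseudotime) are considerably more elaborate. The root choice, `k` and the function names are assumptions.

```python
import heapq
import numpy as np

def knn_graph(x, k):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    graph = {i: [] for i in range(len(x))}
    for i in range(len(x)):
        neighbors = [j for j in np.argsort(d[i]) if j != i][:k]
        for j in neighbors:
            graph[i].append((int(j), float(d[i, j])))
            graph[int(j)].append((i, float(d[i, j])))
    return graph

def pseudotime(x, root, k=5):
    """Shortest-path distance from the root cell over the kNN graph, scaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    graph = knn_graph(x, k)
    dist = np.full(len(x), np.inf)
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:  # Dijkstra's algorithm
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            if d_u + w < dist[v]:
                dist[v] = d_u + w
                heapq.heappush(heap, (dist[v], v))
    reachable = np.isfinite(dist)
    if dist[reachable].max() > 0:
        dist[reachable] /= dist[reachable].max()
    return dist  # cells in disconnected components stay at inf

# Toy trajectory: 50 cells along a curved path in a 2-D embedding, rooted at cell 0
t = np.linspace(0, 1, 50)
cells = np.column_stack([np.cos(np.pi * t), np.sin(np.pi * t)])
pt = pseudotime(cells, root=0, k=3)
```

Graph distances follow the curved path, whereas straight-line distances would cut across it.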

Integrating single-cell RNA-seq and epigenomics

Integrating scRNA-seq data with single-cell ATAC-seq or single-cell methylation data often relies on matched cells as anchors. This applies when both measurements derive from the same cells, as in, e.g., 10x Genomics Multiome technology.

Combining expression data with chromatin accessibility or methylation profiles enables more robust identification of cell types. It also allows quantifying the effect of chromatin state on expression in individual cell types.
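As a small illustration of using matched cells, the sketch below pairs the two modalities by cell barcode. It then correlates one gene's expression with one peak's accessibility within a single cell type. It assumes NumPy, that both modalities are indexed by identical barcode strings, and that per-cell values are already normalized. The barcodes, values and function names are hypothetical, and a plain Pearson correlation stands in for more careful peak-to-gene linkage methods.

```python
import numpy as np

def match_cells(rna_barcodes, atac_barcodes):
    """Row indices of cells measured in both modalities, aligned to a shared order."""
    atac_index = {bc: i for i, bc in enumerate(atac_barcodes)}
    rna_idx, atac_idx = [], []
    for i, bc in enumerate(rna_barcodes):
        if bc in atac_index:
            rna_idx.append(i)
            atac_idx.append(atac_index[bc])
    return np.array(rna_idx, dtype=int), np.array(atac_idx, dtype=int)

def peak_gene_correlation(expr, access, labels, cell_type):
    """Pearson correlation of a gene's expression with a peak's accessibility in one cell type."""
    mask = np.asarray(labels) == cell_type
    x = np.asarray(expr, dtype=float)[mask]
    y = np.asarray(access, dtype=float)[mask]
    if x.std() == 0 or y.std() == 0:
        return float("nan")  # correlation undefined for constant values
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical barcodes: only cells present in both modalities are kept
rna_idx, atac_idx = match_cells(["AAA", "CCC", "GGG", "TTT"], ["GGG", "AAA", "TTT"])

# Hypothetical matched values for six cells of two types
expr = [1, 2, 3, 4, 5, 6]
access = [2, 4, 6, 8, 1, 1]
types = ["T", "T", "T", "T", "B", "B"]
r_t = peak_gene_correlation(expr, access, types, "T")
r_b = peak_gene_correlation(expr, access, types, "B")
```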

Spatial transcriptomic analysis

Spatially resolved single-cell transcriptomic assays couple expression data with the cell’s positional context in a tissue or organ. This is particularly useful in the study of complex solid tissues, such as tumors and their microenvironment.

Spatial transcriptomic analysis involves clustering cells or spots in space, identifying spatially variable genes and resolving cell types in space.
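One common statistic for flagging spatially variable genes is Moran's I, which measures how similar a gene's expression is between neighboring cells or spots. Values near 1 indicate smooth spatial patterns, and values near 0 indicate no spatial structure. The sketch below computes it with binary k-nearest-neighbor weights. It assumes NumPy, spot coordinates in a 2-D array and one expression value per spot. The function names and the choice of `k` are illustrative.

```python
import numpy as np

def knn_weights(coords, k=6):
    """Binary spatial weight matrix: w[i, j] = 1 if spot j is among the k nearest to spot i."""
    coords = np.asarray(coords, dtype=float)
    d = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a spot is not its own neighbor
    w = np.zeros_like(d)
    nearest = np.argsort(d, axis=1)[:, :k]
    rows = np.repeat(np.arange(len(coords)), k)
    w[rows, nearest.ravel()] = 1.0
    return w

def morans_i(values, w):
    """Moran's I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z the centered values."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

# Toy 10 x 10 grid of spots: one gene follows a left-to-right gradient, one is noise
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()])
w = knn_weights(coords, k=4)
gradient_gene = coords[:, 0].astype(float)
noise_gene = np.random.default_rng(0).normal(size=100)
i_gradient = morans_i(gradient_gene, w)
i_noise = morans_i(noise_gene, w)
```

Ranking genes by such a statistic, usually with a significance test and multiple-testing correction, yields a candidate list of spatially variable genes.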