BIDS Data Science Lecture Series | December 4, 2015 | 1:00-2:30 p.m. | 190 Doe Library, UC Berkeley
Sponsors: Berkeley Institute for Data Science and the Data, Society and Inference Seminar
Single-cell transcriptome sequencing (scRNA-Seq), which combines high-throughput single-cell extraction and sequencing capabilities, enables the transcriptomes of large numbers of individual cells to be assayed efficiently. Profiling of gene expression at the single-cell level is crucial for addressing many biologically relevant questions, such as the investigation of rare cell types or primary cells (e.g., early development, where each of a small number of cells may have a distinct function) and the identification of subpopulations of cells from a larger heterogeneous population (e.g., discovering cell types in brain tissues). scRNA-Seq assays generate large datasets and involve inference for high-dimensional multivariate distributions with complex and unknown dependence structures among variables.
I will discuss some of the statistical analysis issues that have arisen in the context of a collaboration funded by the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative, with the aim of classifying neuronal cells in the mouse somatosensory cortex. These issues, ranging from so-called low-level to high-level analyses, include exploratory data analysis (EDA) for quality assessment/control (QA/QC) of scRNA-Seq reads, normalization to account for nuisance technical effects, cluster analysis to identify novel cell types, and differential expression analysis to derive gene expression signatures for the cell types.
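To make the low-level to high-level progression concrete, here is a minimal sketch of three of the steps named above (QC filtering, normalization, and a crude expression comparison) on a toy count matrix. It is not the method used in the collaboration: the function names, the thresholds, the toy counts, and the choice of log counts-per-million scaling are all assumptions made for illustration, and cluster analysis is left out (the group labels are simply given).

```python
import math

def qc_filter(counts, min_total=1000, min_detected=2):
    """Keep cells (rows) with enough total reads and enough detected genes.

    Both thresholds are hypothetical; real QC uses many more metrics.
    """
    kept = []
    for cell in counts:
        total = sum(cell)
        detected = sum(1 for c in cell if c > 0)
        if total >= min_total and detected >= min_detected:
            kept.append(cell)
    return kept

def log_cpm(counts, scale=1e6):
    """Scale each cell to `scale` total counts, then take log2(x + 1).

    This simple library-size normalization stands in for the more
    careful normalization the abstract refers to.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log2(c / total * scale + 1) for c in cell])
    return normalized

def mean_difference(norm, labels, gene):
    """Mean normalized expression of one gene in group 1 minus group 0."""
    group1 = [row[gene] for row, lab in zip(norm, labels) if lab == 1]
    group0 = [row[gene] for row, lab in zip(norm, labels) if lab == 0]
    return sum(group1) / len(group1) - sum(group0) / len(group0)

# Toy data: rows are cells, columns are genes (invented values).
counts = [
    [500, 500, 0],     # group 0, shallow sequencing
    [1000, 1000, 0],   # group 0, deeper sequencing
    [0, 600, 600],     # group 1, shallow sequencing
    [0, 1500, 1500],   # group 1, deeper sequencing
    [3, 0, 0],         # low-quality cell, expected to fail QC
]

kept = qc_filter(counts)
norm = log_cpm(kept)
labels = [0, 0, 1, 1]  # assumed known here; in practice from clustering
diff = mean_difference(norm, labels, gene=2)
```

In this toy example the two cells within each group differ only in sequencing depth, so they have identical profiles after normalization, and the third gene shows a positive difference between the groups.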