Measuring local dimensionality of single cell transcriptomics

Posted in AIV Internship on May 31, 2018

Internship title: Measuring local dimensionality of single cell transcriptomics

Name: Computational Systems Biology of Cancer
Affiliation: Institut Curie
Address: 26 rue d’Ulm, 75005, Paris, France

LAB Director
Name: Emmanuel Barillot
Phone number: +33156246980

Name: Luca Albergante
Phone number: +33156246931

Subject Keywords: single cell transcriptomics, machine learning, principal graphs
Tools and methodologies: Scripting in a modern data-oriented programming language such as R, MATLAB, or Python

Statistical analysis of structured high-dimensional transcriptomics data

Inference of principal curve data approximators via ElPiGraph (implementations of the algorithm are available in different programming languages)

Summary of lab’s interests: The Computational Systems Biology of Cancer Group at the Institut Curie is interested in the development of computational approaches capable of identifying key features of biological datasets, in the construction of mathematical models designed to understand complex biological processes, and in the formalization of biological knowledge into interactive, publicly available network maps. These interests are pursued in a collaborative context that promotes the interaction of researchers from different disciplines (computer science, mathematics, biology).
Project summary: It is now possible to measure the expression of the whole genome at the single cell level across thousands of cells. By applying different machine learning approaches to these data, researchers have been able to uncover complex trajectories that can be associated with the genetic changes undergone by cells as they move from one state to another (e.g., stem cells differentiating into specific cellular populations, or immune cells becoming activated). This approach, which is often called “pseudotime analysis”, is gaining popularity. However, several open questions remain.

The project will focus on exploring the concept of “local dimensionality”, broadly defined as the number of dimensions needed to effectively capture the variance in a given subset of the data. In particular, the student will determine the most appropriate measure of local dimensionality in the context of single cell transcriptomics and explore how local dimensionality changes across reconstructed trajectories.
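
To make the broad definition above concrete, here is a minimal Python sketch of one possible measure, not the measure the project will settle on. It assumes that the "subset of the data" is the k nearest neighbours of each cell, and that "effectively capture the variance" means reaching a fixed fraction of the neighbourhood's total variance with the leading principal components. The function name local_dimensionality, the parameters k and variance_threshold, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def local_dimensionality(X, k=30, variance_threshold=0.9):
    """Estimate a local dimensionality for every row (cell) of X.

    For each point, take its k nearest neighbours (Euclidean distance),
    run a PCA on that neighbourhood, and count how many principal
    components are needed to reach `variance_threshold` of the
    neighbourhood's total variance. This is only one candidate measure.
    """
    X = np.asarray(X, dtype=float)
    n_points = X.shape[0]
    k = min(k, n_points - 1)

    # Pairwise squared Euclidean distances (fine for a few thousand cells).
    sq_norms = np.sum(X ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)

    dims = np.empty(n_points, dtype=int)
    for i in range(n_points):
        # The k + 1 closest points: the point itself plus its k neighbours.
        neighbours = np.argsort(dist2[i])[: k + 1]
        centred = X[neighbours] - X[neighbours].mean(axis=0)

        # Squared singular values are proportional to the PCA variances.
        variances = np.linalg.svd(centred, compute_uv=False) ** 2
        total = variances.sum()
        if total == 0.0:
            dims[i] = 0  # all neighbours coincide
            continue

        explained = np.cumsum(variances) / total
        n_components = int(np.searchsorted(explained, variance_threshold)) + 1
        dims[i] = min(n_components, explained.size)
    return dims


if __name__ == "__main__":
    # Synthetic stand-in for expression data: a noisy curve-like branch
    # (one latent dimension) embedded in a 10-dimensional space.
    rng = np.random.default_rng(0)
    t = rng.uniform(size=200)
    branch = np.outer(t, np.ones(10)) + 1e-3 * rng.normal(size=(200, 10))
    print(np.bincount(local_dimensionality(branch)))
```

On real single cell data the input would typically be a normalized, log-transformed expression matrix, and both the neighbourhood size and the variance threshold would need to be chosen and justified, which is part of what the project sets out to investigate.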

This information will then be used to assess if any relation can be found between different aspects of biological processes and the local dimensionality of the data. Furthermore, the student will study the extent to which local dimensionality can be used as a proxy to measure the quality of the reconstructed trajectory.
Interdisciplinary aspect of the project: Single cell data analysis is a relatively young field which relies on a multitude of machine learning techniques to extract relevant biological information from the data. In particular, the proposed project will allow the student to understand the peculiarities of single cell transcriptomics data and to adapt different mathematical and statistical concepts in such a way as to maximize their relevance to biology.

In doing so, the student will acquire expertise in both biology and machine learning, and will be able to inform both biologists and statisticians about the potential role of local dimensionality in exploring complex processes.