4.2 Statistical analysis of bio-molecular data and combinatorial difficulties: two examples (Stéphane Robin)

Duration: 51:37
 
By Université Paris 1 Panthéon-Sorbonne. Copyright is held by the publisher.
Statistical model inference and selection often raise combinatorial issues, in particular with high-dimensional data. In such cases, asymptotic approximations or Monte Carlo methods are commonly used to approximate the quantities of interest. In this talk, we will present two examples involving bio-molecular data in which exact results can be obtained through dedicated combinatorial and algorithmic developments.

We will first consider the multiple testing issue typically faced with high-throughput data. In this framework, most multiple testing procedures require a precise estimate of the proportion of true null hypotheses. This estimation problem can be rephrased as a histogram selection problem, which can be solved via leave-p-out (LpO) cross-validation. We will present explicit results that make this model selection problem tractable while avoiding the computational burden inherent to LpO, which would otherwise require averaging over all n-choose-p splits of the data.

We will then consider a segmentation problem encountered when looking for chromosomal aberrations in microarray data. Detecting breakpoints and estimating their number is an old statistical problem, but for the precision of their localisation only asymptotic results are available. We will present a dynamic programming algorithm that explores the whole segmentation space. It provides information on the localisation precision, as well as a new model selection criterion for the number of breakpoints.
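
The first example rests on the fact that the leave-p-out risk of a histogram has a closed form, so the n-choose-p training/test splits never need to be enumerated. The sketch below is a minimal illustration under our own assumptions: the least-squares density contrast, histograms on [0, 1] (for instance, of p-values), and training-set bin counts that follow hypergeometric moments when the test set is drawn uniformly. The closed form in `lpo_risk` is our derivation for this setting, checked against brute-force enumeration on tiny samples; all function names are hypothetical, and how the selected histogram is turned into an estimate of the proportion of true null hypotheses is not reproduced here.

```python
from bisect import bisect_right
from itertools import combinations


def bin_labels(x, edges):
    """Bin index of each point for edges[0] < ... < edges[-1];
    bins are [edges[j], edges[j+1]), the last one closed on the right."""
    k = len(edges) - 1
    return [min(max(bisect_right(edges, xi) - 1, 0), k - 1) for xi in x]


def lpo_risk(counts, widths, p):
    """Closed-form leave-p-out estimate of the least-squares risk
    ||s_hat||^2 - 2 E[s_hat(X)] of a histogram, from the bin counts of all n points."""
    n = sum(counts)
    if not 1 <= p <= n - 1:
        raise ValueError("p must satisfy 1 <= p <= n - 1")
    linear = sum(c / (n * w) for c, w in zip(counts, widths))
    quadratic = sum((c / n) ** 2 / w for c, w in zip(counts, widths))
    return ((2 * n - p) * linear - n * (n - p + 1) * quadratic) / ((n - 1) * (n - p))


def lpo_risk_bruteforce(labels, widths, p):
    """Same quantity obtained by averaging over all C(n, p) test sets (tiny n only)."""
    n, k = len(labels), len(widths)
    total, n_splits = 0.0, 0
    for test in combinations(range(n), p):
        # Histogram fitted on the n - p training points.
        train_counts = [0] * k
        for i in set(range(n)) - set(test):
            train_counts[labels[i]] += 1
        heights = [c / ((n - p) * w) for c, w in zip(train_counts, widths)]
        sq_norm = sum(h * h * w for h, w in zip(heights, widths))
        # Least-squares contrast averaged over the p held-out points.
        test_mean = sum(heights[labels[i]] for i in test) / p
        total += sq_norm - 2 * test_mean
        n_splits += 1
    return total / n_splits


def select_regular_histogram(x, p, max_bins):
    """Number of regular bins on [0, 1] minimising the closed-form LpO risk."""
    best_k, best_risk = None, float("inf")
    for k in range(1, max_bins + 1):
        edges = [j / k for j in range(k + 1)]
        labels = bin_labels(x, edges)
        counts = [labels.count(j) for j in range(k)]
        risk = lpo_risk(counts, [1.0 / k] * k, p)
        if risk < best_risk:
            best_k, best_risk = k, risk
    return best_k, best_risk
```

Once the counts are known, each candidate histogram costs only a pass over its bins, whatever the value of p, which is what makes a full scan over partitions affordable.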

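For the second example, the key property is that when the likelihood factorises over segments, sums over all segmentations into K segments can be computed by dynamic programming in O(K n^2) operations instead of enumerating the C(n - 1, K - 1) segmentations. The sketch below is a minimal forward-backward version under assumptions of our own: each segment is Gaussian with known variance `sigma2`, its mean has a N(`mu0`, `tau2`) prior that is integrated out, and all segmentations into K segments are a priori equally likely. These modelling choices and all function names are hypothetical, not necessarily those of the talk. The output gives the exact posterior distribution of each breakpoint's position (one way to quantify localisation precision) and the log marginal likelihood of K, one possible ingredient for choosing the number of breakpoints; the criterion proposed in the talk is not reproduced.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_marginal_matrix(y, sigma2=1.0, mu0=0.0, tau2=1.0):
    """A[i][j] = log p(y[i:j]) for a single segment with y_t = mu + noise,
    noise ~ N(0, sigma2), and mu ~ N(mu0, tau2) integrated out (assumed model)."""
    n = len(y)
    s1, s2 = [0.0], [0.0]  # prefix sums of (y - mu0) and (y - mu0)^2
    for v in y:
        s1.append(s1[-1] + (v - mu0))
        s2.append(s2[-1] + (v - mu0) ** 2)
    A = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            m = j - i
            d1, d2 = s1[j] - s1[i], s2[j] - s2[i]
            # Quadratic form of the marginal covariance sigma2*I + tau2*11^T.
            quad = (d2 - tau2 * d1 ** 2 / (sigma2 + m * tau2)) / sigma2
            A[i][j] = -0.5 * (m * math.log(2 * math.pi * sigma2)
                              + math.log(1 + m * tau2 / sigma2) + quad)
    return A


def breakpoint_posteriors(y, K, **model):
    """post[k-1][t-1] = P(k-th breakpoint right after y[t-1] | y, K), for a
    uniform prior over segmentations into K segments; also returns log p(y | K)."""
    n = len(y)
    if not 1 <= K <= n:
        raise ValueError("need 1 <= K <= len(y)")
    A = log_marginal_matrix(y, **model)
    # F[k][t]: log-sum over segmentations of y[0:t] into k segments.
    # B[k][s]: log-sum over segmentations of y[s:n] into k segments.
    F = [[-math.inf] * (n + 1) for _ in range(K + 1)]
    B = [[-math.inf] * (n + 1) for _ in range(K + 1)]
    for t in range(1, n + 1):
        F[1][t] = A[0][t]
    for s in range(n):
        B[1][s] = A[s][n]
    for k in range(2, K + 1):
        for t in range(k, n + 1):
            F[k][t] = logsumexp([F[k - 1][s] + A[s][t] for s in range(k - 1, t)])
        for s in range(n - k + 1):
            B[k][s] = logsumexp([A[s][t] + B[k - 1][t] for t in range(s + 1, n - k + 2)])
    total = F[K][n]
    # A segmentation has its k-th breakpoint at t iff y[0:t] holds k segments
    # and y[t:n] holds the remaining K - k.
    post = [[math.exp(F[k][t] + B[K - k][t] - total) for t in range(1, n)]
            for k in range(1, K)]
    log_evidence = total - math.log(math.comb(n - 1, K - 1))
    return post, log_evidence
```

For instance, `breakpoint_posteriors([0.1, -0.2, 0.0, 3.1, 2.9, 3.2, 0.1, -0.1], K=3, sigma2=0.05)` should concentrate the first breakpoint after the third observation; each row of the returned matrix is a probability distribution over positions.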