4.2 Statistical analysis of bio-molecular data and combinatorial difficulties : two examples (Stéphane Robin)

Content provided by Universite Paris 1 Pantheon-Sorbonne.

StatLearn 2010 - Workshop on "Challenging problems in Statistical Learning"

Duration: 51:37


Statistical model inference and selection often raise combinatorial issues, in particular when dealing with high-dimensional data. In such cases, asymptotic approximations or Monte Carlo methods are commonly used to approximate the quantities of interest. In this talk, we will present two examples dealing with bio-molecular data. In both of them, exact results can be obtained thanks to specific combinatorial and algorithmic developments. We will first consider the typical multiple testing issue that arises when dealing with high-throughput data. In this framework, most multiple testing procedures require a precise estimate of the proportion of true null hypotheses. This estimation problem can be rephrased as a histogram selection problem, which can be solved via leave-p-out (LpO) cross-validation. We will present explicit results that allow us to handle this model selection problem while avoiding the computational burden inherent to LpO. We will then consider a segmentation problem encountered when looking for chromosomal aberrations based on microarray data. The detection of breakpoints and the estimation of their number is an old statistical problem. As for the precision of their localisation, only asymptotic results are available. We will present a dynamic-programming-type algorithm that allows us to explore the whole segmentation space. It provides information on the localisation precision and, furthermore, a new model selection criterion for the number of breakpoints.
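To make the segmentation setting concrete, here is a minimal Python sketch of the classical least-squares dynamic programme for a fixed number of segments. It is not the algorithm presented in the talk, which explores the whole segmentation space. It only illustrates the kind of recursion that dynamic programming makes tractable, where a brute-force search would have to enumerate every placement of the breakpoints. The function names `segment_costs` and `best_segmentation`, and the squared-error cost, are assumptions chosen for illustration.

```python
import numpy as np


def segment_costs(y):
    """cost[i, j] = sum of squared deviations from the mean of y[i:j] (j exclusive)."""
    n = len(y)
    s1 = np.concatenate(([0.0], np.cumsum(y)))             # prefix sums of y
    s2 = np.concatenate(([0.0], np.cumsum(np.square(y))))  # prefix sums of y^2
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            total = s1[j] - s1[i]
            cost[i, j] = (s2[j] - s2[i]) - total * total / (j - i)
    return cost


def best_segmentation(y, n_segments):
    """Return (breakpoints, cost) of the least-squares split of y into n_segments.

    A breakpoint is the index at which a new segment starts.
    Illustrative assumption: piecewise-constant mean, squared-error cost.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(y)")
    cost = segment_costs(y)

    # best[k, j]: minimal cost of splitting y[:j] into k segments.
    best = np.full((n_segments + 1, n + 1), np.inf)
    last_start = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            # The last segment is y[i:j] for some i in k-1 .. j-1.
            candidates = best[k - 1, k - 1:j] + cost[k - 1:j, j]
            offset = int(np.argmin(candidates))
            best[k, j] = candidates[offset]
            last_start[k, j] = offset + (k - 1)

    # Backtrack from the full series to recover the segment starts.
    breakpoints = []
    j = n
    for k in range(n_segments, 1, -1):
        j = int(last_start[k, j])
        breakpoints.append(j)
    return sorted(breakpoints), float(best[n_segments, n])
```

For instance, `best_segmentation([0, 0, 0, 5, 5, 5, 1, 1], 3)` places the breakpoints at indices 3 and 6 with zero residual cost. The recursion runs in time quadratic in the series length for each added segment. It returns only the single best segmentation, so it says nothing about how precisely each breakpoint is located.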
