Computational BioMedicine Laboratory


Category: national

Project Title: Causal-based Variable Selection for Omics Data

Funding Organization: GSRT/National Strategic Reference Framework (NSRF) 2007-2013

Programme: ARISTEIA II

Coordinator: Foundation for Research and Technology-Hellas


Duration: 31/01/2014-31/07/2015

Expiration Date: 2015

Total Budget: 210000

FORTH ICS budget: 210000

Project Objective: Variable selection is a well-studied, solved problem in machine learning and data mining. Or is it? Significant progress has been made when variable selection is used to increase classification performance, but novel and important directions remain to be explored when it is used to understand the system under study. This is particularly the case in the analysis of high-dimensional data, such as omics data (e.g., transcriptomics, next-generation sequencing, methylation). We propose an intensive research program to address current pressing needs, particularly in medicine and biology, but also in any high-dimensional data-analysis setting. We set forth new variable selection problems with deep connections to causality and causal theories. We propose to study (a) variable selection for repeated-measurement (longitudinal) data, (b) variable selection for predicting the effect of interventions, such as knocking out a gene, (c) variable selection performed simultaneously on several heterogeneous datasets, together with any available prior knowledge, (d) variable selection that identifies all optimal variable sets, not just one, which is particularly important for low sample sizes and in the presence of collinearities, and (e) variable selection for hard distributions in which the pairwise associations of important variables with the outcome disappear (e.g., exclusive OR functions). The algorithms will co-evolve with three important biological applications, both to maximize the potential impact on human health and to ensure that the algorithms are practical and useful: (I) mesothelioma and lung cancer, (II) chronic lung diseases, and (III) DNA-damage-related aging. The applications will be supervised by our national and international biology collaborators. To ensure rapid uptake of the results, we will encapsulate the algorithms in easy-to-use tools targeting non-expert users.
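
The exclusive OR case in item (e) can be illustrated with a small Python sketch. It is not one of the project's algorithms, only a toy example on assumed, noise-free and perfectly balanced data; the helper `pearson` and all variable names are hypothetical. It shows that each input of an XOR outcome has zero pairwise correlation with the outcome, while the dependence becomes perfect once the other input is held fixed.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Assumed toy data: every (x1, x2) combination appears equally often,
# and the outcome is exactly y = x1 XOR x2 (no noise).
rows = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)] * 25

x1 = [r[0] for r in rows]
x2 = [r[1] for r in rows]
y = [r[2] for r in rows]

# Pairwise (marginal) association of each input with the outcome.
marginal_x1 = pearson(x1, y)
marginal_x2 = pearson(x2, y)

# Association of x1 with the outcome once x2 is held fixed.
given_x2_0 = [r for r in rows if r[1] == 0]
given_x2_1 = [r for r in rows if r[1] == 1]
cond_x2_0 = pearson([r[0] for r in given_x2_0], [r[2] for r in given_x2_0])
cond_x2_1 = pearson([r[0] for r in given_x2_1], [r[2] for r in given_x2_1])

print("marginal:", marginal_x1, marginal_x2)
print("conditional on x2:", cond_x2_0, cond_x2_1)
```

A selection method that screens variables one at a time by pairwise association would discard both inputs here, even though together they determine the outcome completely.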
