Methods for integrated analysis of multiple omics datasets


Statistical methods are developed for the integrated analysis of human microbiome, metabolomics, glycomics, proteomics and genomic datasets. Research topics are joint modelling of multiple variables as an alternative to ad hoc combinations of multiple phenotypes, methods for secondary phenotypes that take the study design into account, network analysis for dimension reduction and visualisation, and meta-analysis of results from omics datasets, with special interest in heterogeneity across platforms, populations and study designs.

The MIMOmics project has received funding from the European Union’s Seventh Framework Programme (FP7-Health-F5-2012) under grant agreement n° 305280. Its main objective is statistical data integration. Simultaneous integration of omics data is essential for understanding the underlying biological system, both qualitatively and quantitatively.
Within MIMOmics we are involved in four work packages (WPs) for statistical data integration:

For WP1 – Data Harmonization, we develop a probabilistic data integration method that embeds a high-dimensional dataset in terms of low-dimensional ‘latent’ variables. This method can be applied to integrate the same omics data measured by different technologies, such as LC-MS and NMR, and to integrate different omics datasets, such as metabolomics and glycomics.

For WP2 – Network analysis, we combine correlated omics data sets using (Weighted) Multiplex Networks to capture the underlying patterns. We also investigate how to identify modules or sub-networks that are enriched with multi-omics information.
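As a rough illustration of the idea of combining layers of a weighted multiplex network, the sketch below sums edge weights across layers and then reads off connected groups of nodes as crude modules. It is not the MIMOmics method: the function names (`aggregate_layers`, `find_modules`), the layer weights, the threshold and the toy node labels are all assumptions made for this example.

```python
from collections import defaultdict


def aggregate_layers(layers, layer_weights):
    """Combine the layers of a weighted multiplex network into one network.

    layers: dict mapping a layer name to a dict {(node_u, node_v): edge_weight}.
    layer_weights: dict mapping a layer name to its (assumed) importance.
    Edges are undirected, so (u, v) and (v, u) are treated as the same edge.
    """
    combined = defaultdict(float)
    for name, edges in layers.items():
        for (u, v), weight in edges.items():
            key = tuple(sorted((u, v)))
            combined[key] += layer_weights[name] * weight
    return dict(combined)


def find_modules(edges, threshold):
    """Return connected components of the network after dropping weak edges."""
    adjacency = defaultdict(set)
    for (u, v), weight in edges.items():
        if weight >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, modules = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        modules.append(sorted(component))
    return modules


# Toy layers (hypothetical values, for illustration only).
toy_layers = {
    "metabolomics": {("n1", "n2"): 0.9, ("n2", "n3"): 0.2},
    "glycomics": {("n2", "n1"): 0.7, ("n3", "n4"): 0.8},
}
toy_weights = {"metabolomics": 0.5, "glycomics": 0.5}

combined = aggregate_layers(toy_layers, toy_weights)
modules = find_modules(combined, threshold=0.3)
```

Summing weighted layers is only one of several possible aggregation choices; it discards which layer supports an edge, which a full multiplex analysis would retain.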

For WP3 – Risk prediction, we develop methods to determine the added value of an omics dataset, and we study prediction models that are biologically interpretable because they use network information.

For WP4 – Meta Analysis, we develop methods for combining multilevel biomarkers across studies, called Super-Meta. By joining forces with experts from different scientific fields, we aim to combine profiles that represent the same mechanism but are based on various omics variables. These sub-projects are performed in collaboration with WUR and TU Delft in particular.
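To make the notion of heterogeneity across studies concrete, here is a minimal sketch of a standard random-effects meta-analysis using the DerSimonian-Laird estimator. It is a textbook technique, not the Super-Meta method described above. The function name `random_effects_meta` and the input values are assumptions made for this example.

```python
def random_effects_meta(effects, variances):
    """Pool per-study effect estimates with the DerSimonian-Laird estimator.

    effects: list of effect estimates, one per study.
    variances: list of the corresponding within-study variances.
    Returns the pooled effect, its standard error, the between-study
    variance tau2, Cochran's Q and the I^2 heterogeneity statistic.
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))

    # Method-of-moments estimate of the between-study variance.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = (1.0 / sum(re_weights)) ** 0.5

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}


# Hypothetical inputs: two studies with clearly different effects.
result = random_effects_meta(effects=[0.0, 1.0], variances=[0.01, 0.01])
```

When the studies agree, `tau2` is estimated as zero and the result reduces to the usual inverse-variance fixed-effect estimate.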

Statistical methods for family data

This was the topic of my PhD thesis. The analysis of family data is challenging because of the correlation structure and the outcome-dependent sampling used to enrich for genetic variants. A lot of my work has focussed on testing for the presence of genetic effects in families. Over the last decade I have become more interested in modelling the relationship between genetic effects and disease and health outcomes. In addition to genetic factors, I also model the contribution of lifestyle factors, and of the interplay between genetic and lifestyle factors, to the aggregation of diseases in families.

I am involved in the following family studies in Leiden:

Research profile: Health, Prevention and the Human Life cycle
Study populations: Leiden Longevity Study

While family data contain information on risk factors for disease, this information is not used for risk prediction. I develop methods for risk prediction based on family data. This topic is closely related to the STW project TOPBREED of Prof. Fred van Eeuwijk (WUR), in which I am involved. Here we develop methods to increase genomic breeding values.

Statistical methods for modelling the effect of helminth infections on health

For data from a household-randomized clinical trial performed in Nangapanda, Indonesia, I develop statistical methods for the analysis of categorical count data (human microbiome) in clustered designs (longitudinal, families, households) and for the joint analysis of several (mixed) outcomes over time.