Author: Jeanine Houwing-Duistermaat

Sitting on the data. Why?

Why not share our data with other scientists? To stimulate multidisciplinary research? To accelerate scientific output by allowing more eyes to look at the data? To have fun together? In a comment published last week in the New England Journal of Medicine, the authors recommended that data from randomized clinical trials need not be freely accessible at the time of first publication. They stated that the “data owners” should be allowed a delay of up to five years after the first publication. In the NRC of last Saturday, Rosanne Hertzberger writes about the lack of an open culture in science and mentions the comment in this context. Complexity of the data is the first argument against data sharing given in the comment. What do the “data owners” mean by complexity of the data? Complexity of study design, sampling scheme and structure? Or the underlying biology represented by the data? Or both? As a biostatistician I have often encountered these issues around data sharing: in biomedical projects, where my colleagues like to add data from another study to improve power, and in my own research projects, where I like to illustrate new methods with exciting datasets. Statistical research aims to develop methods that efficiently analyze the data to infer relationships between variables in the whole population from which the subjects of the study were sampled. Statistical inference might...


Dissemination of quantitative methodology: Too difficult?

As coordinator of the European consortium MIMOmics, I have to make sure that scientific milestones are reached and that we deliver our products (methodology) on time. This has been part of my job for many years. However, the European Union also expects dissemination at various levels. To your own scientific peers. No problem! To the general public. HELP! I have disseminated to my own discipline since the beginning of my career. My PhD thesis consists of five papers published in scientific journals. Already during my PhD studies, I presented my work at international conferences. What about dissemination to other disciplines? As an applied statistician I am also used to explaining my work to my collaborators from medicine, biology and the social sciences. The role of quantitative methodologists in (bio)medical research is widely acknowledged. They need us. We need them. Why do I find dissemination to the public so difficult? I have always answered questions about my job with: “something with genetics” or “something with how we can become long-lived”. “Oh, that is interesting!” was the reaction most of the time. When I accidentally mentioned statistics or mathematics, the reaction was: “How can a woman like you love mathematics?” Thus, to disseminate our work in MIMOmics, I needed to change my attitude. I needed to explain methods in the context of healthcare utilization. I had one previous attempt to make...


Slides of my Inaugural Lecture, Leeds, June 14th

“Statistics and Data (Analytics): You cannot have one without the other.”

Inaugural Lecture to be given by Professor Jeanine Houwing-Duistermaat

Presentation (slides)

Abstract: Many fields of human interest (commerce, social science, biology, agriculture, healthcare, urban planning, transport, communications and many more) are producing vast amounts of data in order to answer relevant questions. For example, in health research the datasets should provide insight into the biological processes underlying health and disease and will be used to determine homogeneous sub-groups of patients in order to tailor treatment and screening programs. Reaching these goals requires collaboration between experts in data acquisition, biology and methodology. The complexity of the data necessitates the involvement of computer science, modelling, physics and statistics. Due to the availability of many different datasets in health (e.g. electronic health records, imaging, genomics, proteomics), research in statistical methods is nowadays very exciting. The challenge is to integrate these datasets for joint analysis while addressing measurement error, heterogeneity, missing values and sampling design. Outcome-dependent sampling is typically employed to reduce costs without losing too much statistical efficiency. However, such a design adds another layer of complexity to the statistical methodology, since most data analyses need to account for the design to ensure appropriate interpretation of the results. During the lecture I will share the current excitement in data analytics and illustrate statistical methods for integrating multiple datasets using the analysis of the multi-case...


Multidisciplinary research: yields better science?

We like traditions. Most traditions are good, but some are not. Most beliefs are correct, but some are wrong. However, we do not like change. Getting rid of traditions or accepting that our beliefs might have been wrong is difficult. Even in science, we prefer to work on new topics instead of revisiting old ones. However, when different experts join forces, researchers need to be willing to drop traditions and beliefs. I worked on a project which aimed to estimate the contributions of genetic and environmental factors to Rheumatoid Arthritis. We had access to a small dataset of twins. HLA was the first genetic factor we considered. I did the calculations and got a value of 20%, which means that 20% of the total genetic variation could be explained by the HLA variants. That SHOULD be wrong. It should be 30%. It always has been 30%. How could I come up with 20%? My first response: uncertainty in my estimate. There is probably also uncertainty around the number of 30%. My 20% might be a little low and the old belief of 30% a little high, just because of randomness. I did my computations over and over again and had several meetings before the value of 20% was accepted. Another example. Chemists are used to standardizing glycomics data by dividing each abundance by the total value of all compounds...
