We like traditions. Most traditions are good, but some are not. Most beliefs are correct, but some are wrong. We do not like change, however: getting rid of traditions, or accepting that our beliefs might have been wrong, is difficult. Even in science, we prefer to work on new topics instead of revisiting old ones. Yet when experts from different fields join forces, researchers need to be willing to drop traditions and beliefs.
I worked on a project that aimed to estimate the contributions of genetic and environmental factors to Rheumatoid Arthritis. We had access to a small twin dataset. HLA was the first genetic factor we considered. I did the calculations and obtained a value of 20%, meaning that 20% of the total genetic variation could be explained by the HLA variants.
That SHOULD be wrong. It should be 30%. It has always been 30%. How could I come up with 20%?
My first response: there is uncertainty in my estimate, and probably also around the figure of 30%. My 20% might be a little low and the old belief of 30% a little high, just because of randomness. I redid my computations over and over and sat through several meetings before the value of 20% was accepted.
Another example. Chemists are used to standardizing glycomics data by dividing each abundance by the total abundance of all compounds in a sample. The reason is that the total abundance varies across samples, so the total itself is not interpretable.
Is dividing by the total abundance a good idea? In some circumstances it might be the best option; in others, another normalization procedure might be preferable. The issue is that dividing by the total destroys the original correlation structure between the compounds, so that structure is no longer interpretable.
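A small simulation makes this concrete. The sketch below (in Python, for illustration only; the data and numbers are made up) builds two compounds whose raw abundances both scale with a varying total, so they are strongly positively correlated. After dividing by the total, the two proportions must sum to one, forcing a perfect negative correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical setting: total signal varies strongly across samples,
# e.g. due to differences in sample amount or instrument response.
total = rng.lognormal(mean=0.0, sigma=0.5, size=n)

# Two compounds whose absolute abundances both track the total;
# their raw abundances are therefore positively correlated.
a = np.abs(total * (1.0 + 0.1 * rng.standard_normal(n)))
b = np.abs(total * (2.0 + 0.1 * rng.standard_normal(n)))

raw_corr = np.corrcoef(a, b)[0, 1]

# Total-abundance normalization: divide each abundance by the sample sum.
s = a + b
a_norm, b_norm = a / s, b / s

# With only two compounds, a_norm + b_norm = 1 in every sample,
# so the normalized correlation is exactly -1.
norm_corr = np.corrcoef(a_norm, b_norm)[0, 1]

print(raw_corr, norm_corr)
```

With more than two compounds the effect is less extreme but of the same kind: the closure constraint pushes correlations toward negative values regardless of the underlying biology.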
Dividing by the total abundance yields so-called compositional data. Recently there was a lively discussion about the interpretation of this type of data in a discussion group of the American Statistical Association. Can statisticians convince chemists to use other normalizations? Not an easy task. A new normalization generates different data, with a different distribution and correlation structure. These new data might be easier to model and might yield better biological interpretations, but the data are not what they used to be: abundances of compounds that were always negatively correlated might now be positively correlated.
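One widely used alternative from the compositional-data literature is Aitchison's centred log-ratio (CLR) transform. A minimal sketch (the source does not say which normalizations are used in practice here; this is just one standard option, and the example values are invented):

```python
import numpy as np

def clr(x):
    # Centred log-ratio: log of each abundance minus the sample's mean log.
    # Rows are samples, columns are compounds; abundances must be positive.
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

x = np.array([[10.0, 30.0, 60.0],
              [ 5.0, 20.0, 75.0]])
proportions = x / x.sum(axis=1, keepdims=True)

# CLR is scale-invariant: raw abundances and total-normalized proportions
# give exactly the same transformed values.
print(np.allclose(clr(x), clr(proportions)))  # True
```

The scale invariance is the appealing property: the CLR result does not depend on whether chemists hand over raw abundances or proportions, which sidesteps part of the disagreement about dividing by the total.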
That SHOULD be wrong!
I was able to convince the rheumatologists in a few weeks. What about the chemists? Convincing chemists is more challenging: the gap between statisticians/epidemiologists and chemists is large.
How can we change this situation? The hope lies in the promotion and support of multidisciplinary research. Everyone talks about it, but is there any progress? I am afraid it is very slow. For example, current PhD training programmes are still too focused on a single discipline.
Within our multidisciplinary MIMOmics consortium we have made progress in bridging the gap between chemists and statisticians. Our industrial partner GENOS delivered a user-friendly R package that computes several normalizations of glycomics data. Chemists now provide untransformed data (although some of them still prefer to divide by the total) for statistical analysis. Thus, depending on the research question, we can apply the most appropriate normalization to the data! Will this result in a better understanding of biology?
Glycans play an important role in many biological mechanisms. With access to the raw glycomics data and with all the options in the R package, we will have better data for statistical analysis, and hence more results. This will increase our understanding of the onset of human diseases and of health outcomes!