How does the biology community currently feel about publishing descriptive and effect size statistics rather than significance statistics?
Question
How does the biology community currently feel about publishing descriptive and effect size statistics rather than significance statistics? Almost every journal article I read in the cell biology field reports things like P values and statistical tests of significance, but should effect sizes be more important to a biologist? Do we even care whether something is statistically significant if the effect size is negligible? Rather than crunching for significance, could one get away with showing things like confidence intervals, eta^2, Cohen's d, and r values instead of P values?

A P value tells you that, if you assume the null hypothesis is true, the probability of seeing an observation at least as extreme as yours is only 5% (assuming, of course, P < 0.05). However, this invites the logical fallacy of affirming the consequent, noted as far back as Aristotle: theory A predicts that changing X will cause Y; an experimenter manipulates X, sees changes in Y, and concludes that theory A is supported. That conclusion can be completely wrong, because theories B, C, D, E... could all also predict that X changes Y, and may even predict it better. Even if you conclude that your findings "support" theory A, the support is weak because you haven't ruled out all of the other possibilities.
So, to avoid null hypothesis significance testing and all its pitfalls, can one use descriptive and effect size statistics just as effectively, if not more so?
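To make the contrast concrete, here is a minimal simulation sketch (the data, sample size, and effect size are made up for illustration, and numpy/scipy are just one convenient choice): with a large enough sample, a t-test can report P far below 0.05 even though Cohen's d says the effect is negligible.

```python
# Illustrative only: simulated data, hypothetical sample size and effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000                                  # very large sample per group
a = rng.normal(loc=0.00, scale=1.0, size=n)  # "control" measurements
b = rng.normal(loc=0.02, scale=1.0, size=n)  # "treatment": tiny true shift

t, p = stats.ttest_ind(a, b)                 # two-sample t-test

# Cohen's d: difference of means in units of the pooled standard deviation
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd

print(f"P value   = {p:.3g}")   # typically well below 0.05
print(f"Cohen's d = {d:.3f}")   # around 0.02: a negligible effect
```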
Explanation / Answer
I'm not a statistician, but I think the comments have got it right. There is never a reason to omit P values, statistical power, or some other measure showing that what you have observed is not just a random outcome.
For the sake of reference, let's define the terms you mention. Eta squared is a ratio of variances: the variance explained by the experimental grouping divided by the total variance of the measurements. Cohen's d is a measure of the difference between two means, expressed in units of their pooled standard deviation.
The r value, or Pearson correlation, describes how linear the relationship is between two variables, usually one of them being a measurement and the other an experimental variable.
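As a quick sketch of how these quantities are computed (the data below are simulated and the group sizes are arbitrary, chosen only for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(10.0, 2.0, size=20)   # e.g. measurements under condition A
group_b = rng.normal(12.0, 2.0, size=20)   # e.g. measurements under condition B

# Cohen's d: difference between the two means in units of the pooled SD
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

# Eta squared for a two-group comparison: between-group sum of squares
# divided by the total sum of squares (variance explained by the grouping)
all_values = np.concatenate([group_a, group_b])
grand_mean = all_values.mean()
ss_between = (len(group_a) * (group_a.mean() - grand_mean) ** 2
              + len(group_b) * (group_b.mean() - grand_mean) ** 2)
eta_sq = ss_between / ((all_values - grand_mean) ** 2).sum()

# Pearson r between a measurement and an experimental variable (e.g. dose)
dose = np.linspace(0, 10, 30)
response = 3.0 * dose + rng.normal(0, 5, size=30)
r, p_r = stats.pearsonr(dose, response)

print(f"Cohen's d = {cohens_d:.2f}, eta^2 = {eta_sq:.2f}, r = {r:.2f}")
```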
These numbers are, as you say, descriptive, but they could just as well be created by throwing coins and writing up the results. With a small number of measurements, and a large enough range of possibilities, it's possible to get terribly large numbers here.
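Here is a small sketch of that point (pure noise, tiny groups; the group size and the 0.8 cutoff for a "large" effect are conventional but otherwise arbitrary choices): with three measurements per group, two samples from the same distribution produce a "large" Cohen's d a substantial fraction of the time.

```python
# Both groups are drawn from the SAME distribution, so any "effect" is noise.
import numpy as np

rng = np.random.default_rng(2)
n_per_group, n_trials = 3, 10_000
large_effects = 0

for _ in range(n_trials):
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = abs(b.mean() - a.mean()) / pooled_sd
    if d > 0.8:                      # conventional cutoff for a "large" effect
        large_effects += 1

print(f"|d| > 0.8 in about {100 * large_effects / n_trials:.0f}% of trials "
      f"with only {n_per_group} measurements per group")
```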
Biology, medicine, social science, and economics are all really susceptible to this. You go into the field and measure butterfly wings, or survey people's opinions, or try to guess who is going to win the election, and it's quite expensive to do more measurements.
If you are measuring something where accuracy matters because it is hard to determine, such as a close election race, or something that is really complicated, such as which genes convey susceptibility to type 2 diabetes (a problem that remains unsolved because so many genes play a role), you need large numbers of observations. Yet each study that comes out gets some answer, and in most cases those numbers should convince no one.
Microarray and RNA-Seq data analysis, for instance, often suffer from this problem. The measurements all come out statistically significant, but each one costs hundreds or even thousands of dollars. Most experiments do a minimal three replicates to understand the variance in each measurement and then do 2 to 8 actual measurements. That's not going to be very revealing when working with a system with thousands of genes in it; one bad sample with slightly different culture conditions can ruin the experiment.
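To see how little three replicates say about the variance, here is a quick sketch (the true standard deviation and the number of "genes" are made-up values): the standard deviation estimated from three measurements scatters widely around the true value.

```python
# Illustrative: how unstable an SD estimate from n = 3 replicates is.
import numpy as np

rng = np.random.default_rng(3)
true_sd, n_replicates, n_genes = 1.0, 3, 10_000

# For each "gene", estimate the SD from just three replicate measurements
samples = rng.normal(0.0, true_sd, size=(n_genes, n_replicates))
estimated_sd = samples.std(axis=1, ddof=1)

low, high = np.percentile(estimated_sd, [5, 95])
print(f"true SD = {true_sd}")
print(f"middle 90% of the n=3 estimates: {low:.2f} to {high:.2f}")
```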
Our butterfly biologist may measure 100 butterflies and stop as soon as the P value reaches 0.05 or 0.001; it's a lot of work camping out and setting nets. The truth is that a P of 5% is a number that can happen at random a lot. Even a 0.1% error rate will, when an experiment involves a hundred or so comparisons, produce a false positive in roughly one in ten such experiments. Across the thousands of experiments published, that means that around 10% of them have a mistake. Not so great.
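The "stop as soon as it looks significant" habit alone inflates the error rate. A minimal sketch (identical populations, hypothetical batch sizes, testing after every batch of five):

```python
# Both populations are identical, so every "significant" result is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_experiments, batch, max_n = 2_000, 5, 100
false_positives = 0

for _ in range(n_experiments):
    a, b = [], []
    for _ in range(max_n // batch):
        a.extend(rng.normal(size=batch))
        b.extend(rng.normal(size=batch))
        if len(a) >= 10:                      # start testing once there is some data
            _, p = stats.ttest_ind(a, b)
            if p < 0.05:                      # stop as soon as it looks significant
                false_positives += 1
                break

print("nominal error rate: 5%")
print(f"observed rate with 'stop at P < 0.05': "
      f"{100 * false_positives / n_experiments:.0f}%")
```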
It gets worse, though: not only significance but also bias needs consideration. Because biologists and most other scientists don't understand statistics well, the assumptions we use when we calculate a P value are often inappropriate and don't give honest estimates of the chance that the result is a random phenomenon.
A good-looking result may be chosen specifically to prove the point, some data may be thrown out because they simply don't look good, or a hypothesis may be chosen in a biased way simply to fit an unreliable set of data.
Or the statistical assumptions of the calculation might simply be so inappropriate that to cite the numbers at all is an out-and-out lie. An everyday example of this is a BLAST search. The E value, if read as a P value, would be misleading even though it is mathematically correct: two strings that show 10% identity can still have a tiny E value, 10^-8 for instance, but that is only the chance that two strings of these lengths would have so many letters in common. Anyone who plays with BLAST will quickly throw out anything that has less than 30% identity unless they are desperate, even though the E values are infinitesimal.
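For reference, BLAST E values are based on Karlin-Altschul statistics, roughly E = K*m*n*e^(-lambda*S): the number of alignments with at least raw score S expected by chance between sequences of lengths m and n. A rough sketch (the K, lambda, lengths, and score below are illustrative assumptions, not taken from a real search):

```python
# Illustrative Karlin-Altschul calculation; all parameter values are assumptions.
import math

def expected_chance_hits(K: float, lam: float, m: int, n: int, score: float) -> float:
    """Alignments with raw score >= `score` expected by chance alone
    between a query of length m and a database of total length n."""
    return K * m * n * math.exp(-lam * score)

# A modest raw score against a large database already gives a tiny E value,
# which says nothing about whether the alignment is biologically meaningful.
e_value = expected_chance_hits(K=0.041, lam=0.267, m=300, n=50_000_000, score=120)
print(f"E = {e_value:.1e}")
```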
John Ioannidis has made this subject his focus and has published widely on it. A good place to start is his commentary "Why Most Published Research Findings Are False", summarized as follows:
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.