Fake science: part I

Photoshopping, fraud and circular logic in research

‘It is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgement of trusted physicians or authoritative medical guidelines. I take no pleasure in this conclusion, which I reached slowly and reluctantly over my two decades as an editor of the New England Journal of Medicine.’
— Marcia Angell

Check out this image from a peer reviewed research paper that supposedly shows skin lesions being treated by a laser:

Left: before treatment for keratoses. Right: after they were airbrushed out. (image diff is available here)

On being challenged the authors said:

“The photograph was taken in the same room with a similar environment; unfortunately the patient wore the same shirt”

The journal found this explanation convincing and forwarded the response to the complainants.

It’s becoming clear that science has major difficulties with not only intellectually unsupportable claims but also literally faked, entirely made up papers with random data, imaginary experiments and Photoshopped images.

Some of these papers are sold by organized gangs to Chinese doctors who need them to get promoted. But others come from really sketchy outfits like (sigh) the UK National Health Service, to whom we owe the masterpiece seen above.

The British government hasn’t noticed that its doctors are forging medical evidence (the primary author still works at NHS Poole). Instead this example comes from Elizabeth Bik, who runs a blog where she and a few other volunteers try to spot clusters of fraudulent papers. She embarrassed the journal in public here, and the paper was finally retracted. But she’s just a volunteer who raises money on Patreon for her work. Here’s her assessment of what’s going on:

“Science has a huge problem: 100s (1000s?) of science papers with obvious photoshops that have been reported, but that are all swept under the proverbial rug, with no action or only an author-friendly correction … There are dozens of examples where journals rather accept a clean (better photoshopped?) figure redo than asking the authors for a thorough explanation.”

Here are photos of supposedly different samples in which two images are identical. From “Anticancer activity of biogenerated silver nanoparticles: an integrated proteomic investigation”:

The journal investigated and concluded that this is fine.

As the only people trying to spot these are bloggers, we can safely assume that far larger numbers of papers are fake than the thousands they have already found and reported. For example:

0.04% of papers are retracted. At least 1.9% of papers have duplicate images “suggestive of deliberate manipulation”. About 2.5% of scientists admit to fraud, and they estimate that 10% of other scientists have committed fraud.

Yet the sad reality is that the size of the fraud problem is entirely unknown, because the institutions of science have absolutely no mechanisms to detect bad behavior whatsoever. Academia is dominated by (and largely originated) the same ideology calling for defunding the police, so no surprise that they just assume everyone has absolute integrity all the time. Research claims are constantly accepted at face value even when obviously nonsensical or fake. Deceptive research sails through peer review, gets published, cited and then incorporated into decision making. There are no rules and it’d be pointless to make any because there’s nobody to enforce them: universities are notorious for solidly defending fraudulent professors.

So let’s turn over the rock and see what crawls out. We’ll start with China and then turn our attention back to more western types of deception.

Chinese fraud studios

In 2018 the US National Science Foundation announced that “For the first time, China has overtaken the United States in terms of the total number of science publications”. Should the USA worry about this? Perhaps not. After some bloggers exposed an industrial research-faking operation that had generated at least 600 papers about experiments that never happened, a Chinese doctor reached out to beg for mercy:

“Hello teacher, yesterday you disclosed that there were some doctors having fraudulent pictures in their papers. This has raised attention. As one of these doctors, I kindly ask you to please leave us alone as soon as possible … Without papers, you don’t get promotion; without a promotion, you can hardly feed your family … You expose us but there are thousands of other people doing the same. As long as the system remains the same and the rules of the game remain the same, similar acts of faking data are for sure to go on. This time you exposed us, probably costing us our job. For the sake of Chinese doctors as a whole, especially for us young doctors, please be considerate. We really have no choice, please!”

Note the belief that “thousands of other people” are doing the same, and that these doctors need more than one paper to keep being promoted, so the 600 found so far is surely the tip of an iceberg given China’s size. In fact there are nearly 4 million licensed doctors in China, apparently all with the same promotion incentives. Scientific journals publish nearly 2 million articles per year, so there are quite probably tens of thousands, maybe hundreds of thousands of these papers in circulation.

The fake papers are remarkable:

They are so good they are undetectable in isolation. The NHS photo is an aberration; normally these papers get spotted by noticing re-used technical images across papers that claim to be different experiments by different people. The fake papers are probably produced by real scientists with access to real lab equipment. The use of bot-generated Gmail accounts for journal correspondence is also a signal, because Gmail is banned in China (e.g. [email protected], [email protected], [email protected]). The reliance on bulk generated Gmail accounts suggests that this one studio alone is working at staggering scale.
They are peer reviewed and published in western journals. For instance the “Journal of Cellular Biochemistry” by Wiley or “Biomedicine & Pharmacotherapy” by Elsevier. They claim to be doing advanced micro-biology on serious diseases: a typical title is something like “MicroRNA-125b promotes neurons cell apoptosis and Tau phosphorylation in Alzheimer’s disease”. Journals have no way to detect these papers and aren’t trying to develop any.
Some of them present traditional Chinese medicine as scientific. TCM is more or less the Chinese equivalent of homeopathy with lots of herbal remedies, eating body parts of exotic animals to cure erectile dysfunction and so on, but the Chinese government is obsessed with it and thinks it’s the same as normal medicine. From the top down Chinese scientists are expected to produce papers claiming that TCM works, and they do! Mostly this stuff stays in Chinese but the ever increasing reliance of western universities on Chinese funding means it’s now finding its way into the English language literature as well, e.g. “Probing the Qi of traditional Chinese herbal medicines by the biological synthesis of nano-Au” was published by the Royal Society of Chemistry.

Advert by a research faking operation. Credit to “Smut Clyde” and “TigerBB8”.

Most western scientists are too clever to buy a completely fake paper (or so we hope). But their promotion incentives are identical, and there are other techniques that let you publish as many fake papers as you want. Let’s turn our attention to …

Impossible numbers in western science

“The case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue.”
— Richard Horton, editor of the Lancet (source)

How many scientists just make up their data? A well known recent case of this was the Surgisphere scandal, in which a paper appeared in The Lancet that claimed to be based on a proprietary dataset of nearly 100,000 COVID-19 patients across over 670 hospitals worldwide. This figure was larger than the official case counts of some entire continents at the time, so the claim was implausible on its face. Sure enough it turned out none of the authors had ever actually seen the data, just summaries of it provided by one guy, who on investigation had a long track record of dishonesty. The Lancet probably accepted this paper because it made Trump look bad and the editor (Horton, quoted above) seems to hate Trump much more than he hates bad science.

Other cases like this have come to light over the years, such as those of Diederik Stapel, Brian Wansink, and Paolo Macchiarini, who left a trail of dead patients in his wake. But whilst anecdotes about individual cases are interesting, can we be more rigorous?

One clue comes from automated tools that scan research papers looking for mathematically impossible numbers. In recent years a few such tools have been developed and deployed, mostly against psychology and food science.

The statcheck program showed that “half of all published psychology papers…contained at least one p-value that was inconsistent with its test”.
The GRIM program showed that of the papers it could verify, around half contained averages that weren’t possible given the sample sizes, and more than 20% contained multiple such inconsistencies.
The SPRITE program detected experiments on eating where the combined means and standard deviations implied things like a child had to have eaten 60 carrots in a single sitting, or that a volunteer chose to eat 3/4 kilogram of crisps.
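To make the first kind of check concrete, here is a minimal sketch of a statcheck-style consistency test. It is not the statcheck program itself: the real tool handles several test types, while this sketch covers only a two-tailed z-test so that it needs nothing beyond the Python standard library. The function names and the rounding tolerance are assumptions made for illustration.

```python
import math


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def count_decimals(number_text: str) -> int:
    """Number of digits after the decimal point in a reported value."""
    return len(number_text.split(".")[1]) if "." in number_text else 0


def p_value_consistent(z_reported: str, p_reported: str) -> bool:
    """Check whether a reported p-value could match a reported z statistic.

    Both values are passed as strings so the number of reported decimals
    is known. The reported z may itself have been rounded, so the whole
    range of z values that round to it is considered (an assumption about
    how the authors rounded).
    """
    z = abs(float(z_reported))
    z_half = 0.5 * 10 ** -count_decimals(z_reported)
    p_high = two_tailed_p_from_z(max(z - z_half, 0.0))  # smallest |z| gives largest p
    p_low = two_tailed_p_from_z(z + z_half)             # largest |z| gives smallest p

    p = float(p_reported)
    p_half = 0.5 * 10 ** -count_decimals(p_reported)
    # Consistent if the range of recomputed p-values overlaps the range
    # of true values that would round to the reported p.
    return p_low <= p + p_half and p_high >= p - p_half


if __name__ == "__main__":
    print(p_value_consistent("1.96", "0.05"))  # plausible pairing
    print(p_value_consistent("1.96", "0.01"))  # p is far too small for this z
```

A paper reporting z = 1.96 alongside p = 0.01 would be flagged, because no value of z that rounds to 1.96 produces a p-value that rounds to 0.01.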

Being flagged by a stats checker doesn’t guarantee the data is made up: GRIM can detect simple mistakes like typos and SPRITE requires common sense to detect that something is wrong (i.e. no child will eat a plate of 60 carrots). But when there are multiple such problems in a single paper, things start to look more suspicious. The fact that half of all papers had incorrect data in them is concerning, especially because it seems to match Richard Horton’s intuitive guess at how much science is simply untrue. And the GRIM paper revealed a deeper problem: more than half of the scientists refused to provide the raw data for further checking, even though they had agreed to share it as a condition for being published. This is rather suspicious.
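The idea behind GRIM fits in a few lines. If every response is a whole number (a count, or a rating on an integer scale), the sum of n responses is also a whole number, so only certain means are possible for a given n. The sketch below is an illustration of that idea rather than the published tool: the function name is made up, and accepting either of two common rounding rules is an assumption chosen to avoid false alarms.

```python
import math
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if reported_mean could be the mean of n integer values.

    The mean is passed as a string so that the number of reported
    decimal places is preserved (e.g. "3.50" has two).
    """
    target = Decimal(reported_mean)
    decimals = max(0, -target.as_tuple().exponent)
    quantum = Decimal(1).scaleb(-decimals)  # e.g. 0.01 for two decimals

    # Any integer total whose mean rounds to the target must lie within
    # half a rounding unit (scaled by n) of target * n.
    half_width = quantum * n / 2
    lowest = math.floor(target * n - half_width)
    highest = math.ceil(target * n + half_width)

    for total in range(lowest, highest + 1):
        exact_mean = Decimal(total) / Decimal(n)
        # Accept either rounding convention the authors might have used.
        for rounding in (ROUND_HALF_UP, ROUND_HALF_EVEN):
            if exact_mean.quantize(quantum, rounding=rounding) == target:
                return True
    return False


if __name__ == "__main__":
    print(grim_consistent("5.18", 28))  # 145 / 28 rounds to 5.18
    print(grim_consistent("5.19", 28))  # no whole-number total gives 5.19
```

With 28 participants, a total of 145 gives a mean of 5.18 and a total of 146 gives 5.21, so a reported mean of 5.19 cannot have come from whole-number responses. As the text says, a single failure may just be a typo; it is the pile-up of failures in one paper that looks suspicious.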

One of the difficulties with detecting scientific fraud is that the line between fraud and simple absurdity can get quite blurry. Sometimes scientists “calculate” data that is clearly wrong but don’t actually try to hide it, and may even admit to it in the paper, knowing full well that nobody cares and nonsensical data won’t actually matter. Here’s an example from a COVID modelling paper:

The model was allowed to calculate that the average Brit must live with 7 other people, because it couldn’t fit the data otherwise (the actual number is 2.4). This one comes from University College London, is written by 12 neuroscientists, passed peer review and has 37 citations. The peer reviewer noticed that the incorrect number was in the paper but signed off on it anyway.

For decades psychiatrists published research into the “gene for depression” 5-HTTLPR. They created an entire literature not only linking the gene to depression but explaining how it worked, linking it to parenting styles, and developing treatments based upon it. Over 450 papers were published on the topic. Eventually a geneticist discovered what they were doing and used DNA databanks to point out that none of those papers could possibly be true.

Sometimes numbers aren’t “wrong” but are instead logically vacuous. The bogus Flaxman et al paper from Imperial College that tried to prove lockdowns work had the usual problem of statistically implausible numbers, but more importantly was built on circular logic: they made a model that assumed only government interventions could end epidemics. This is obviously nonsense and they breezily admitted it in the paper, where they said their work was “illustrative only” and that “in reality even in the absence of government interventions we would expect Rt to decrease”. No problem: this fictional “illustration” got published in Nature and the authors presented the model’s outputs as scientific proof of their own assumption to the media. The paper is vacuous mathematical obfuscation, but scientists either can’t tell or don’t care: it has racked up over 1300 citations and the number is still growing rapidly. To put that number in perspective, in physics the top 1% of all researchers have around 2000 citations over their entire career.

What can be done?

600 fraudulent papers here, 450 over there, 1300+ citations of just one bad paper … pretty quickly it starts adding up.

We’re often told science is self-correcting. Is that true? Probably not. “The Science Reform Brain Drain” is perhaps the bleakest essay I’ve read this year. Reformers like the men who developed SPRITE and GRIM have been giving up and leaving science entirely. Pointing out in public that your colleagues are dishonest is never a great career move, and the work was often futile. One scientist who quit and went into industry summed up his fraud detection work like this:

The clearest consequence of my actions has been that Zhang has gotten better at publishing. Every time I reported an irregularity with his data, his next article would not feature that irregularity.

Even when a bull enters the china shop and gets a few papers retracted, it has no real effect: retracted papers keep getting cited for years afterwards and may actually be cited more than non-retracted papers, because one of the effects of retraction is that the article becomes free to download. Needless to say, there appear to be no personal consequences for scientists who write retracted papers.

In the past year most talk of bad science has been about models with bad assumptions. This is an issue but has been hiding problems that are far worse: scientists are buying fake papers, Photoshopping evidence, refusing to upload their data, knowingly publishing numbers that cannot be correct, citing papers that were retracted for being fraudulent and presenting mathematical obfuscations of what they want to be true as if it were science. Journals usually ignore fraud reports entirely, or when put under pressure let scientists submit “corrected” versions of their papers. Nothing can be done about any of this because above all, universities rely on reputation and don’t want anyone to find out about bad behavior, so they fight tooth and nail to protect academics no matter how corrupt they may be. There are no rules. Any rules that are alleged to exist turn out when tested to be illusions.

Conclusion

Claims made by scientists are automatically trusted by the majority of people. Maybe they shouldn’t be?

In part 2 (coming soon) we’ll look at some of the more subtle ways pseudo-scientists abuse modelling to create untrue beliefs in the population.
