
Saturday, 22 February 2025

IEEE Has a Pseudoscience Problem

Guest post by Solal Pirelli


The IEEE, full name Institute of Electrical and Electronics Engineers, is one of the main scientific publishers in domains related to its name. Many IEEE venues, such as ICSE in software engineering and IROS in robotics, are “top” venues that publish important research. While these are conferences and not journals, computer science and related fields are unusual in that conferences are typically the more prestigious option.

But as I’ve covered before in the case of another big computer science publisher, world-class research can coexist with world-class nonsense. Many not-so-top IEEE venues publish “AI gobbledegook sandwiches”, pointless papers that apply standard machine learning or artificial intelligence to basic data sets resulting in vague predictions supposedly improving on ill-defined baselines.

Unfortunately, bad science published by IEEE isn’t limited to boring applications of boring algorithms to boring data. In this blog post, I’ll present IEEE-published pseudoscience of various kinds, show how this correlates with other problems, and discuss why publishers don’t do enough about it. 

All kinds of quackery 

The IEEE has published numerous new “methods” to help providers or users of pseudoscientific disciplines. Ayurveda is enhanced with a “preprocessing framework” to detect diabetes, a neural network to classify herbs, and even an AI assistant. Astrology is automated with a machine learning model. Myers-Briggs personality type testing is granted another neural network.

Some IEEE papers are at the very fringe of pseudoscience, unconventional even by quack standards. A symposium on antennas and propagation published three papers by the same author on “scientific traditional Chinese medicine”, a variant based on electromagnetism and 5G with “supernatural potential” (see here, here and here). An Indian conference on electronics published four papers (here, here, here and here) by the same first author on “electro-homeopathy”, the brainchild of a 19th century Italian count that an Indian high court called “nothing but quackery” a decade before these papers were published.

Of course, no list of pseudoscience would be complete without perpetual motion. That’s right, the IEEE has published two papers on perpetual motion in 2017 and 2022! How these were not desk-rejected is anyone’s guess.

Even work that is not pseudoscientific in itself can propagate harmful or downright absurd stereotypes. Consider what IEEE-published and supposedly peer-reviewed papers have to say about autism: 

  • “Children with autism require constant care because you never know what will trigger them” (source)
  • “If symptoms of autism are detected early, children with autism usually return to normal development after effective medical intervention” (source)
  • “A baby born with autism spectrum disorder may have a lower-than-average heart rate. Complete blockage of the heart at birth is rare. Abnormal heart rate leads to heart block. So, there is a high chance of the child's death due to permanent heart blockage at any time.” (source)

Why it matters

One may think that such papers won’t cause harm because they’re unlikely to be read, since they are mostly in unknown venues and unrelated to the IEEE’s domain. While I personally disagree since I believe publishing pseudoscience risks breaking the public’s trust in legitimate research, let me provide a more objective argument. Pseudoscience in papers is heavily correlated with other problematic practices that are more difficult to detect automatically. This makes searching for pseudoscience an effective way to find problematic venues, complementary to existing techniques.

The preprocessing framework to detect diabetes with Ayurveda? In a conference that accepts papers on the same day they are submitted, somehow speeding up the weeks or months usually necessary for proper peer review.

The neural network that classifies Ayurvedic herbs? In a conference that plagiarized its peer review policy from Elsevier’s “Transport Policy” journal. Look for fragments of this policy in your favorite search engine and you’ll find a surprising number of venues that have done so, seemingly without noticing the references to Transport Policy.

The four papers on electro-homeopathy? In a conference that published a mathematical “algorithm” amounting to high school mathematics. While exact definitions of “novelty” vary, no one could credibly claim that this paper is novel enough for a scientific conference.

The 2017 paper on perpetual motion? In a conference that didn’t notice an entirely plagiarized section in that paper, ironically from a source explaining why perpetual motion is impossible. How this is compatible with IEEE’s policy of checking all content for plagiarism is unclear.

The paper claiming “you never know what will trigger” autistic children? In a conference supposedly happening in a London office building, whose four IEEE-published editions only feature one paper from a European university among a sea of India-based authors. Did the authors of this conference’s papers really travel to the other side of the globe to present in a place not designed for presentations?

The neural network for Myers-Briggs? In a conference chaired by a professor whose Russian university is under sanctions from the US, the EU, Ukraine, and even Switzerland!

Action is rare 

The expected process here would be to report this nonsense to the publisher, who would investigate, quickly conclude these papers should never have been published, lose faith in the peer review process that led to their acceptance, and issue retractions. Barring extremely strong evidence from conference chairs that some cases were truly one-off exceptions, such retractions would cover entire editions of conferences.

This happens… sometimes. The IEEE has retracted papers before, such as this one after “only” five months. They have also retracted entire venues, such as this one totaling 400 papers, four years after it was reported.
 
But the IEEE frequently does not react at all to reports. Guillaume Cabanac, who specializes in scientific fraud detection, has repeatedly and publicly called them out. For instance, he’s reported telltale signs of ChatGPT, as in this paper that includes “Regenerate Response” in the middle of the text and this paper that includes “I am unable to […] due to the fact I am an AI language model”. He’s also reported “tortured phrases”: attempts at evading plagiarism detection that instead create nonsense, such as “parcel misfortune” instead of “packet loss” in computer networking, sometimes in large concentrations. Cabanac and other sleuths have published “proceedings-level reports” on PubPeer, such as this one, when entire IEEE conferences have problems. None of the examples in this paragraph have led to any public reaction from the IEEE.

The IEEE occasionally issues “expressions of concern”, such as one for this paper over a year after concrete evidence of plagiarism was publicly reported. But expressions of concern are not retractions. In mid-2023, Retraction Watch noted that hundreds of IEEE papers reported by Guillaume Cabanac and Harvard lecturer Kendra Albert were still up for sale. A year and a half later, that remains the case.

One case noted above is particularly noteworthy in terms of both reputation and IEEE awareness: The “scientific TCM” papers were published in the 2022 and 2023 editions of the “International Symposium on Antennas and Propagation”, a 6-decade-old conference whose 2024 edition boasted the IEEE President as a keynote speaker. Clearly, the IEEE is aware of the venue and its papers. What’s the point in “reporting” them?


Processes are inadequate 

The scale of publishers’ actions is nowhere near the scale of the problem. Creating a new conference or journal does not require that much time if the peer reviewing process is fake. As long as the average time it takes a publisher to retract a venue is higher than the time it takes to create a new venue, there won’t be meaningful progress.

Current publisher processes are designed to correct honest mistakes, not to fight malice. The time it takes to contact authors, wait for their response, wait for them to find original data, and so on is worth it when a single paper has a problem that can be explained by human error. But any such process is a waste of time when a paper contains blatant pseudoscience, has obviously been plagiarized, or uses terminology so bizarre no reviewer could have understood it.

To give an example of scale, here’s a collision of pseudoscience and tortured phrases. The paper on an AI assistant for Ayurveda mentioned earlier is in the “2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT)”. For this conference, Guillaume Cabanac’s Problematic Paper Screener currently lists 185 papers with tortured phrases manually confirmed by Cabanac himself, with another 160 pending assessment. These include “herbal language” instead of natural language, “system getting to know” instead of machine learning, “give-up-to-give-up” instead of end-to-end, and “0.33-celebration” instead of third-party.

Individually contacting and waiting for hundreds of authors just in case they can explain why their paper talks about 0.33-celebrations isn’t going to cut it. Neither is individually contacting and waiting for dozens of conference editors just in case they can explain why their peer review process didn’t spot this nonsense. 

What can we do?

Given the incentives and processes at play, it’s not surprising to see the IEEE or any other big publisher publish pseudoscience. The authors of the papers mentioned in this post probably didn’t do anything illegal, except maybe for occasional plagiarism of copyrighted content, but nobody has the time and money to sue for such boring violations. This gives publishers a double excuse: they’re not publishing anything illegal, and retractions without a solid legal basis could backfire.

The scientific community needs to ban the “incompetence” defense from authors and stop associating with publishers that can’t be bothered to act quickly enough.  

Authors who publish obvious nonsense should not get a chance to explain themselves or “correct” their paper. 

Publishers make enough money from processing and selling articles. They can defend themselves from occasional lawsuits by angry authors, and they can hire scientific integrity specialists.

When I say “scientific integrity specialist”, that can unfortunately be as simple as “person looking for specific keywords in Google Scholar”. It’s what I did to find pseudoscience, and you can do that too. Report these on PubPeer, directly to publishers, or both. You can also go to the Problematic Paper Screener’s page listing articles that have not been manually assessed yet, and follow the instructions.
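Since “looking for specific keywords” is itself a simple technique, here is a minimal sketch of it in Python. The telltale strings are taken from examples quoted in this post; the file name and function are hypothetical, and a real screening pipeline (like the Problematic Paper Screener) is far more sophisticated.

```python
# Telltale strings drawn from examples quoted in this post.
TELLTALE = [
    "Regenerate Response",         # leftover ChatGPT interface text
    "I am an AI language model",   # leftover ChatGPT refusal text
    "parcel misfortune",           # tortured phrase for "packet loss"
    "herbal language",             # tortured phrase for "natural language"
    "electro-homeopathy",          # pseudoscience keyword
]

def flag_suspect_text(text: str) -> list[str]:
    """Return the telltale strings found in a paper's full text."""
    lowered = text.lower()
    return [t for t in TELLTALE if t.lower() in lowered]

# Hypothetical usage: scan a plain-text extraction of a paper.
with open("paper.txt") as f:
    hits = flag_suspect_text(f.read())
print(hits if hits else "no telltale strings found")
```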

Finally, remember that most scientists have no idea this is going on. You can help by publicly calling out problematic papers and lack of action. Ask candidates for governance boards in more democratic publishers like the IEEE what they plan to do about fraud. Discourage institutions, especially public ones in democratic countries, from making blanket deals with publishers.

Thursday, 22 August 2024

Optimizing research integrity investigations: the need for evidence

 

An article was published last week by Caron et al (2024) entitled "The PubPeer conundrum: Administrative challenges in research misconduct proceedings". The authors present a perspective on research misconduct from a viewpoint that is not often heard: three of them are attorneys who advise higher education institutions on research misconduct matters, and the other has served as a Research Integrity Officer at a hospital. 

The authors conclude that the bar for research integrity investigations should be raised, requiring a complaint to reach a higher evidential standard in order to progress, and using a statute of limitations to provide a cutoff date beyond which older research would not usually be investigated. This amounts to saying that the current system is expensive and has bad consequences, so let's change it to do fewer investigations - this will cost less and fewer bad consequences will happen. The tl;dr version of this blogpost is that the argument fails because, on the one hand, the authors give no indication of the frequency of bad consequences, and on the other hand, they ignore the consequences of failing to act.

How we handle misconduct allegations can be seen as an optimization problem; to solve it, we need two things: data on frequency of different outcomes, and an evaluation of how serious different outcomes are.

We can draw an analogy with a serious medical condition that leads to a variety of symptoms, and which can only be unambiguously diagnosed by an invasive procedure which is both unpleasant and expensive. In such a case, the family doctor will base the decision whether to refer for invasive testing on the basis of information such as physical symptoms or blood test results, and refer the patient for specialist investigations only if the symptoms exceed some kind of threshold. 

The invasive procedure may confirm that the disease is really present, a true positive, or that it is absent, a false positive. Those whose symptoms do not meet a cutoff do not progress to the invasive procedure, but may nevertheless have the disease, i.e., false negatives, or they may be free from the disease, true negatives. The more lenient the cutoff, the more true positives, but the price we pay will be to increase the rate of false positives. Conversely, with a stringent cutoff, we will reduce false positives, but will also miss true cases (i.e. have more false negatives).

Optimization is not just a case of seeking to maximize correct diagnoses - it must also take into account costs and benefits of each outcome. For some common conditions, it is deemed more serious to miss a true case of disease (false negative) than to send someone for additional testing unnecessarily (false positive). Many people feel they would put up with inconvenience, embarrassment, or pain rather than miss a fatal tumour. But some well-established medical screening programmes have been queried or even abandoned on the grounds that they may do more harm than good by creating unnecessary worry or leading to unwarranted medical interventions in people who would be fine left untreated. 
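To make the trade-off concrete, here is a small numerical sketch with invented numbers: a "symptom score" that is normally distributed in both groups, higher on average in the diseased group, a 10% disease rate, and two possible referral cutoffs. Nothing here models a real disease; it only illustrates how raising the cutoff trades false positives for false negatives.

```python
from scipy.stats import norm

prevalence = 0.10                      # invented: 10% of patients have the disease
# Symptom score ~ Normal(0, 1) if healthy, Normal(1.5, 1) if diseased.
for cutoff in (0.5, 1.5):              # lenient vs stringent referral threshold
    tp = prevalence * (1 - norm.cdf(cutoff, loc=1.5))   # diseased and referred
    fn = prevalence * norm.cdf(cutoff, loc=1.5)         # diseased but missed
    fp = (1 - prevalence) * (1 - norm.cdf(cutoff))      # healthy but referred
    print(f"cutoff {cutoff}: true pos {tp:.3f}, false neg {fn:.3f}, false pos {fp:.3f}")
```

With the lenient cutoff, about 84% of diseased patients are caught but nearly a third of healthy patients are referred; with the stringent cutoff, false referrals drop sharply at the price of missing half the true cases.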

So, how does this analogy relate to research misconduct? The paper by Caron et al emphasizes the two-stage nature of the procedure that is codified in the US by the Office of Science and Technology Policy (OSTP), which is mandatory for federal agencies that conduct or support research. When an allegation of research misconduct is presented to a research institution, it is rather like a patient presenting themselves to a physician: symptoms of misconduct are described, and the research integrity officers must decide whether to proceed to a full investigation - a procedure which is both costly and stressful.

Just as patients will present with symptoms that are benign or trivial, some allegations of misconduct can readily be discounted. They may concern minor matters or be obviously motivated by malice. But there comes a point when the allegations can't be dismissed without a deeper investigation - equivalent to referring the patient for specialist testing. The complaint of Caron et al is that the bar for starting an investigation is specified by the regulator, and is set too low, leading to a great deal of unnecessary investigation. They make it sound rather like the situation that arose with prostate screening in the UK: use of a rather unreliable blood test led to a situation where there was overdiagnosis and overtreatment: in other words, the false positive rate was far too high. The screening programme was eventually abandoned.

My difficulty with this argument is that at no point do Caron et al indicate what the false positive rate is for investigations of misconduct. They emphasize that the current procedures for investigation of misconduct are onerous, both on the institution and on the person under investigation. They note the considerable damage that can be done when a case proves to be a false positive, where an aura of untrustworthiness may hang around the accused, even if they are exonerated. Their conclusion is that the criteria for undertaking an investigation should be made more stringent. This would undoubtedly reduce the rate of false positives, but it would also decrease the true positive detection rate.

One rather puzzling aspect of Caron et al's paper was their focus on the post-publication peer review website PubPeer as the main source of allegations of research misconduct. The impression they gave is that PubPeer has opened the floodgates to accusations of misconduct, many of which have little substance, but which institutions are forced to respond to because of Office of Research Integrity (ORI) regulations. This is the opposite of what most research sleuths experience, which is that it is extremely difficult to get institutions to take reports of possible research misconduct seriously, even when the evidence looks strong.

Given these diametrically opposed perspectives, what is needed is hard data on how many reported cases of misconduct proceed to a full investigation, and how many subsequently are found to be justified. And, given the authors' focus on PubPeer, it would be good to see those numbers for allegations that are based on PubPeer comments versus other sources.

There's no doubt that the volume of commenting on PubPeer has increased, but the picture presented by Caron et al seems misleading in implying that most complaints involve concerns such as "a single instance of image duplication in a published paper". Most sleuths who regularly report on PubPeer know that such a single instance is unlikely to be taken seriously; they also know that a researcher who commits research misconduct is often a serial offender, with a pattern of problems across multiple papers. Caron et al note the difficulties that arise when concerns are raised about papers that were published many years ago, where it is unlikely that original data still exist. That is a valid point, but I'd be surprised if research integrity officers receive many allegations via PubPeer based solely on a single paper from years ago; the reason that older papers come to attention is typically because a researcher's more recent work has come into question, which triggers a sleuth to look at other cases. I accept I could be wrong, though. I tend to focus on cases where there is little doubt that misconduct has occurred, and, like many sleuths, I find it frustrating when concerns are not taken seriously, so maybe I underestimate the volume of frivolous or unfounded allegations. If Caron et al want to win me over, they'd have to provide hard data showing how much investigative time is spent on cases that end up being dismissed.

Second, and much harder to estimate, what is the false negative rate: how often are cases of misconduct missed? The authors focus on the sad plight of the falsely accused researcher but say nothing about the negative consequences when a researcher gets away with misconduct. 

Here, the medical analogy may be extended further, because in one important respect, misconduct is less like cancer and more like an infectious disease. It affects all who work with the researcher, particularly younger researchers who will be trained to turn a blind eye to inconvenient data and to "play the game" rather than doing good research. The rot spreads even further: huge amounts of research funding are wasted by others trying to build on noncredible research, and research syntheses are corrupted by the inclusion of unreliable or even fictitious findings. In some high-stakes fields, medical practice or government policy may be influenced by fraudulent work. If we simply make it harder to investigate allegations of misconduct, we run the risk of polluting academic research. And the research community at large can develop a sense of cynicism when they see fraudsters promoted and given grants while honest researchers are neglected.

So, we have to deal with the problem that, currently, fraud pays. Indeed, it is so unlikely to be detected that, for someone with a desire to succeed uncoupled from ethical scruples, it is a more sensible strategy to make up data than to collect it. Research integrity officers may worry now that they are confronted with more accusations of misconduct than they can handle, but if institutions focus on raising the bar for misconduct investigations, rather than putting resources in to tackle the problem, it will only get worse.

In the UK, universities sign up to a Concordat to Support Research Integrity which requires them to report on the number and outcome of research misconduct investigations every year. When it was first introduced, the sense was that institutions wanted to minimize the number of cases reported, as it might be a source of shame.  Now there is growing recognition that fraud is widespread, and the shame lies in failing to demonstrate a robust and efficient approach to tackling it. 


Reference

Caron, M. M., Lye, C. T., Bierer, B. E., & Barnes, M. (2024). The PubPeer conundrum: Administrative challenges in research misconduct proceedings. Accountability in Research, 1–19. https://doi.org/10.1080/08989621.2024.2390007.

Sunday, 26 May 2024

Are commitments to open data policies worth the paper they are written on?

 

As Betteridge's law of headlines states: "Any headline that ends in a question mark can be answered by the word no."  So you know where I am going with this.  

 

I'm a longstanding fan of open data - in fact, I first blogged about this back in 2015. So I've been gratified to see the needle shift on this, in the sense that over the past decade, in a rush to present themselves as good guys, various institutions and publishers have published policies supporting open data. The problem is that when you actually ask them to implement those policies, they back down.   

 

I discussed arguments for and against data-sharing in a Commentary article in 2016. I divided the issues according to whether they focused on the impact of data-sharing on researchers or on research participants. Table 1 from that article, entitled "Conflict between interests of researchers and advancement of science" is reproduced here:

 

1. Argument: Lack of time to curate data.
   Counter-argument: Unless adequately curated, data will over time become unusable, including by the original researcher.

2. Argument: Personal investment: reluctance to give data to freeloaders.
   Counter-argument: Reuse of data increases its value and the researcher benefits from additional citations. There is also an ethical case for maximizing use of data obtained via public funding.

3. Argument: Concerns about being scooped before the analysis is complete.
   Counter-argument: This is a common concern though there are few attested cases. A time-limited period of privileged use by the study team can be specified to avoid scooping.

4. Argument: Fear of errors being found in the data.
   Counter-argument: Culture change is needed to recognize errors are inevitable in any large dataset and should not be a reason for reputational damage. Data-sharing allows errors to be found and corrected.

 

I then went on to discuss two other concerns which focused on implications of data-sharing for human participants, viz:

5.  Ethical concerns about confidentiality of personal data, especially in the context of clinical research

6.  Possibility that others with a different agenda may misuse the data, e.g. perform selective analyses that misrepresent the findings.

 

These last two issues raise complex concerns and there's plenty to discuss on how to address them, but I'll put that to one side for now, as the case I want to comment on concerns a simple dataset where there is limited scope for secondary analyses and where no human participants are involved.

 

My interest was piqued by comments on PubPeer about a paper entitled "Magnetic field screening in hydrogen-rich high-temperature superconductors". The thread on PubPeer starts with this extraordinary comment by J. E. Hirsch:

 

I requested the underlying data for Figs. 3a, 3e, 3b, 3f of this paper on Jan 11, 2023. This is because the published data for Figs. 3a and 3e, as well as for Figs. 3b and 3f, are nominally the same but incompatible with each other, and I would like to understand why that is. I asked the authors to explain, but they did not provide an explanation. Neither did they supply the data. The journal told me that it had received the data from the authors but will not share them with me because they are "confidential". I requested that the journal posts an Editor Note informing readers that data are unavailable to readers. The journal responded that because data were share with editors they "cannot write an editorial note on the published article stating the data is unavailable as this would be factually incorrect".

 

Pseudonymous commenter Orchestes quercus drew attention to the Data Availability statement in the article: "The data that support the findings of this study are available from the corresponding authors upon reasonable request".

 

J. E. Hirsch then added a further comment: 

 

The underlying data are still not available, the editor says the author deems the request "unreasonable" but it cannot divulge the reasoning behind it, nor can the journal publish an editor note that there are restrictions on data availability because the data were provided to the journal. Springer Nature's Research Integrity Director wrote to me in September 2023 that "we recognize the right of the authors to not share the data with you, in line with the authors’ chosen data availability statement", and that "As Springer Nature considers the correspondence with the authors confidential, we cannot share with you any further details".

 

Now, I know nothing whatsoever about superconductors or J. E. Hirsch, but I think the editors, publisher and the authors are making themselves look very silly, and indeed suspicious, by refusing to share the data.  They can't plead patient confidentiality or ethical restrictions - it seems they are just refusing to comply because they don't want to.  

 

To up the ante, Orchestes quercus extracted data from the figures and did further analyses, which confirmed that J. E. Hirsch had a point - the data did not appear to be internally consistent.

 

Meanwhile, I had joined the PubPeer thread, pointing out:

 

The authors and editor appear to be in breach of the policy of Nature Portfolio journals, stated here:
https://www.nature.com/nature-portfolio/editorial-policies/reporting-standards, viz:

An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. A condition of publication in a Nature Portfolio journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications. Any restrictions on the availability of materials or information must be disclosed to the editors at the time of submission. Any restrictions must also be disclosed in the submitted manuscript.

After publication, readers who encounter refusal by the authors to comply with these policies should contact the chief editor of the journal. In cases where editors are unable to resolve a complaint, the journal may refer the matter to the authors' funding institution and/or publish a formal statement of correction, attached online to the publication, stating that readers have been unable to obtain necessary materials to replicate the findings.

I also noted that two of the authors are based at a Max Planck Institute. The Max Planck Gesellschaft is a signatory to the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. On the website it states:

the Max Planck Society (MPG) is committed to the goal of providing free and open access to all publications and data from scholarly research (my emphasis).

 

Well, the redoubtable J. E. Hirsch had already thought of that, and in a subsequent PubPeer comment made public various exchanges he had had with luminaries from the Max Planck Institutes.

 

All I can say to the Max Planck Gesellschaft is that this is not a good look. Hirsch has noted an inconsistency in the published figures.  This has been confirmed by another reader and needs to be explained. The longer people dig in defensively, attacking the person making the request rather than just showing the raw data, the more it looks as if something fishy is going on here.

 

Why am I so hung up on data-sharing? The reason is simple. The more I share my own data, or use data shared by others, the more I appreciate the value of doing so. Errors are ubiquitous, even when researchers are careful, but we'll never know about them if data are locked away.

 

Furthermore, it is a sad reality that fraudulent papers are on the rise, and open data is one way of defending against them. It's not a perfect defence: people can invent raw data as well as summary data, but realistic data are not so easy to fake, and requiring open data would slow down the fraudsters and make them easier to catch.

 

Having said that, asking for data is not tantamount to accusing researchers of fraud: it should be accepted as normal scientific practice to make data available in order that others can check the reproducibility of findings. If someone treats such a request as an accusation, or deems it "unreasonable", then I'm afraid it just makes me suspicious.  

 

And if organisations like Springer Nature and Max Planck Gesellschaft won't back up their policies with action, then I think they should delete them from their websites. They are presenting themselves as champions of open, reproducible science, while acting as defenders of non-transparent, secret practices. As we say in the UK, fine words butter no parsnips.   

 

 P.S. 27th May:  A comprehensive account of the superconductivity affair has just appeared on the website For Better Science.  This suggests things are even worse than I thought.   

 

In addition, you can see Jorge Hirsch explain his arduous journey in attempting to access the data here.    

 

NOTE ON COMMENTS: Many thanks to those who have commented. Comments are moderated to prevent spam, so there is a delay before they appear, but I will accept on-topic comments in due course.




Friday, 9 February 2024

The world of Poor Things at MDPI journals


At the weekend, the Observer ran a piece by Robin McKie entitled "‘The situation has become appalling’: fake scientific papers push research credibility to crisis point". I was one of those interviewed for the article, describing my concerns about a flood of dodgy papers that was polluting the scientific literature.

Two days later I received an email from the editorial office of MDPI publishers with the header "[Children] (IF: 2.4, ISSN 2227-9067): Good Paper Sharing on the Topic of" (sic) that began:

Greetings from the Children Editorial Office!

We recently collected 10 highly cited papers in our journal related to Childhood Autism. And we sincerely invite you to visit and read these papers, because you are an excellent expert in this field of study.

Who could resist such a flattering invitation? MDPI is one of those publishers that appears to be encouraging publication of low quality work, with a massive growth in special issues where papers are published with remarkably rapid turnaround times. Only last week it was revealed that the publisher is affected by fake peer review that appears to be generated by AI. So I was curious to take a look.

The first article, by Frolli et al (2022a) was weird. It reported a comparison of two types of intervention designed to improve emotion recognition in children with autism, one of which used virtual reality. The first red flag was the sample size: two groups each of 30 children, all originally from the city of Caserta. I checked Wikipedia, which told me the population of Caserta was around 76,000 in 2017. Recruiting participants for intervention studies is typically slow and laborious and this is a remarkable sample size to recruit from such a small region. But credibility is then stretched to breaking point on hearing that the selection criteria required that the children were all aged between 9 and 10 years and had IQs of 97 or above. No researcher in their right mind would impose unnecessary constraints on recruitment, and both the age and IQ criteria are far tighter than would usually be adopted. I wondered whether there might be a typo in this account, but we then hear that the IQ range of the sample is indeed remarkably narrow: 

"The first experimental group (Gr1) was composed of 30 individuals with a mean age of 9.3 (SD 0.63) and a mean IQ of 103.00 (SD 1.70). ...... The second experimental group (Gr2) was composed of 30 individuals with a mean age of 9.4 (SD 0.49) and mean IQ of 103.13 (SD 2.04)...."

Most samples for studies using Wechsler IQ scales have SD of at least 8, even if cutoffs are applied as selection criteria, so this is unbelievably low.

This dubious paper prompted me to look at others by the first author. It was rather like pulling a thread on a hole in a sweater - things started to unravel fast. A paper published by Frolli et al (2023a) in the MDPI journal Behavioral Sciences claimed to have studied eighty 18-year-olds recruited from four different high schools. The selection criteria were again unbelievably stringent: IQ assessed on the WAIS-IV fell between 95 and 105 "to ensure that participants fell within the average range of intellectual functioning, minimizing the impact of extreme cognitive variations on our analyses". The lower bound of this IQ range corresponds to a z-score of -0.33, the 37th percentile. If the population of students covered the full range of IQ, then only around 25% would meet the criterion (between the 37th and 63rd centiles), so to obtain a sample of 80 it would be necessary to test over 300 potential participants. Furthermore, there are IQ screening tests that can be used in this circumstance that are relatively quick to administer, but the WAIS-IV is not one of them. We are told all participants were given the full test, which requires individual administration by a qualified psychologist and takes around one hour to complete. So who did all this testing, and where? The article states: "The data were collected and analyzed at the FINDS Neuropsychiatry Outpatient Clinic by licensed psychologists in collaboration with the University of International Studies of Rome (UNINT)." So we are supposed to believe that hundreds of 18-year-olds trekked to a neuropsychiatry outpatient clinic for a full IQ screening which most of them would not have passed. I cannot imagine a less efficient way of conducting such a study. I could not find any mention of compensation for participants, which is perhaps unsurprising as the research received no external funding. All of this is described as happening remarkably fast, with ethics approval in January 2023, and submission of the article in October 2023.
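The recruitment arithmetic is easy to check. Here is a short calculation assuming the standard IQ distribution (mean 100, SD 15); the cutoffs come from the paper's stated criteria, and only the variable names are mine:

```python
from scipy.stats import norm

lo, hi, mean, sd = 95, 105, 100, 15
# Proportion of a normal IQ distribution falling inside the 95-105 window.
p_pass = norm.cdf((hi - mean) / sd) - norm.cdf((lo - mean) / sd)   # ~0.26
needed = 80 / p_pass                                               # ~306 candidates
print(f"pass rate {p_pass:.2f}; must screen about {needed:.0f} for a sample of 80")
```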

Another paper in Children in 2023 focused on ADHD, and again reported recruiting two groups of 30 children for an intervention that lasted 5 months (Frolli et al., 2023b). The narrow IQ selection criteria were again used, with WISC-IV IQs in the range 95-105, and the mean IQs were 96.48 (SD = 1.09) and 98.44 (SD = 1.12) for groups 1 and 2 respectively. Again, the research received no external funding. The report of ethics approval is scanty: "The study was conducted in accordance with the Declaration of Helsinki. The study was approved by the Ethics Committee and the Academic Senate of the University of International Studies of Rome."

The same first author published a paper on the impact of COVID-19 on cognitive development and executive functioning in adolescents in 2021 (Frolli et al, 2021). I have not gone over it in detail, but a quick scan revealed some very odd statistical reporting. There were numerous F-ratios, but they were all negative, which is impossible, as F is a ratio between two positive numbers. Furthermore, the reported p-values and degrees of freedom didn't always correspond to the F-ratio, even if the sign was ignored.

At this point I was running out of steam, but a quick look at Frolli et al (2022b) on Executive Functions and Foreign Language Learning suggested yet more problems, with the sentence "Significance at the level of 5% (α < 0.001) has been accepted" featuring at least twice. It is hard to believe that a human being wrote this sentence, or that any human author, editor or reviewer read it without comment.

If anyone is interested in pulling at other related threads, I suspect it would be of interest to look at articles accepted for a Special Issue of the MDPI journal Disabilities co-edited by Frolli.

In his brilliant film Poor Things, Yorgos Lanthimos distorts familiar objects and places just enough to be disturbing. Lisbon looks like what I imagine Lisbon would be in the Victorian age, except that the colours are unusually vivid, there are strange flying cars in the sky, and nobody seems concerned at the central character wandering around only partially clothed (see, e.g., this review).  The combined impression is that MDPI publishes papers from that universe, where everything looks superficially like genuine science but with jarring features that tell you something is amiss. The difference is that Poor Things has a happy ending.

References 

Frolli, A.; Ricci, M.C.; Di Carmine, F.; Lombardi, A.; Bosco, A.; Saviano, E.; Franzese, L. The Impact of COVID-19 on Cognitive Development and Executive Functioning in Adolescents: A First Exploratory Investigation. Brain Sci. 2021, 11, 1222. https://doi.org/10.3390/brainsci11091222

Frolli, A.; Savarese, G.; Di Carmine, F.; Bosco, A.; Saviano, E.; Rega, A.; Carotenuto, M.; Ricci, M.C. Children on the Autism Spectrum and the Use of Virtual Reality for Supporting Social Skills. Children 2022a, 9, 181. https://doi.org/10.3390/children9020181

Frolli, A.; Cerciello, F.; Esposito, C.; Ciotola, S.; De Candia, G.; Ricci, M.C.; Russo, M.G. Executive Functions and Foreign Language Learning. Pediatr. Rep. 2022b, 14, 450-456. https://doi.org/10.3390/pediatric14040053

Frolli, A.; Cerciello, F.; Ciotola, S.; Ricci, M.C.; Esposito, C.; Sica, L.S. Narrative Approach and Mentalization. Behav. Sci. 2023a, 13, 994. https://doi.org/10.3390/bs13120994

Frolli, A.; Cerciello, F.; Esposito, C.; Ricci, M.C.; Laccone, R.P.; Bisogni, F. Universal Design for Learning for Children with ADHD. Children 2023b, 10, 1350. https://doi.org/10.3390/children10081350

Sunday, 19 November 2023

Defence against the dark arts: a proposal for a new MSc course

 


Since I retired, an increasing amount of my time has been taken up with investigating scientific fraud. In recent months, I've become convinced of two things: first, fraud is a far more serious problem than most scientists recognise, and second, we cannot continue to leave the task of tackling it to volunteer sleuths. 

If you ask a typical scientist about fraud, they will usually tell you it is extremely rare, and that it would be a mistake to damage confidence in science because of the activities of a few unprincipled individuals. Asked to name fraudsters they may, depending on their age and discipline, mention Paolo Macchiarini, John Darsee, Elizabeth Holmes or Diederik Stapel, all high profile, successful individuals, who were brought down when unambiguous evidence of fraud was uncovered. Fraud has been around for years, as documented in an excellent book by Horace Judson (2004), and yet, we are reassured, science is self-correcting, and has prospered despite the activities of the occasional "bad apple". The problem with this argument is that, on the one hand, we only know about the fraudsters who get caught, and on the other hand, science is not prospering particularly well - numerous published papers produce results that fail to replicate and major discoveries are few and far between (Harris, 2017). We are swamped with scientific publications, but it is increasingly hard to distinguish the signal from the noise. In my view, it is getting to the point where in many fields it is impossible to build a cumulative science, because we lack a solid foundation of trustworthy findings. And it's getting worse and worse.

My gloomy prognosis is partly engendered by a consideration of a very different kind of fraud: the academic paper mill. In contrast to the lone fraudulent scientist who fakes data to achieve career advancement, the paper mill is an industrial-scale operation, where vast numbers of fraudulent papers are generated, and placed in peer-reviewed journals with authorship slots being sold to willing customers. This process is facilitated in some cases by publishers who encourage special issues, which are then taken over by "guest editors" who work for a paper mill. Some paper mill products are very hard to detect: they may be created from a convincing template with just a few details altered to make the article original. Others are incoherent nonsense, with spectacularly strange prose emerging when "tortured phrases" are inserted to evade plagiarism detectors.

You may wonder whether it matters if a proportion of the published literature is nonsense: surely any credible scientist will just ignore such material? Unfortunately, it's not so simple. First, it is likely that the paper mill products that are detected are just the tip of the iceberg - a clever fraudster will modify their methods to evade detection. Second, many fields of science attempt to synthesise findings using big data approaches, automatically combing the literature for studies with specific keywords and then creating databases, e.g. of genotypes and phenotypes. If these contain a large proportion of fictional findings, then attempts to use these databases to generate new knowledge will be frustrated. Similarly, in clinical areas, there is growing concern that systematic reviews that are supposed to synthesise evidence to get at the truth instead lead to confusion because a high proportion of studies are fraudulent. A third and more indirect negative consequence of the explosion in published fraud is that those who have committed fraud can rise to positions of influence and eminence on the back of their misdeeds. They may become editors, with the power to publish further fraudulent papers in return for money, and if promoted to professorships they will train a whole new generation of fraudsters, while being careful to sideline any honest young scientists who want to do things properly. I fear in some institutions this has already happened.

To date, the response of the scientific establishment has been wholly inadequate. There is little attempt to proactively check for fraud: science is still regarded as a gentlemanly pursuit where we should assume everyone has honourable intentions. Even when evidence of misconduct is strong, it can take months or years for a paper to be retracted. As whistleblower Raphaël Levy asked on his blog: Is it somebody else's problem to correct the scientific literature? There is dawning awareness that our methods for hiring and promotion might encourage misconduct, but getting institutions to change is a very slow business, not least because those in positions of power succeeded in the current system, and so think it must be optimal.

The task of unmasking fraud is largely left to hobbyists and volunteers, a self-styled army of "data sleuths", who are mostly motivated by anger at seeing science corrupted and the bad guys getting away with it. They have developed expertise in spotting certain kinds of fraud, such as image manipulation and improbable patterns in data, and they have also uncovered webs of bad actors who have infiltrated many corners of science. One might imagine that the scientific establishment would be grateful that someone is doing this work, but the usual response to a sleuth who finds evidence of malpractice is to ignore them, brush the evidence under the carpet, or accuse them of vexatious behaviour. Publishers and academic institutions are both at fault in this regard.

If I'm right, this relaxed attitude to the fraud epidemic is a disaster-in-waiting. There are a number of things that need to be done urgently. One is to change research culture so that rewards go to those whose work is characterised by openness and integrity, rather than those who get large grants and flashy publications. Another is for publishers to act far more promptly to investigate complaints of malpractice and issue retractions where appropriate. Both of these things are beginning to happen, slowly. But there is a third measure that I think should be taken as soon as possible, and that is to train a generation of researchers in fraud busting. We owe a huge debt of gratitude to the data sleuths, but the scale of the problem is such that we need the equivalent of a police force rather than a volunteer band. Here are some of the topics that an MSc course could cover:

  • How to spot dodgy datasets
  • How to spot manipulated figures
  • Textual characteristics of fraudulent articles
  • Checking scientific credentials
  • Checking publisher credentials/identifying predatory publishers
  • How to raise a complaint when fraud is suspected
  • How to protect yourself from legal attacks
  • Cognitive processes that lead individuals to commit fraud
  • Institutional practices that create perverse incentives
  • The other side of the coin: "Merchants of doubt" whose goal is to discredit science

I'm sure there's much more that could be added and would be glad of suggestions. 

Now, of course, the question is what you could do with such a qualification. If my predictions are right, then individuals with such expertise will increasingly be in demand in academic institutions and publishing houses, to help ensure the integrity of work they produce and publish. I also hope that there will be growing recognition of the need for more formal structures to be set up to investigate scientific fraud and take action when it is discovered: graduates of such a course would be exactly the kind of employees needed in such an organisation.

It might be argued that this is a hopeless endeavour. In Harry Potter and the Half-Blood Prince (Rowling, 2005) Professor Snape tells his pupils:

 "The Dark Arts, are many, varied, ever-changing, and eternal. Fighting them is like fighting a many-headed monster, which, each time a neck is severed, sprouts a head even fiercer and cleverer than before. You are fighting that which is unfixed, mutating, indestructible."

This is a pretty accurate description of what is involved in tackling scientific fraud. But Snape does not therefore conclude that action is pointless. On the contrary, he says: 

"Your defences must therefore be as flexible and inventive as the arts you seek to undo."

I would argue that any university that wants to be ahead of the field in this enterprise could show flexibility and inventiveness by starting up a postgraduate course to train the next generation of fraud-busting wizards.

Bibliography

Bishop, D. V. M. (2023). Red flags for papermills need to go beyond the level of individual articles: A case study of Hindawi special issues. https://osf.io/preprints/psyarxiv/6mbgv
Boughton, S. L., Wilkinson, J., & Bero, L. (2021). When beauty is but skin deep: Dealing with problematic studies in systematic reviews. Cochrane Database of Systematic Reviews, 5. https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.ED000152/full
Byrne, J. A., & Christopher, J. (2020). Digital magic, or the dark arts of the 21st century—How can journals and peer reviewers detect manuscripts and publications from paper mills? FEBS Letters, 594(4), 583–589. https://doi.org/10.1002/1873-3468.13747
Cabanac, G., Labbé, C., & Magazinov, A. (2021). Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals (arXiv:2107.06751). arXiv. https://doi.org/10.48550/arXiv.2107.06751
Carreyrou, J. (2019). Bad Blood: Secrets and Lies in a Silicon Valley Startup. Pan Macmillan.
COPE & STM. (2022). Paper mills: Research report from COPE & STM. Committee on Publication Ethics and STM. https://doi.org/10.24318/jtbG8IHL
Culliton, B. J. (1983). Coping with fraud: The Darsee case. Science, 220(4592), 31–35. https://doi.org/10.1126/science.6828878
Grey, S., & Bolland, M. (2022, August 18). Guest Post—Who Cares About Publication Integrity? The Scholarly Kitchen. https://scholarlykitchen.sspnet.org/2022/08/18/guest-post-who-cares-about-publication-integrity/
Hanson, M., Gómez Barreiro, P., Crosetto, P., & Brockington, D. (2023). The strain on scientific publishing (arXiv:2309.15884). arXiv. https://arxiv.org/ftp/arxiv/papers/2309/2309.15884.pdf
Harris, R. (2017). Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions (1st edition). Basic Books.
Judson, H. F. (2004). The Great Betrayal. Orlando, FL: Harcourt.
Lévy, R. (2022, December 15). Is it somebody else’s problem to correct the scientific literature? Rapha-z-Lab. https://raphazlab.wordpress.com/2022/12/15/is-it-somebody-elses-problem-to-correct-the-scientific-literature/
Moher, D., Bouter, L., Kleinert, S., Glasziou, P., Sham, M. H., Barbour, V., Coriat, A.-M., Foeger, N., & Dirnagl, U. (2020). The Hong Kong Principles for assessing researchers: Fostering research integrity. PLOS Biology, 18(7), e3000737. https://doi.org/10.1371/journal.pbio.3000737
Oreskes, N., & Conway, E. M. (2010). Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming. Bloomsbury Press.
Paterlini, M. (2023). Paolo Macchiarini: Disgraced surgeon is sentenced to 30 months in prison. BMJ, 381, p1442. https://doi.org/10.1136/bmj.p1442
Rowling, J. K. (2005). Harry Potter and the Half-Blood Prince. London: Bloomsbury. ISBN 9780747581086.
Smith, R. (2021, July 5). Time to assume that health research is fraudulent until proven otherwise? The BMJ. https://blogs.bmj.com/bmj/2021/07/05/time-to-assume-that-health-research-is-fraudulent-until-proved-otherwise/
Stapel, D. (2016). Faking Science: A True Story of Academic Fraud (N. J. Brown, Trans.). http://nick.brown.free.fr/stapel
Stroebe, W., Postmes, T., & Spears, R. (2012). Scientific misconduct and the myth of self-correction in science. Perspectives on Psychological Science, 7(6), 670–688. https://doi.org/10.1177/1745691612460687
 

Note: On-topic comments are welcome but are moderated to avoid spam, so there may be a delay before they appear.

Monday, 4 September 2023

Polyunsaturated fatty acids and children's cognition: p-hacking and the canonisation of false facts

One of my favourite articles is a piece by Nissen et al (2016) called "Publication bias and the canonization of false facts". In it, the authors model how false information can masquerade as overwhelming evidence, if, over cycles of experimentation, positive results are more likely to be published than null ones. But their article is not just about publication bias: they go on to show how p-hacking magnifies this effect, because it leads to a false positive rate that is much higher than the nominal rate (typically .05).
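As a rough illustration of that mechanism, here is a simplified simulation in the spirit of Nissen et al's model; all the numbers (power, publication probabilities, number of experiments) are my own inventions, not theirs. A false claim is tested repeatedly, positive results are always published while most null results are shelved, and readers update their belief from the published record while assuming the nominal false positive rate of .05.

```python
import numpy as np

rng = np.random.default_rng(1)
power, alpha_nominal = 0.8, 0.05      # what readers assume when updating
pub_pos, pub_neg = 1.0, 0.2           # publication bias: most nulls unpublished

def final_belief(alpha_real: float, n_experiments: int = 50) -> float:
    """Readers' final probability that a (actually false) claim is true."""
    log_odds = 0.0                                    # start undecided (50:50)
    for _ in range(n_experiments):
        positive = rng.random() < alpha_real          # claim is false, so every positive is a false positive
        if rng.random() >= (pub_pos if positive else pub_neg):
            continue                                  # unpublished, never seen by readers
        if positive:
            log_odds += np.log(power / alpha_nominal)
        else:
            log_odds += np.log((1 - power) / (1 - alpha_nominal))
    return 1.0 / (1.0 + np.exp(-log_odds))

print(f"belief with honest alpha = .05:   {final_belief(0.05):.3f}")
print(f"belief with p-hacked alpha = .30: {final_belief(0.30):.3f}")
```

With these particular numbers, the honest field still tends to drift towards disbelief despite the publication bias; once p-hacking inflates the real false positive rate, the same publication process drives belief in the false claim towards certainty.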

I was reminded of this when looking at some literature on polyunsaturated fatty acids and children's cognition. This was a topic I'd had a passing interest in years ago when fish oil was being promoted for children with dyslexia and ADHD. I reviewed the literature back in 2008 for a talk at the British Dyslexia Association (slides here). What was striking then was that, whilst there were studies claiming positive effects of dietary supplements, they all obtained different findings. It looked suspicious to me, as if authors would keep looking in their data, and divide it up every way possible, in order to find something positive to report – in other words, p-hacking seemed rife in this field.

My interest in this area was piqued more recently simply because I was looking at articles that had been flagged up because they contained "tortured phrases". These are verbal expressions that seem to have been selected to avoid plagiarism detectors: they are often unintentionally humorous, because attempts to generate synonyms misfire. For instance, in this article by Khalid et al, published in Taylor and Francis' International Journal of Food Properties we are told: 

"Parkinson’s infection is a typical neurodegenerative sickness. The mix of hereditary and natural variables might be significant in delivering unusual protein inside explicit neuronal gatherings, prompting cell brokenness and later demise" 

And, regarding autism: 

"Chemical imbalance range problem is a term used to portray various beginning stage social correspondence issues and tedious sensorimotor practices identified with a solid hereditary part and different reasons."

The paper was interesting, though, for another reason. It contained a table summarising results from ten randomized controlled trials of polyunsaturated fatty acid supplementation in pregnant women and young children. This was not a systematic review, and it was unclear how the studies had been selected. As I documented on PubPeer,  there were errors in the descriptions of some of the studies, and the interpretation was superficial. But as I checked over the studies, I was also struck by the fact that all studies concluded with a claim of a positive finding, even when the planned analyses gave null results. But, as with the studies I'd looked at in 2008, no two studies found the same thing. All the indicators were that this field is characterised by a mixture of p-hacking and hype, which creates the impression that the benefits of dietary supplementation are well-established, when a more dispassionate look at the evidence suggests considerable scepticism is warranted.

There were three questionable research practices that were prominent. First, testing a large number of 'primary research outcomes' without any correction for multiple comparisons. Three of the papers cited by Khalid did this, and they are marked in Table 1 below with "hmm" in the main analysis column. Two of them argued against using a method such as Bonferroni correction:

"Owing to the exploratory nature of this study, we did not wish to exclude any important relationships by using stringent correction factors for multiple analyses, and we recognised the potential for a type 1 error." (Dunstan et al, 2008)

"Although multiple comparisons are inevitable in studies of this nature, the statistical corrections that are often employed to address this (e.g. Bonferroni correction) infer that multiple relationships (even if consistent and significant) detract from each other, and deal with this by adjustments that abolish any findings without extremely significant levels (P values). However, it has been validly argued that where there are consistent, repeated, coherent and biologically plausible patterns, the results ‘reinforce’ rather than detract from each other (even if P values are significant but not very large)" (Meldrum et al, 2012)
While it is correct that Bonferroni correction is over-conservative when outcome measures are correlated, there are other methods for protecting the analysis from inflated type I error that should be applied in such cases (Bishop, 2023).
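
For readers who want a concrete starting point, here is a minimal sketch using the multipletests function from the statsmodels package with Holm's step-down method – one standard alternative that controls the family-wise error rate while being uniformly more powerful than Bonferroni. The p-values are invented for illustration:

```python
# Minimal sketch: family-wise error control over ten outcome measures.
# The p-values below are made up for illustration.
from statsmodels.stats.multitest import multipletests

p_values = [0.003, 0.04, 0.06, 0.11, 0.20, 0.31, 0.45, 0.52, 0.70, 0.88]
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='holm')

for p, p_adj, sig in zip(p_values, p_adjusted, reject):
    print(f"raw p = {p:.3f}   adjusted p = {p_adj:.3f}   significant: {sig}")

# Only the p = 0.003 result survives: 0.003 * 10 = 0.03 < 0.05,
# whereas the next-smallest, 0.04 * 9 = 0.36, does not.
```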

The second practice is conducting subgroup analyses: the initial analysis finds nothing, so a way is found to divide up the sample until a subgroup shows the effect. There is a nice paper by Peto that explains the dangers of doing this. The third practice is looking for correlations between variables rather than main effects of intervention: with sufficient variables, it is always possible to find something 'significant' if you don't employ any correction for multiple comparisons. This inflation of false positives by correlational analysis is a well-recognised problem in the field of neuroscience (e.g. Vul et al., 2008).

Given that such practices were normative in my own field of psychology for many years, I suspect that those who adopt them here are unaware of how serious a risk they run of finding spurious positive results. For instance, if you compare two groups on ten unrelated outcome measures, the probability that something will give you a 'significant' p-value below .05 is not 5% but 40%. (The probability that none of the 10 results is significant is .95^10, which is about .6, so the probability that at least one is below .05 is 1 − .6 = .4.) Dividing a sample into subgroups in the hope of finding something 'significant' is another way to multiply the rate of false positive findings.
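
The arithmetic is easy to check by simulation. Here is a minimal sketch in Python (not based on any of the studies discussed here), assuming two groups of 50, ten independent outcomes, and no true effect anywhere:

```python
# Simulate the family-wise false positive rate: two groups compared on
# ten unrelated outcome measures, with no true group difference anywhere.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_sims, n_per_group, n_outcomes = 10_000, 50, 10

false_positive_studies = 0
for _ in range(n_sims):
    a = rng.standard_normal((n_per_group, n_outcomes))
    b = rng.standard_normal((n_per_group, n_outcomes))
    p = ttest_ind(a, b, axis=0).pvalue     # one t-test per outcome
    if (p < 0.05).any():                   # any 'significant' result counts
        false_positive_studies += 1

print(f"At least one p < .05 in {false_positive_studies / n_sims:.1%} of studies")
# Prints roughly 40%, matching 1 - 0.95**10 = 0.40.
```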

In many fields, p-hacking is virtually impossible to detect because authors will selectively report their 'significant' findings, so the true false positive rate can't be estimated. In randomised controlled trials, the situation is a bit better, provided the study has been registered on a trial registry – this is now standard practice, precisely because it's recognised as an important way to avoid, or at least increase detection of, analytic flexibility and outcome switching. Accordingly, I catalogued, for the 10 studies reviewed by Khalid et al, how many found a significant effect of intervention on their planned, primary outcome measure, and how many focused on other results. The results are depressing. Flexible analyses are universal. Some authors emphasised the provisional nature of findings from exploratory analyses, but many did not. And my suspicion is that, even if the authors add a word of caution, those citing the work will ignore it.  


Table 1: Reporting outcomes for 10 studies cited by Khalid et al (2022)

Khalid #  Register  N     Main result*  Subgrp  Correlatn  Abs -ve  Abs +ve
41        yes       86    NS            yes     no         no       yes
42        no        72    hmm           no      no         no       yes
43        no        420   hmm           no      no         yes      yes
44        yes       90    NS            no      yes        yes      yes
45        no        90    yes           no      yes        NA       yes
46        yes       150   hmm           no      no         yes      yes
47        yes       175   NS            no      yes        yes      yes
48        no        107   NS            yes     no         yes      yes
49        yes       1094  NS            yes     no         yes      yes
50        no        27    yes           no      no         yes      yes

Key: Main result coded as NS (nonsignificant), yes (significant) or hmm (not significant if Bonferroni corrected); Subgrp and Correlatn coded yes or no depending on whether post hoc subgroup or correlational analyses conducted. Abs -ve coded yes if negative results reported in abstract, no if not, and NA if no negative results obtained. Abs +ve coded yes if positive results mentioned in abstract.

I don't know if the Khalid et al review will have any effect – it is so evidently flawed that I hope it will be retracted. But the problems it reveals are not just a feature of the odd rogue review: there is a systemic problem with this area of science, whereby the desire to find positive results, coupled with questionable research practices and publication bias, has led to the construction of a huge edifice of evidence on extremely shaky foundations. The resulting waste of researcher time and funding in pursuing phantom findings is a scandal that can only be addressed by researchers prioritising rigour, honesty and scholarship over fast and flashy science.

Sunday, 23 July 2023

Is Hindawi “well-positioned for revitalization”?


Guest post by Huanzi Zhang*

Screenshot from https://www.wiley.com/en-us/network/publishing/research-publishing/open-access/wiley-acquires-hindawi-qa-with-liz-ferguson

Over the past year, special issues of dozens of Hindawi journals have been exposed as systematically manipulated, resulting in the delisting of more than 20 Hindawi journals from major journal databases and the retraction of more than 2,700 papers by the publisher. This "unexpected event" at Hindawi also led to a slump in profits for the parent company, John Wiley & Sons. However, in a recent statement, the president, CEO & director of Wiley, Brian Napack, stated that Hindawi was now ready for revitalization and reinstatement of the special issue program. In my opinion, Wiley has not dealt adequately with the integrity issues that led to the problem, but appears focused on growth through the medium of special issues. This raises questions as to whether Hindawi's operation is sustainable in the long term.

Napack’s statement can be read here. He stated:

 “Fiscal '24 (starts on May 1, 2023) will be a year of revitalization for Hindawi with positive signs already emerging. We've now named a new leader of Hindawi, a talented Wiley veteran with deep expertise in the area. We've restarted the special issues program and we will be ramping it up throughout the year. We're working through the large article backlog, and we are executing our journal growth plans.”

One might, however, be forgiven for being a bit sceptical about this upbeat message, since just a year before, Napack said:

 “Hindawi performed at a very high level this year, delivering strong double-digit organic revenue growth and 36% article output growth on a pro forma basis, and has achieved this with exceptional margins. We have now completed the integration and we are benefiting significantly from Hindawi's industry leading open publishing practices and its highly efficient systems.” 

So what is the problem with Hindawi journals that suddenly got Wiley into trouble? Is Hindawi really well-positioned for revitalization? 

Wiley’s acquisition of Hindawi

Wiley announced the acquisition of Hindawi on January 5, 2021. The person who pushed Wiley to buy Hindawi was Judy Verses, Wiley's Executive Vice President, who left Wiley to join Elsevier a few months after the acquisition was completed. From an interview with Verses and Wiley's Senior Vice President Liz Ferguson, published on January 11, 2021, we learned that Wiley expected Hindawi journals to publish many articles by Chinese authors, expanding their market in China:

 “It has surpassed the US in recent years, and in an increasing number of disciplines is undoubtedly the global leader. Hindawi had the foresight to launch and develop journals that reflect strengths in the China research space. Similar to Wiley, Hindawi identified early on how important it was to serve the needs of China-based researchers. Bringing together our two teams now means we have an even stronger position to be able to work to the needs of those researchers.”

Publishing Perspectives' interview with Jay Flynn, Senior Vice President and Chief Product Officer of Wiley Research, published on January 5, 2021, and Wiley's internal interview with Ferguson, published on February 23, 2021, both mentioned the importance of China. The new niche market for Hindawi journals seemed to respond soundly to this strategy, and the number of submissions began to increase significantly (Figure 1), whereas no such increase had been seen in the preceding months. In September 2021, Jay Flynn was promoted to Executive Vice President of Wiley Research, replacing Judy Verses.

 

Figure 1. Number of submissions received per month for 24 journals that subsequently had papers retracted by Hindawi. Data from Hindawi Journal Report.

The paper mill problem

The first concerns were raised by research integrity communities and individuals, who from 2021 onwards posted thousands of comments on the PubPeer journal club relating to papers published in hundreds of special issues of Hindawi journals. Many comments were made by the anonymous sleuths Rhipidura albiventris, Hoya camphorifolia and Parashorea tomentella, whose contributions sometimes exceeded 100 per day. Problems were not limited to individual special issues or even individual journals. There appeared to be systematic manipulation of the publishing process, especially affecting special issues, indicating the activities of so-called "paper mills" – fraudulent organisations that sell authorship and/or citations of papers, often faked, for a fee. Leonid Schneider, who runs the blog For Better Science, assisted David Bimler and others in posting their findings from Hindawi journals. Nick Wise discussed "What is going on in Hindawi special issues?" on October 12, 2022. These sleuths noted that many supposedly 'peer-reviewed' manuscripts had incoherent or unintelligible content, and that corresponding authors used email addresses from other institutions. Furthermore, a large number of papers cited irrelevant references, presumably to boost citation counts, and some paper mills exploited Hindawi's article processing charge (APC) waiver policy to increase their profit. The pattern of abnormal citations confirmed that the fraud was not confined to particular journals, which made it difficult for publication-based investigations to expose specific paper mills.

It soon became clear that the special issues that were such a lucrative source of income for Hindawi were wide open to corrupt "guest editors" who, once appointed, could use fake peer review to flood the journal with fraudulent papers and irrelevant citations. In the gold open access model, Wiley earns APCs for every article published in Hindawi journals, whether in a special issue or not, so its incentive to exert quality control is compromised. Some examples are so extreme that nobody could take them seriously, such as an article on the ideological and political education of the Chinese Communist Party in a special issue "Exploration of Human Cognition using Artificial Intelligence in Healthcare", which was submitted, peer-reviewed and accepted on the same day. There are thousands of other papers which may be genuine but whose subject falls well outside the scope of the special issue where they are published, indicating that the journal is out of editorial control.
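
As an aside, extreme cases like the same-day acceptance above are trivially detectable from article metadata. Here is a hypothetical screening heuristic (a sketch, not Hindawi's actual process; the DOIs, field names and threshold are invented for illustration):

```python
# Hypothetical screening heuristic: flag articles whose editorial
# timeline is implausibly fast. DOIs, dates and the threshold below
# are invented for illustration.
from datetime import date

articles = [
    {"doi": "10.1155/example.1",
     "received": date(2022, 3, 1), "accepted": date(2022, 3, 1)},
    {"doi": "10.1155/example.2",
     "received": date(2022, 3, 1), "accepted": date(2022, 6, 20)},
]

MIN_REVIEW_DAYS = 14  # arbitrary cut-off for illustration

for article in articles:
    turnaround = (article["accepted"] - article["received"]).days
    if turnaround < MIN_REVIEW_DAYS:
        print(f"{article['doi']}: accepted {turnaround} days after receipt "
              "-- implausibly fast for genuine peer review")
```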

Many of the authors and guest editors of the problematic papers mentioned by Bimler came from Asia. Ruihang Huang, Chunjiong Zhang and Hanlie Cheng, PhD students at Donghua University, Tongji University and China University of Geosciences, Beijing, respectively, were beneficiaries of the citation manipulation and participants in the manipulation of special issues. Another example is Kaifa Zhao, who approved many nonsense manuscripts for publication as a guest editor at two journals: Computational Intelligence and Neuroscience and Journal of Environmental and Public Health. TigerBB8 identified Zhao as a PhD student at the Hong Kong Polytechnic University, and Dorothy Bishop requested an investigation by the university. As reported in Retraction Watch, the university's report claimed that Zhao's identity had been stolen by Yizhang Jiang, Zhao's master's program advisor:

"According to Mr Zhao, he was not aware of relevant emails from Hindawi and has never responded to emails that are related to the two special issues"

Hindawi takes action (slowly)

After a year of rapid growth in the Chinese market (Figure 2), Wiley acted. On September 28, 2022, Ferguson announced that 511 papers would be retracted from Hindawi journals. Interestingly, no mention was made of the comments on PubPeer or of Bimler's blog post; instead, it was stated that these retractions were based on the findings of the Hindawi Research Integrity Team. The first retractions appeared in mid-November, with concentrated releases during the Lunar New Year.

 

Figure 2. Number of articles and reviews published in 14 Hindawi journals 2019-2022. Data from Scopus.

It is possible that mass retractions were delayed because Wiley did not want to disrupt their agenda at the 74th Frankfurter Buchmesse (October 19 to 23, 2022). There was no indication that Wiley shared Hindawi's problems at the book fair. Instead, they were busy with other things. Flynn announced the creation of Wiley Partner Solutions to meet "scholarly publishing needs at scale" on October 17. Ferguson participated in a forum entitled "How the Article-Based Economy is Transforming Research Publishing" on October 19. Intriguingly, an essay posted by Chemistry World on April 24, 2023, citing Flynn, noted that Wiley had convened a meeting of publishers at that book fair, invited Clarivate, the owner of the major journal database Web of Science (WoS) Core Collection, and disclosed to them the problems with Hindawi journals. We do not know the outcome of this meeting, but change did occur: Hindawi began issuing retractions on November 16, 2022. Nevertheless, in October 2022, special issues of Hindawi journals were still being published with many compromised articles, though from December 2022 onwards the number of papers published in special issues decreased.

Delisting of Hindawi journals

This public information prompted journal databases to re-evaluate whether Hindawi journals should continue to be indexed. In February 2023, DOAJ (the Directory of Open Access Journals) delisted thirteen Hindawi journals. Then Scopus discontinued the indexing of six Hindawi journals. On March 20, 2023, Clarivate delisted nineteen Hindawi journals from the WoS Core Collection. The fact that Education Research International was delisted suggests that Clarivate conducted an independent investigation, as this journal had not been mentioned in the relevant sources.

As shown in Figure 3, the actions of the publisher and journal databases did not always involve the same journals. 


Figure 3. Twenty-six problematic Hindawi journals. WoS Core Collection: The journal was delisted from WoS Core Collection in March 2023. DOAJ: The journal was delisted from DOAJ in February 2023. Scopus: The journal was delisted from Scopus in the first half of 2023. Ferguson 500+: Papers in the journal were retracted and Liz Ferguson's statement was cited in the retraction statement. Flynn 2200+: Papers in the journal were retracted by Wiley using similar statements after May 2023. 

Clarivate did not publish the reasons for delisting each journal, nor did they delist more Hindawi journals before the release of the 2023 Journal Citation Reports on June 28, 2023. Compared with MDPI's reaction when its mega-journal, the International Journal of Environmental Research and Public Health, was delisted, Wiley's public response was subdued. In a mildly worded statement on March 22, 2023, on their WeChat Official Account, Hindawi said they were "disappointed" that their journals had been delisted by Clarivate, but did not offer any defence. In another post on April 5, 2023, Hindawi stated they would not appeal the delisting and suggested that authors submit their manuscripts to Wiley journals. A guest post by Flynn in the Scholarly Kitchen on April 4, 2023, stated:

 “At Wiley we take full responsibility for the quality of the content we publish across our portfolio.” 

He also announced a further 1,200 retractions to be issued by Hindawi journals. Flynn disclosed to Chemistry World how they selected which publications to retract: he deployed 200 of his editorial staff to conduct "a manual review of every single paper that we thought may have been compromised".

Impact on authors

Many authors published in Hindawi journals because these had the cachet of being listed in scholarly databases. One author of an article published in February 2023 in Oxidative Medicine and Cellular Longevity distributed email templates she had drafted to others via instant messaging software, encouraging them to ask Hindawi to work with Clarivate to index papers with publication dates before March 19, 2023. Anonymous sources described the chaos of Hindawi's customer service in late March 2023. Many people complained that Hindawi never responded to their emails. On the other hand, one author received a response from Oxidative Medicine and Cellular Longevity ([email protected]), even though his complaint was about an article in another journal. Some authors of accepted manuscripts complained that Hindawi delayed their requests to withdraw their submissions, and feared that the manuscripts would be accidentally published.

Other authors turned on the sleuths who had exposed paper mill activity on PubPeer, describing their activities as "social media-related PubPeer extortion". Jincheng Wang from the Nanjing Drum Tower Hospital, who had published in a compromised special issue, suggested that the intent of those who posted comments was to blackmail the authors, under the threat of translating publicly available comments into Chinese and then posting them on social media in China. He encouraged authors to report these comments to the moderators.

Impact of delisting on Hindawi’s business 

Wiley did not inform investors about the retractions in Hindawi journals in their 2nd quarter report (August 1 to October 31, 2022) published on December 7, 2022. In the 3rd quarter report (November 1, 2022, to January 31, 2023) on March 9, 2023, Napack confronted the issue head-on: 

“Upon discovery, the Wiley team responded quickly, suspending the Hindawi special issues program and fixing the source of the problem by purging the external bad actors and by implementing measures to prevent this from happening again. To date, these actions include increasing editorial controls and introducing new AI-based screening tools into the editorial process. We've also been scrubbing the archive and publicly retracting any compromised articles.”

 And, 

“We put the fixes in place. We feel very good about what we've done. We are reopening the programs. And we are moving forward to clear the backlog and drive forward with our publishing program.” 

However, the publication statistics showed that publication in special issues continued from November 2022 to January 2023. Despite the claim that the problem had been resolved, more than 200 articles published in special issues of 34 Hindawi journals in 2023 received comments on PubPeer relating to concerns about the publishing process. The release of retraction statements was also delayed. As of July 20, 2023, retraction statements had been issued on six dates – May 24, June 21, June 28, June 29, July 12, and July 19 – with 112, 559, 514, 1, 521, and 510 retractions, respectively. The total number of retracted papers, approximately 2,700 including the initial 500 from 2022, substantially exceeds the 1,200 mentioned by Flynn.
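
A quick check of the batch tallies quoted above, for anyone who wants to verify the total:

```python
# Sum of the six 2023 retraction batches, plus the ~500 from 2022.
batches_2023 = [112, 559, 514, 1, 521, 510]
print(sum(batches_2023))        # 2217
print(sum(batches_2023) + 500)  # 2717, i.e. "approximately 2,700"
```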

An interesting development has been the involvement of law firms who specialize in shareholder rights litigation, such as Rosen Law Firm, Kirby McInerney LLP, Schall Law Firm, Glancy Prongay & Murray LLP, the Law Offices of Frank R. Cruz and the Law Offices of Howard G. Smith. All of these firms recently advertised that they are investigating whether Wiley issued misleading information about Hindawi to the investing public. 

In the 4th quarter report for fiscal 2023, Napack stated that they had remedied all the problems and were ready for "revitalization":

“As discussed in Q3, we suspended the fast-growing special issues program after identifying a research integrity issue. This issue was the result of external misconduct by non-Wiley editors and reviewers. Essentially, Wiley decided to take a short-term hit to preserve the integrity of our journals and the value of our highly respected Wiley brand. This industry-wide issue has been widely reported on, and we believe that we now have it fully remediated in Wiley.” 

In the same report, Napack was still expressing satisfaction with Hindawi's performance since the acquisition: 

“Our expectation for Hindawi was a couple-fold. One, it would accelerate our position in that market, which it has; and that it would provide significant growth, which it has, and it will provide the ability to provide significant cascade across our portfolio that we could find homes for the many hundreds of thousands of articles that we get every year that are not published. Our expectations are the same going forward. We expect that over the next 12 to 18 months, we will be fully ramping back up. So, by '25, we're back on course with our volume growth and it should drop to the bottom line at/or we're close to the margin -- very healthy margin that it always has in our -- across all of our Open Access, but certainly across the Hindawi asset. So, the -- relative to our initial expectations, this acquisition has outperformed if you just can look aside for a minute against a very short-term thing that happened to us. But we're going to lead our way -- lead the industry out of it, and we feel very, very good about the future of our overall Open Access program.” 

So the paper mill debacle, which led to thousands of fraudulent papers being published in Hindawi journals over at least two years, is described as "a very short-term thing that happened to us", is declared "fully remediated", and is blamed on "external misconduct by non-Wiley editors and reviewers". This leaves hanging the question of how those non-Wiley editors and reviewers not only achieved powerful positions determining what was published in Hindawi journals, but also continued to do so long after attention had been drawn to the problem by sleuths.

Regaining the market? 

Wiley is a major international publisher, and they have the potential to achieve a Hindawi revitalization, if revitalization is defined as a significant rebound in the number of papers published in Hindawi journals. The question is, which niche market does Hindawi intend to regain? Are they well-positioned to do so? Do they have any appreciation of the tension between their goal of publishing as much as possible and the reputational costs of publishing papers that are low-quality at best and fraudulent at worst?

The sleuth Parashorea tomentella described how the niche occupied by special issues of Hindawi journals in China evolved after the acquisition. There is a highly competitive "first-tier" niche of authors from research universities and institutions, who have many manuscripts. China also has an extended "second-tier" niche of authors from community colleges, polytechnics and non-teaching hospitals. There are many, many of these potential customers, but they lack manuscripts on the one hand and desire publications for promotion on the other. Paper mills meet the need of authors in this niche to publish, and the need of publishers to make money. It would be difficult for Hindawi to regain the first-tier niche, because most authors and institutions care about the reputation of the publisher, and even those who aren't concerned about integrity are spooked by unprecedented large-scale retractions.

Hindawi is, however, in a strong position to regain the second-tier niche. First, despite all the problems, they still have many journals that are recognized by the Chinese authorities (typically those listed in databases such as WoS Core Collection and Engineering Village). Wiley has close links with those who maintain the databases and may have an advantage in keeping its journals off blacklists. Wiley's longstanding commitment and partnership with Chinese research institutes and government stakeholders has brought them particularly close to the National Science Library, Chinese Academy of Sciences (NSL/CAS), a bureaucracy that compiles a blacklist recognized by many Chinese institutions. On June 16, 2023, Wiley and NSL/CAS announced the establishment of a Joint Laboratory on Scientific and Technical Journal Innovation. The press release mentioned that an important topic for the joint lab is research integrity, and Liying Yang, director of journal evaluation at NSL/CAS, described what NSL/CAS could do, including updating their Early Warning Journal List, which aims to target paper mills.

For reasons unknown, NSL/CAS has been particularly kind to Hindawi journals. NSL/CAS released versions of its controversial Early Warning Journal List in December 2020, December 2021 and January 2023. In the latest version, the Hindawi journals Biomed Research International, Complexity, Advances in Civil Engineering, Shock and Vibration, Scientific Programming and Journal of Mathematics, which had previously been on the blacklist, were removed from it. Other problematic Hindawi journals were never on the blacklist. In contrast, the blacklist compiled by another Chinese bureaucracy, the Institute of Scientific and Technical Information of China (ISTIC), does not show undue goodwill toward Hindawi journals. In January and February 2023, some Chinese institutions, such as Zhejiang Gongshang University and Anhui Provincial Hospital (The First Affiliated Hospital of University of Science and Technology of China), told their employees not to submit to Hindawi journals. NSL/CAS was thus sending a different signal from other Chinese institutions, encouraging authors to continue submitting to Hindawi journals, although this effect was offset by Clarivate's delisting of nineteen Hindawi journals three months later.

Hindawi's determination to retain the second-tier niche may explain why they have continued to publish their customers' manuscripts from known paper mills. For instance, an article (now retracted) was published on March 11, 2023, long after the guest editor "Kaifa Zhao" had been shown to be an impostor. To take another example, the Hindawi Research Integrity Team retracted nine articles from a special issue of BioMed Research International on "Minimally Invasive Treatment Protocols in Clinical Dentistry" between November 22, 2022, and February 14, 2023, but subsequently published new, out-of-scope articles in the same special issue.

 

Figure 6. A special issue (https://www.hindawi.com/journals/bmri/si/652179/) of BioMed Research International that published out-of-scope articles in between retraction statements.
 

Even more alarmingly, special issues of four journals (Computational and Mathematical Methods in Medicine, Journal of Healthcare Engineering, Journal of Environmental and Public Health, and Computational Intelligence and Neuroscience) continued to publish questionable articles after Hindawi announced that the journals had closed on May 2, 2023 (see, e.g., https://pubpeer.com/publications/1F1263D5537A0EF96588929A60D15B). In one weird case, an article that had been accepted 604 days previously was published in Journal of Healthcare Engineering on July 7, 2023. Perhaps this manuscript had been held up by production processes or by the Hindawi Research Integrity Team, but it was eventually published after the journal was closed.

Reasons for pessimism 

The reason I am pessimistic is that, so far, Wiley's proposals to improve the publishing process for Hindawi journals have focused on the use of AI-based screening tools. Wiley has not committed to hiring more editors for Hindawi journals. As the number of submissions increases, the situation will only get worse if overworked in-house editors are expected to oversee peer-review processes run by guest editors.

Wiley still has a chance to fix things. The first thing they should do is stop publishing manuscripts that were handled by external bad actors. The second is to issue more retractions. I'm glad to see that in June 2023 Catriona MacCallum, Hindawi's Director of Open Science, shared their approach to scaling up retractions, including focusing on manipulation of the publishing process rather than on author wrongdoing. Publisher retractions are painful for the publisher, journals and authors, but they are necessary, and there are not enough of them. The third is to investigate the internal bad actors in an open and transparent manner. I would also encourage them to recruit more in-house editors, release the identities of the external bad actors they have found, and document the details of how internal controls failed, so that lessons can be learned.

Most of us value our reputation for its own sake. As Shakespeare said in Othello: 

“Good name in man and woman, dear my lord,
Is the immediate jewel of their souls:
Who steals my purse steals trash; ’tis something, nothing;
’Twas mine, ’tis his, and has been slave to thousands;
But he that filches from me my good name
Robs me of that which not enriches him,
And makes me poor indeed”

Indeed, as the Hindawi story shows, for a commercial organization, reputation is not just a desirable feel-good factor – it has huge financial implications. If an academic publisher like Wiley becomes known for boosting their profits by publishing screeds of arrant nonsense, their bottom line will ultimately suffer. Reputable researchers will not want their name associated with a publisher who behaves this way. If Wiley are not willing to control fraud because it is the right thing to do, they should at least recognize the importance of integrity for retaining the confidence of the academic institutions on whom they depend. 

 

Footnote 

* The author declares that there is no potential conflict of interest. The author uses a pseudonym because he/she lives in an authoritarian state and fears unpredictable political reprisals.