Clinical Researchers Beware – ChatGPT is not a Reliable Aid


Clinicians are all too familiar with the ‘Google patient’ who turns up every scary, worst-case or outright false diagnosis online for whatever is ailing them. During COVID, misinformation spread like wildfire, eroding the public’s trust in vaccines and the healthcare profession. Now, AI models like ChatGPT may be feeding misleading information to the clinical researchers trying to produce real research.

Researchers from CHU Sainte-Justine and the Montreal Children’s Hospital recently posed 20 medical questions to ChatGPT. The chatbot provided answers of limited quality, including factual errors and fabricated references, according to the results of the study, published in Mayo Clinic Proceedings: Digital Health.

“These results are alarming, given that trust is a pillar of scientific communication. ChatGPT users should pay particular attention to the references provided before integrating them into medical manuscripts,” says Dr Jocelyn Gravel, lead author of the study and emergency physician at CHU Sainte-Justine.

Questionable quality, fabricated references

The researchers drew their questions from existing studies and asked ChatGPT to support its answers with references. They then asked the authors of the articles from which the questions were taken to rate the software’s answers on a scale from 0 to 100%.

Out of 20 authors, 17 agreed to review ChatGPT’s answers. They judged them to be of questionable quality (median score of 60%) and found five major and seven minor factual errors. For example, the software suggested administering an anti-inflammatory drug by injection, when it should be swallowed. ChatGPT also overestimated the global burden of mortality associated with Shigella infections by a factor of ten.

Of the references provided, 69% were fabricated, yet looked real. Most of the false citations (95%) used the names of authors who had already published articles on a related subject, or came from recognised organisations such as the Food and Drug Administration. The references all bore a title related to the subject of the question and used the names of known journals or websites. Even some of the real references contained errors (eight out of 18).

ChatGPT explains

When asked about the accuracy of the references provided, ChatGPT gave varying answers. In one case, it claimed, “References are available in Pubmed,” and provided a web link. This link referred to other publications unrelated to the question. At another point, the software replied, “I strive to provide the most accurate and up-to-date information available to me, but errors or inaccuracies can occur.”

Even at its most ‘truthful’ in these responses, ChatGPT poses hidden risks to academics, the researchers say.

“The importance of proper referencing in science is undeniable. The quality and breadth of the references provided in authentic studies demonstrate that the researchers have performed a complete literature review and are knowledgeable about the topic. This process enables the integration of findings in the context of previous work, a fundamental aspect of medical research advancement. Failing to provide references is one thing but creating fake references would be considered fraudulent for researchers,” says Dr Esli Osmanlliu, emergency physician at the Montreal Children’s Hospital and scientist with the Child Health and Human Development Program at the Research Institute of the McGill University Health Centre.

“Researchers using ChatGPT may be misled by false information because clear, seemingly coherent and stylistically appealing references can conceal poor content quality,” adds Dr Osmanlliu.

This is the first study to assess the quality and accuracy of references provided by ChatGPT, the researchers point out.

Source: McGill University Health Centre