Clinical Researchers Beware – ChatGPT is not a Reliable Aid

Clinicians are all too familiar with the ‘Google patient’ who finds every scary, worst-case or outright false diagnosis online for whatever is ailing them. During COVID, misinformation spread like wildfire, eroding the public’s trust in vaccines and the healthcare profession. Now, AI models like ChatGPT may be whispering misleading information to the clinical researchers trying to produce real research.

Researchers from CHU Sainte-Justine and the Montreal Children’s Hospital recently posed 20 medical questions to ChatGPT. The chatbot provided answers of limited quality, including factual errors and fabricated references, according to the results of the study published in Mayo Clinic Proceedings: Digital Health.

“These results are alarming, given that trust is a pillar of scientific communication. ChatGPT users should pay particular attention to the references provided before integrating them into medical manuscripts,” says Dr Jocelyn Gravel, lead author of the study and emergency physician at CHU Sainte-Justine.

Questionable quality, fabricated references

The researchers drew their questions from existing studies and asked ChatGPT to support its answers with references. They then asked the authors of the articles from which the questions were taken to rate the software’s answers on a scale from 0 to 100%.

Out of 20 authors, 17 agreed to review ChatGPT’s answers. They judged them to be of questionable quality (median score of 60%) and found five major and seven minor factual errors. For example, the software suggested administering an anti-inflammatory drug by injection when it should be taken orally. ChatGPT also overestimated the global burden of mortality associated with Shigella infections by a factor of ten.

Of the references provided, 69% were fabricated, yet looked real. Most of the false citations (95%) used the names of authors who had already published articles on a related subject, or came from recognised organisations such as the Food and Drug Administration. The references all bore a title related to the subject of the question and used the names of known journals or websites. Even some of the real references contained errors (eight out of 18).

ChatGPT explains

When asked about the accuracy of the references provided, ChatGPT gave varying answers. In one case, it claimed, “References are available in PubMed,” and provided a web link. This link referred to other publications unrelated to the question. At another point, the software replied, “I strive to provide the most accurate and up-to-date information available to me, but errors or inaccuracies can occur.”

Even in its most ‘truthful’ responses, the researchers say, ChatGPT poses hidden risks to academic work.

“The importance of proper referencing in science is undeniable. The quality and breadth of the references provided in authentic studies demonstrate that the researchers have performed a complete literature review and are knowledgeable about the topic. This process enables the integration of findings in the context of previous work, a fundamental aspect of medical research advancement. Failing to provide references is one thing but creating fake references would be considered fraudulent for researchers,” says Dr Esli Osmanlliu, emergency physician at the Montreal Children’s Hospital and scientist with the Child Health and Human Development Program at the Research Institute of the McGill University Health Centre.

“Researchers using ChatGPT may be misled by false information because clear, seemingly coherent and stylistically appealing references can conceal poor content quality,” adds Dr Osmanlliu.

This is the first study to assess the quality and accuracy of references provided by ChatGPT, the researchers point out.

Source: McGill University Health Centre

Dr Robot Will See You Now: Medical Chatbots Need to be Regulated

The large language models (LLMs) used in chatbots may appear to offer reliable, persuasive advice in a format that mimics conversation, but in fact they can offer potentially harmful information when prompted with medical questions. Therefore, any LLM-chatbot in a medical setting would require approval as a medical device, argue experts in a paper published in Nature Medicine.

The mistake often made with LLM-chatbots is to treat them as a true “artificial intelligence” when in fact they are more closely related to the predictive text on a smartphone. They are largely trained on conversations and text scraped from the internet, and use algorithms to associate words and sentences in a manner that appears meaningful.
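To make the predictive-text comparison concrete, here is a minimal, purely illustrative sketch in Python of next-word prediction from word co-occurrence counts. It is not how ChatGPT is actually built (real LLMs use neural networks trained on vast corpora), but it captures the same basic idea of generating text from statistical word associations:

```python
# Toy "predictive text" model: count which word follows which,
# then generate text by always picking the most frequent follower.
from collections import Counter, defaultdict

# Tiny made-up corpus for illustration only.
corpus = (
    "the patient reports chest pain . "
    "the patient reports mild fever . "
    "the doctor reports no fever ."
).split()

# Count how often each word follows each other word.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def generate(start: str, length: int = 5) -> str:
    """Greedily append the most frequent next word at each step."""
    words = [start]
    for _ in range(length):
        followers = next_word_counts.get(words[-1])
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))  # e.g. "the patient reports chest pain ."
```

The output reads fluently because the word associations are statistically plausible, not because the model has checked anything against a medical ground truth, which is precisely the gap the authors highlight.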

“Large Language Models are neural network language models with remarkable conversational skills. They generate human-like responses and engage in interactive conversations. However, they often generate highly convincing statements that are verifiably wrong or provide inappropriate responses. Today there is no way to be certain about the quality, evidence level, or consistency of clinical information or supporting evidence for any response. These chatbots are unsafe tools when it comes to medical advice and it is necessary to develop new frameworks that ensure patient safety,” said Prof Stephen Gilbert at TU Dresden.

Challenges in the regulatory approval of LLMs

Most people research their symptoms online before seeking medical advice, and search engines play a role in the decision-making process. The forthcoming integration of LLM-chatbots into search engines may increase users’ confidence in the answers given by a chatbot that mimics conversation. Yet it has been demonstrated that LLMs can provide profoundly dangerous information when prompted with medical questions.

LLMs have no underlying medical “ground truth,” which is inherently dangerous. Chat-interfaced LLMs have already provided harmful medical responses and have already been used unethically in ‘experiments’ on patients without consent. Almost every medical LLM use case requires regulatory control in the EU and US. In the US, their lack of explainability disqualifies them from being ‘non-devices’. LLMs with explainability, low bias, predictability, correctness, and verifiable outputs do not currently exist, and they are not exempted from current (or future) governance approaches.

The authors describe in their paper the limited scenarios in which LLMs could find application under current frameworks. They also describe how developers can seek to create LLM-based tools that could be approved as medical devices, and they explore the development of new frameworks that preserve patient safety. “Current LLM-chatbots do not meet key principles for AI in healthcare, like bias control, explainability, systems of oversight, validation and transparency. To earn their place in the medical armamentarium, chatbots must be designed for better accuracy, with safety and clinical efficacy demonstrated and approved by regulators,” concludes Prof Gilbert.

Source: Technische Universität Dresden

ChatGPT can Now (Almost) Pass the US Medical Licensing Exam

ChatGPT can score at or around the approximately 60% pass mark for the United States Medical Licensing Exam (USMLE), with responses that make coherent, internal sense and contain frequent insights, according to a study published in PLOS Digital Health by Tiffany Kung, Victor Tseng, and colleagues at AnsibleHealth.

ChatGPT is a new artificial intelligence (AI) system, known as a large language model (LLM), designed to generate human-like writing by predicting upcoming word sequences. Unlike most chatbots, ChatGPT cannot search the internet. Instead, it generates text using word relationships predicted by its internal processes.

Kung and colleagues tested ChatGPT’s performance on the USMLE, a highly standardised and regulated series of three exams (Steps 1, 2CK, and 3) required for medical licensure in the United States. Taken by medical students and physicians-in-training, the USMLE assesses knowledge spanning most medical disciplines, ranging from biochemistry, to diagnostic reasoning, to bioethics.

After screening to remove image-based questions, the authors tested the software on 350 of the 376 public questions available from the June 2022 USMLE release.

After indeterminate responses were removed, ChatGPT scored between 52.4% and 75.0% across the three USMLE exams. The passing threshold each year is approximately 60%. ChatGPT also demonstrated 94.6% concordance across all its responses and produced at least one significant insight (something that was new, non-obvious, and clinically valid) for 88.9% of its responses. Notably, ChatGPT exceeded the performance of PubMedGPT, a counterpart model trained exclusively on biomedical domain literature, which scored 50.8% on an older dataset of USMLE-style questions.

While the relatively small input size restricted the depth and range of analyses, the authors note their findings provide a glimpse of ChatGPT’s potential to enhance medical education, and eventually, clinical practice. For example, they add, clinicians at AnsibleHealth already use ChatGPT to rewrite jargon-heavy reports for easier patient comprehension.

“Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation,” say the authors.

Author Dr Tiffany Kung added that ChatGPT’s role in this research went beyond being the study subject: “ChatGPT contributed substantially to the writing of [our] manuscript… We interacted with ChatGPT much like a colleague, asking it to synthesise, simplify, and offer counterpoints to drafts in progress…All of the co-authors valued ChatGPT’s input.”

Source: EurekAlert!