Dr Robot Will See You Now: Medical Chatbots Need to be Regulated


The Large Language Models (LLMs) used in chatbots may appear to offer reliable, persuasive advice in a format that mimics conversation, but in fact they can provide potentially harmful information when prompted with medical questions. Therefore, any LLM-chatbot used in a medical setting would require approval as a medical device, argue experts in a paper published in Nature Medicine.

A common mistake is to treat LLM-chatbots as a true “artificial intelligence” when in fact they are more closely related to the predictive text on a smartphone. They are mostly trained on conversations and text scraped from the internet, and use algorithms to associate words and sentences in a manner that appears meaningful.
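To make the predictive-text analogy concrete, here is a deliberately simplistic sketch (a toy word-frequency model with made-up sample text, not how any real chatbot is built). It continues a prompt by always choosing the word that most often followed the previous one in its training text. Real LLMs replace the counting with large neural networks trained on vastly more data, but they share the limitation illustrated here: fluent continuations come from statistical patterns in text, not from any notion of medical truth.

```python
# Toy "predictive text": count which word tends to follow which in a small
# sample, then extend a prompt by always picking the most frequent next word.
from collections import Counter, defaultdict

sample_text = (
    "the patient reported chest pain and the doctor ordered a test "
    "the patient reported chest pain and the doctor prescribed rest"
)

# Count next-word frequencies for every word in the sample text.
next_words = defaultdict(Counter)
tokens = sample_text.split()
for current, following in zip(tokens, tokens[1:]):
    next_words[current][following] += 1

def continue_prompt(prompt: str, length: int = 6) -> str:
    """Extend a prompt by repeatedly choosing the most likely next word."""
    words = prompt.split()
    for _ in range(length):
        candidates = next_words.get(words[-1])
        if not candidates:
            break  # no pattern learned for this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# The continuation reads fluently, but it only reflects word statistics,
# not whether the resulting statement is medically correct.
print(continue_prompt("the patient"))
```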

“Large Language Models are neural network language models with remarkable conversational skills. They generate human-like responses and engage in interactive conversations. However, they often generate highly convincing statements that are verifiably wrong or provide inappropriate responses. Today there is no way to be certain about the quality, evidence level, or consistency of clinical information or supporting evidence for any response. These chatbots are unsafe tools when it comes to medical advice and it is necessary to develop new frameworks that ensure patient safety,” said Prof Stephen Gilbert at TU Dresden.

Challenges in the regulatory approval of LLMs

Most people research their symptoms online before seeking medical advice, and search engines play a role in the decision-making process. The forthcoming integration of LLM-chatbots into search engines may increase users’ confidence in the answers given by a chatbot that mimics conversation. Yet it has been demonstrated that LLMs can provide profoundly dangerous information when prompted with medical questions.

LLMs have no medical “ground truth” built into them, which is inherently dangerous. Chat-interfaced LLMs have already given harmful medical responses and have already been used unethically in ‘experiments’ on patients without consent. Almost every medical use case for LLMs requires regulatory control in the EU and US; in the US, their lack of explainability disqualifies them from being ‘non-devices’. LLMs that offer explainability, low bias, predictability, correctness, and verifiable outputs do not currently exist, and they are not exempted from current (or future) governance approaches.

The authors describe in their paper the limited scenarios in which LLMs could find application under current frameworks. They also describe how developers can seek to create LLM-based tools that could be approved as medical devices, and they explore the development of new frameworks that preserve patient safety. “Current LLM-chatbots do not meet key principles for AI in healthcare, like bias control, explainability, systems of oversight, validation and transparency. To earn their place in the medical armamentarium, chatbots must be designed for better accuracy, with safety and clinical efficacy demonstrated and approved by regulators,” concludes Prof Gilbert.

Source: Technische Universität Dresden