
Is AI a Better Diagnostic Resource for Doctors than Traditional Ones?

With hospitals already deploying artificial intelligence (AI) to improve patient care, a new study has found that using ChatGPT Plus does not significantly improve the accuracy of doctors’ diagnoses when compared with the use of usual resources.

The study, from UVA Health’s Andrew S. Parsons, MD, MPH and colleagues, enlisted 50 physicians in family medicine, internal medicine and emergency medicine to put ChatGPT Plus to the test. Half were randomly assigned to use ChatGPT Plus to diagnose complex cases, while the other half relied on conventional resources such as medical reference sites (for example, UpToDate) and Google. The researchers then compared the resulting diagnoses, finding that accuracy across the two groups was similar.

That said, ChatGPT alone outperformed both groups, suggesting that it still holds promise for improving patient care. Physicians, however, will need more training and experience with the emerging technology to capitalise on its potential, the researchers conclude.

For now, ChatGPT remains best used to augment, rather than replace, human physicians, the researchers say.

“Our study shows that AI alone can be an effective and powerful tool for diagnosis,” said Parsons, who oversees the teaching of clinical skills to medical students at the University of Virginia School of Medicine and co-leads the Clinical Reasoning Research Collaborative. “We were surprised to find that adding a human physician to the mix actually reduced diagnostic accuracy though improved efficiency. These results likely mean that we need formal training in how best to use AI.”

ChatGPT for Disease Diagnosis

Chatbots powered by “large language models” that produce human-like responses are growing in popularity, and they have shown impressive ability to take patient histories, communicate empathetically and even solve complex medical cases. But, for now, they still require the involvement of a human doctor.

Parsons and his colleagues were eager to determine how the high-tech tool can be used most effectively, so they launched a randomised, controlled trial at three leading-edge hospitals – UVA Health, Stanford and Harvard’s Beth Israel Deaconess Medical Center.

The participating physicians made diagnoses for “clinical vignettes” based on real-life patient-care cases. These case studies included details about patients’ histories, physical exams and lab test results. The researchers then scored the results and examined how quickly the two groups made their diagnoses.

The median diagnostic accuracy for the physicians using ChatGPT Plus was 76.3%, while that for the physicians using conventional approaches was 73.7%. The ChatGPT group reached their diagnoses slightly more quickly overall – 519 seconds compared with 565 seconds.

The researchers were surprised at how well ChatGPT Plus alone performed, with a median diagnostic accuracy of more than 92%. They say this may reflect the prompts used in the study, suggesting that physicians likely will benefit from training on how to use prompts effectively. Alternatively, they say, healthcare organisations could purchase predefined prompts to implement in clinical workflow and documentation.
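
To make the idea of predefined prompts concrete, here is a minimal sketch of how a reusable diagnostic prompt might be wrapped around a chat-model API call. The prompt wording, model name and client setup are illustrative assumptions, not the prompts evaluated in the study.

```python
# Illustrative sketch only: the prompt text and model name below are
# assumptions, not the prompts used in the UVA study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DIAGNOSTIC_PROMPT = (
    "You are assisting a physician with a differential diagnosis. "
    "Given the clinical vignette below, list the three most likely diagnoses "
    "with brief supporting reasoning, then name the single most likely one.\n\n"
    "Vignette:\n{vignette}"
)

def suggest_differential(vignette: str) -> str:
    """Run a predefined diagnostic prompt against a chat model."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; the study accessed GPT-4 via ChatGPT Plus
        messages=[{"role": "user", "content": DIAGNOSTIC_PROMPT.format(vignette=vignette)}],
    )
    return response.choices[0].message.content
```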

The researchers also caution that ChatGPT Plus likely would fare less well in real life, where many other aspects of clinical reasoning come into play – especially in determining the downstream effects of diagnoses and treatment decisions. They’re urging additional studies to assess large language models’ abilities in those areas and are conducting a similar study on management decision-making.

“As AI becomes more embedded in healthcare, it’s essential to understand how we can leverage these tools to improve patient care and the physician experience,” Parsons said. “This study suggests there is much work to be done in terms of optimising our partnership with AI in the clinical environment.”

Following up on this groundbreaking work, the study sites have also launched a bicoastal AI evaluation network called ARiSE (AI Research and Science Evaluation) to further evaluate GenAI outputs in healthcare. Find out more at the ARiSE website.

Source: University of Virginia Health System

Researchers Find Persistent Problems with AI-assisted Genomic Studies


In a paper published in Nature Genetics, researchers are warning that artificial intelligence tools gaining popularity in the fields of genetics and medicine can lead to flawed conclusions about the connection between genes and physical characteristics, including risk factors for diseases like diabetes.

The faulty predictions are linked to researchers’ use of AI to assist genome-wide association studies, according to the University of Wisconsin–Madison researchers. Such studies scan through hundreds of thousands of genetic variations across many people to hunt for links between genes and physical traits. Of particular interest are possible connections between genetic variations and certain diseases.

Genetics’ link to disease not always straightforward

Genetics plays a role in the development of many health conditions. While changes in some individual genes are directly connected to an increased risk for diseases like cystic fibrosis, the relationship between genetics and physical traits is often more complicated.

Genome-wide association studies have helped to untangle some of these complexities, often using large databases of individuals’ genetic profiles and health characteristics, such as the National Institutes of Health’s All of Us project and the UK Biobank. However, these databases are often missing data about health conditions that researchers are trying to study.

“Some characteristics are either very expensive or labour-intensive to measure, so you simply don’t have enough samples to make meaningful statistical conclusions about their association with genetics,” says Qiongshi Lu, an associate professor in the UW–Madison Department of Biostatistics and Medical Informatics and an expert on genome-wide association studies.

The risks of bridging data gaps with AI

Researchers are increasingly attempting to work around this problem by bridging data gaps with ever more sophisticated AI tools.

“It has become very popular in recent years to leverage advances in machine learning, so we now have these advanced machine-learning AI models that researchers use to predict complex traits and disease risks with even limited data,” Lu says.

Now, Lu and his colleagues have demonstrated the peril of relying on these models without also guarding against biases they may introduce. In their paper, they show that a common type of machine learning algorithm employed in genome-wide association studies can mistakenly link several genetic variations with an individual’s risk for developing Type 2 diabetes.

“The problem is if you trust the machine learning-predicted diabetes risk as the actual risk, you would think all those genetic variations are correlated with actual diabetes even though they aren’t,” says Lu.

These “false positives” are not limited to these specific variations and diabetes risk, Lu adds, but are a pervasive bias in AI-assisted studies.
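
The mechanism behind these false positives can be reproduced in a toy simulation. The sketch below illustrates the general statistical pitfall, not the authors’ analysis: a variant shows no association with the measured phenotype, yet appears strongly associated with an ML-style predicted phenotype that leaks a genetically influenced covariate.

```python
# Toy simulation (not the authors' analysis) of how testing genotypes against a
# machine-learning-predicted phenotype can manufacture false positives when the
# prediction leaks a genetically influenced covariate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000

g = rng.binomial(2, 0.3, size=n).astype(float)   # genotype at one variant (0/1/2)
covariate = 0.5 * g + rng.normal(size=n)         # non-disease trait influenced by g
y_true = rng.binomial(1, 0.1, size=n)            # measured disease status, independent of g

# An imperfect ML "risk score" that partially reflects the covariate
y_pred = 0.1 + 0.05 * covariate + 0.02 * rng.normal(size=n)

_, p_measured = stats.pearsonr(g, y_true)    # association with the real phenotype
_, p_predicted = stats.pearsonr(g, y_pred)   # association with the predicted phenotype

print(f"p-value vs measured phenotype:  {p_measured:.3g}")   # typically non-significant
print(f"p-value vs predicted phenotype: {p_predicted:.3g}")  # spuriously tiny
```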

New statistical method can reduce false positives

In addition to identifying the problem of overreliance on AI tools, Lu and his colleagues propose a statistical method that researchers can use to help ensure the reliability of their AI-assisted genome-wide association studies. The method helps remove bias that machine learning algorithms can introduce when they’re making inferences based on incomplete information.

“This new strategy is statistically optimal,” Lu says, noting that the team used it to better pinpoint genetic associations with individuals’ bone mineral density.

AI not the only problem with some genome-wide association studies

While the group’s proposed statistical method could help improve the accuracy of AI-assisted studies, Lu and his colleagues also recently identified problems with similar studies that fill data gaps with proxy information rather than algorithms.

In another recently published paper appearing in Nature Genetics, the researchers sound the alarm about studies that over-rely on proxy information in an attempt to establish connections between genetics and certain diseases.

For instance, large health databases like the UK Biobank hold a wealth of genetic information about large populations, but they don’t have very much data on the incidence of diseases that tend to crop up later in life, like most neurodegenerative diseases.

For Alzheimer’s disease specifically, some researchers have attempted to bridge that gap with proxy data gathered through family health history surveys, where individuals can report a parent’s Alzheimer’s diagnosis.

The UW–Madison team found that such proxy-information studies can produce “highly misleading genetic correlation” between Alzheimer’s risk and higher cognitive abilities.

“These days, genomic scientists routinely work with biobank datasets that have hundreds of thousands of individuals; however, as statistical power goes up, biases and the probability of errors are also amplified in these massive datasets,” says Lu. “Our group’s recent studies provide humbling examples and highlight the importance of statistical rigor in biobank-scale research studies.”

Source: University of Wisconsin-Madison

AI Eye to Eye with Ophthalmologists in Diagnosing Corneal Infections


A Birmingham-led study has found that AI-powered models match ophthalmologists in diagnosing infectious keratitis, offering promise for global eye care improvements.

Infectious keratitis (IK) is a leading cause of corneal blindness worldwide. The new study finds that deep learning models showed levels of diagnostic accuracy similar to those of ophthalmologists in identifying the infection.

In a meta-analysis published in eClinicalMedicine, Dr Darren Ting from the University of Birmingham and a global team of researchers reviewed 35 studies that utilised Deep Learning (DL) models to diagnose infectious keratitis.

AI models in the study matched the diagnostic accuracy of ophthalmologists, exhibiting a sensitivity of 89.2% and specificity of 93.2%, compared to ophthalmologists’ 82.2% sensitivity and 89.6% specificity.
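
For readers unfamiliar with these metrics, the short sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are invented purely so the outputs match the pooled deep learning estimates above; they are not data from the study.

```python
# Sensitivity/specificity from confusion-matrix counts. The counts below are
# made up solely to reproduce the pooled DL estimates reported in the text.

def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: proportion of infected corneas correctly flagged."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: proportion of non-infected corneas correctly cleared."""
    return tn / (tn + fp)

# Hypothetical test set: 500 images with infectious keratitis, 500 without
print(f"sensitivity = {sensitivity(446, 54):.1%}")   # 89.2%
print(f"specificity = {specificity(466, 34):.1%}")   # 93.2%
```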

The models in the study had analysed a combined total of more than 136 000 corneal images, and the authors say that the results further demonstrate the potential use of artificial intelligence in clinical settings.

Dr Darren Ting, senior author of the study, Birmingham Health Partners (BHP) Fellow and Consultant Ophthalmologist at the University of Birmingham, said: “Our study shows that AI has the potential to provide fast, reliable diagnoses, which could revolutionise how we manage corneal infections globally. This is particularly promising for regions where access to specialist eye care is limited, and can help to reduce the burden of preventable blindness worldwide.”

The AI models also proved effective at differentiating between healthy eyes, infected corneas, and the various underlying causes of IK, such as bacterial or fungal infections.

While these results highlight the potential of DL in healthcare, the study’s authors emphasised the need for more diverse data and further external validation to increase the reliability of these models for clinical use.

Infectious keratitis, an inflammation of the cornea, affects millions, particularly in low- and middle-income countries where access to specialist eye care is limited. As AI technology continues to grow and play a pivotal role in medicine, it may soon become a key tool in preventing corneal blindness globally.

Source: University of Birmingham

AI Tools Can’t Revolutionise Public Health if They Stick to Old Patterns

As tools powered by artificial intelligence increasingly make their way into health care, the latest research from UC Santa Cruz Politics Department doctoral candidate Lucia Vitale takes stock of the current landscape of promises and anxieties. 

Proponents of AI envision the technology helping to manage health care supply chains, monitor disease outbreaks, make diagnoses, interpret medical images, and even reduce equity gaps in access to care by compensating for healthcare worker shortages. But others are sounding the alarm about issues like privacy rights, racial and gender biases in models, lack of transparency in AI decision-making processes that could lead to patient care mistakes, and even the potential for insurance companies to use AI to discriminate against people with poor health. 

Which impacts these tools ultimately have will depend on how they are developed and deployed. In a paper for the journal Social Science & Medicine, Vitale and her coauthor, University of British Columbia doctoral candidate Leah Shipton, conducted an extensive literature analysis of AI’s current trajectory in health care. They argue that AI is positioned to become the latest in a long line of technological advances that ultimately have limited impact because they engage in a “politics of avoidance” that diverts attention away from, or even worsens, more fundamental structural problems in global public health.

For example, like many technological interventions of the past, most AI being developed for health focuses on treating disease, while ignoring the underlying determinants of health. Vitale and Shipton fear that the hype over unproven AI tools could distract from the urgent need to implement low-tech but evidence-based holistic interventions, like community health workers and harm reduction programs. 

“We have seen this pattern before,” Vitale said. “We keep investing in these tech silver bullets that fail to actually change public health because they’re not dealing with the deeply rooted political and social determinants of health, which can range from things like health policy priorities to access to healthy foods and a safe place to live.”

AI is also likely to continue or exacerbate patterns of harm and exploitation that have historically been common in the biopharmaceutical industry. One example discussed in the paper is that the ownership of and profit from AI is currently concentrated in high-income countries, while low- to middle-income countries with weak regulations may be targeted for data extraction or experimentation with the deployment of potentially risky new technologies. 

The paper also predicts that lax regulatory approaches to AI will continue the prioritization of intellectual property rights and industry incentives over equitable and affordable public access to new treatments and tools. And since corporate profit motives will continue to drive product development, AI companies are also likely to follow the health technology sector’s long-term trend of overlooking the needs of the world’s poorest people when deciding which issues to target for investment in research and development. 

However, Vitale and Shipton did identify a bright spot. AI could potentially break the mold and create a deeper impact by focusing on improving the health care system itself. AI could be used to allocate resources more efficiently across hospitals and for more effective patient triage. Diagnostic tools could improve the efficiency and expand the capabilities of general practitioners in small rural hospitals without specialists. AI could even provide some basic yet essential health services to fill labor and specialization gaps, like providing prenatal check-ups in areas with growing maternity care deserts. 

All of these applications could potentially result in more equitable access to care. But that result is far from guaranteed. Depending on how and where these technologies are deployed, they could either successfully backfill gaps in care where there are genuine health worker shortages or lead to unemployment or precarious gig work for existing health care workers. And unless the underlying causes of health care worker shortages are addressed – including burnout and “brain drain” to high-income countries – AI tools could end up providing diagnosis or outbreak detection that is ultimately not useful because communities still lack the capacity to respond. 

To maximise benefits and minimise harms, Vitale and Shipton argue that regulation must be put in place before AI expands further into the health sector. The right safeguards could help to divert AI from following harmful patterns of the past and instead chart a new path that ensures future projects will align with the public interest.

“With AI, we have an opportunity to correct our way of governing new technologies,” Shipton said. “But we need a clear agenda and framework for the ethical governance of AI health technologies through the World Health Organization, major public-private partnerships that fund and deliver health interventions, and countries like the United States, India, and China that host tech companies. Getting that implemented is going to require continued civil society advocacy.”

Source: University of California – Santa Cruz

AI-enabled ‘Digital Stethoscope’ Detects Twice as Many Cases of Peripartum Cardiomyopathy


New research from Mayo Clinic suggests that artificial intelligence (AI) could improve the diagnosis of peripartum cardiomyopathy, a potentially life-threatening and treatable condition that weakens the heart muscle of women during pregnancy or in the months after giving birth. Researchers used an AI-enabled digital stethoscope that captures electrocardiogram (ECG) data and heart sounds to identify twice as many cases of peripartum cardiomyopathy as regular care did, according to a news release from the American Heart Association.

Identifying a weak heart pump caused by pregnancy is important because the symptoms, such as shortness of breath when lying down, swelling of hands and feet, weight gain, and rapid heartbeat, can be confused with normal symptoms of pregnancy.

Dr Demilade Adedinsewo, a cardiologist at Mayo Clinic, shared research insights during a late-breaking science presentation at the American Heart Association’s Scientific Sessions 2023.

Women in Nigeria have the highest reported incidence of peripartum cardiomyopathy. The randomised pragmatic clinical trial enrolled 1195 women receiving pregnancy care in Nigeria. Approximately half were evaluated with AI-guided screening using the digital stethoscope, while the other half received usual obstetric care in addition to a clinical ECG. An echocardiogram was used to confirm cases in which the AI-enabled digital stethoscope predicted peripartum cardiomyopathy. Overall, 4% of the pregnant and postpartum women in the intervention arm of the clinical trial had cardiomyopathy, compared with 2% in the control arm, suggesting that half of cases likely go undetected with usual care.


Source: Mayo Clinic

Applying AI to EHRs Ensures Better Outcomes and Insights


This week the Gordon Institute of Business Science (GIBS) held an on-campus Healthcare Industry Insights Conference, where healthcare professionals and others with an interest in the field heard experts provide insightful discussion and frank debate.

Each session was themed around a different topic, such as Innovation for Sustainable Access and Quality Care, Building a Skilled Workforce, Navigating Public-Private Partnerships and Addressing Social Determinants.

The day ended with a focus on Digital Transformation, including advances in medical device manufacturing.

Dilip Naran, Vice President of Product Architecture at CompuGroup Medical South Africa (a leading international MedTech provider), has over 25 years of dedicated service to the South African healthcare market and was asked to share his thoughts on the next generation of digital health.

Naran has been actively involved in shaping both billing and clinical applications and has been a key player in the creation of cutting-edge cloud-based solutions that have revolutionised the way healthcare professionals operate in South Africa. 

Improving workflow processes

The discussion focused on AI and Electronic Health Records (EHRs), and how, by harnessing the power of AI, healthcare providers can unlock unprecedented insights, enhance patient care and drive operational efficiencies.

Naran began by reminding the audience that AI has already improved EHR data management: extracting valuable insights from clinical notes, automating repetitive tasks, analysing data to identify patterns and facilitating the seamless integration of multiple data sources. AI advances in EHRs and medical devices have reshaped the doctor-patient healthcare journey.

To continue this growth, AI-powered tools must be implemented in EHRs to enable functionality that enhances the doctor-patient journey. Some benefits of AI-powered EHRs include:

  • Effective Clinical Decision Support
  • Intelligent Automation, improving workflow by automating repetitive tasks
  • Smart Medication Management: AI can alert HCPs to potential drug interactions and adverse effects
  • Predictive Analytics personalised to patient history

Adoption in South Africa

While some of these AI technologies are not yet available in South Africa, CGM’s recently launched Autoscriber solution, which uses AI technologies such as Natural Language Processing (NLP) and a Large Language Model (LLM), enables South African HCPs to create structured notes that include ICD-10 and SNOMED diagnosis coding. This assists the HCP in populating their EHR without having to capture information manually.

At the moment, the adoption rate of EHRs in private-sector practices is around 30%, with oncology leading the way.

With collaboration between government, private and public sector, existing technologies can forecast disease outbreaks, identify high-risk patients and optimise resource allocation. 

Dilip Naran concluded by saying: “The use of AI technologies and processes can facilitate the meaningful use of data in EHRs and lead to better patient outcomes.”

More Often than Not, Hospital Pneumonia Diagnoses are Revised


Pneumonia diagnoses are marked by pronounced uncertainty, according to an AI-based analysis of over 2 million hospital visits. The study, published in Annals of Internal Medicine, found that more than half the time, a pneumonia diagnosis made in the hospital changes between a patient’s admission and their discharge – either because someone initially diagnosed with pneumonia ended up with a different final diagnosis, or because a final diagnosis of pneumonia was missed when the patient entered the hospital (not including cases of hospital-acquired pneumonia).

Understanding that uncertainty could help improve care by prompting doctors to continue to monitor symptoms and adapt treatment accordingly, even after an initial diagnosis. 

Barbara Jones, MD, pulmonary and critical care physician at University of Utah Health and the first author on the study, found the results by searching medical records from more than 100 VA medical centres across the country, using AI-based tools to identify mismatches between initial diagnoses and diagnoses upon discharge from the hospital. More than 10% of all such visits involved a pneumonia diagnosis, either when a patient entered the hospital, when they left, or both.

“Pneumonia can seem like a clear-cut diagnosis,” Jones says, “but there is actually quite a bit of overlap with other diagnoses that can mimic pneumonia.” A third of patients who were ultimately diagnosed with pneumonia did not receive a pneumonia diagnosis when they entered the hospital. And almost 40% of initial pneumonia diagnoses were later revised.
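
The bookkeeping behind such figures is straightforward to express in code. Here is a minimal sketch, with assumed column names and made-up toy records, of how discordance between admission and discharge pneumonia labels might be tallied; it mirrors the two directions of mismatch described above, not the study’s actual pipeline.

```python
# Toy sketch of admission-vs-discharge pneumonia discordance tallies.
# Column names and records are assumptions, not the study's data.
import pandas as pd

visits = pd.DataFrame({
    "initial_pneumonia":   [True, True, False, False, True],
    "discharge_pneumonia": [True, False, True, False, False],
})

pneumonia_any = visits["initial_pneumonia"] | visits["discharge_pneumonia"]

initial = visits.loc[visits["initial_pneumonia"]]      # diagnosed at admission
final = visits.loc[visits["discharge_pneumonia"]]      # diagnosed at discharge

revised = (~initial["discharge_pneumonia"]).mean()     # initial dx later changed
missed = (~final["initial_pneumonia"]).mean()          # final dx not made at admission

print(f"share of visits involving pneumonia: {pneumonia_any.mean():.0%}")
print(f"initial pneumonia diagnoses later revised: {revised:.0%}")
print(f"final pneumonia diagnoses missed at admission: {missed:.0%}")
```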

The study also found that this uncertainty was often evident in doctors’ notes on patient visits; clinical notes on pneumonia diagnoses in the emergency department expressed uncertainty more than half the time (58%), and notes on diagnosis at discharge expressed uncertainty almost half the time (48%). Simultaneous treatments for multiple potential diagnoses were also common.

When the initial diagnosis was pneumonia, but the discharge diagnosis was different, patients tended to receive a greater number of treatments in the hospital, but didn’t do worse than other patients as a general rule. However, patients who initially lacked a pneumonia diagnosis, but ultimately ended up diagnosed with pneumonia, had worse health outcomes than other patients.

A path forward

The new results call into question much of the existing research on pneumonia treatment, which tends to assume that initial and discharge diagnoses will be the same. Jones adds that doctors and patients should keep this high level of uncertainty in mind after an initial pneumonia diagnosis and be willing to adapt to new information throughout the treatment process. “Both patients and clinicians need to pay attention to their recovery and question the diagnosis if they don’t get better with treatment,” she says.

Source: University of Utah

Less Invasive Method for Measuring Intracranial Pressure After TBI

Coup and contrecoup brain injury. Credit: Scientific Animations CC4.0

Researchers at Johns Hopkins explored a potential alternative and less-invasive approach to evaluate intracranial pressure (ICP) in patients with serious neurological conditions. This research, using artificial intelligence (AI) to analyse routinely captured ICU data, was published in Computers in Biology and Medicine.

ICP is a physiological variable that can increase abnormally in people with severe traumatic brain injury, stroke or obstruction to the flow of cerebrospinal fluid. Symptoms of elevated ICP may include headaches, blurred vision, vomiting, changes in behaviour and decreased level of consciousness. It can be life-threatening, hence the need for ICP monitoring in selected patients who are at increased risk. But the current standard for ICP monitoring is highly invasive: it requires the placement of an external ventricular drain (EVD) or an intraparenchymal brain monitor (IPM) – a probe inserted, by drilling through the skull, into the brain’s functional tissue of neurons and glial cells.

“ICP is universally accepted as a critical vital sign – there is an imperative need to measure and treat ICP in patients with serious neurological disorders, yet the current standard for ICP measurement is invasive, risky, and resource-intensive. Here we explored a novel approach leveraging Artificial Intelligence which we believed could represent a viable noninvasive alternative ICP assessment method,” says senior author Robert Stevens, MD, MBA, associate professor of anaesthesiology and critical care medicine.

EVD procedures carry a number of risks, including catheter misplacement, infection, and haemorrhage, at rates of 15.3%, 5.8%, and 12.1%, respectively, according to recent research. EVD and IPM procedures also require surgical expertise and specialised equipment that is not consistently available in many settings, underscoring the need for an alternative method of examining and monitoring ICP in patients.

The Johns Hopkins team, a group that included faculty and students from the School of Medicine and Whiting School of Engineering, hypothesised that severe forms of brain injury, and elevations in ICP in particular, are associated with pathological changes in systemic cardiocirculatory function due, for example, to dysregulation of the central autonomic nervous system. This hypothesis suggests that extracranial physiological waveforms can be studied to better understand brain activity and ICP severity.

In this study, the Johns Hopkins team set out to explore the relationship between the ICP waveform and the three physiological waveforms that are routinely captured in the ICU: invasive arterial blood pressure (ABP), photoplethysmography (PPG) and electrocardiography (ECG). ABP, PPG and ECG data were used to train deep learning algorithms, resulting in a level of accuracy in determining ICP that rivals or exceeds other methodologies.
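
As a rough illustration of that setup, the sketch below defines a small 1-D convolutional network mapping synchronised ABP/PPG/ECG windows to an ICP estimate. The architecture, channel ordering, window length and sampling rate are assumptions chosen for illustration; the paper’s actual model is not reproduced here.

```python
# Minimal sketch of a waveform-to-ICP regressor. All architectural details
# here are illustrative assumptions, not the Johns Hopkins team's model.
import torch
import torch.nn as nn

class WaveformICPRegressor(nn.Module):
    """1-D CNN mapping stacked ABP/PPG/ECG windows to a mean ICP estimate."""
    def __init__(self, n_channels: int = 3, window_len: int = 1250):  # e.g. 10 s at 125 Hz
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over time to one feature vector
        )
        self.head = nn.Linear(64, 1)   # regress mean ICP (mmHg) for the window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, window_len) -- synchronised ABP, PPG, ECG segments
        return self.head(self.encoder(x).squeeze(-1))

model = WaveformICPRegressor()
dummy = torch.randn(8, 3, 1250)   # a batch of 8 synthetic windows
print(model(dummy).shape)         # torch.Size([8, 1])
```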

Overall, the study findings suggest a completely new, noninvasive alternative for monitoring ICP in patients.

Stevens says, “With validation, physiology-based AI solutions, such as the one used here, could significantly expand the proportion of patients and health care settings in which ICP monitoring and management can be delivered.”

Source: Johns Hopkins Medicine

AI Models that can Identify Patient Demographics in X-rays are Also Unfair


Artificial intelligence models often play a role in medical diagnoses, especially when it comes to analysing images such as X-rays. But these models have been found not to perform equally well across all demographic groups, usually faring worse on women and people of colour.

These models have also been shown to develop some surprising abilities. In 2022, MIT researchers reported that AI models can make accurate predictions about a patient’s race from their chest X-rays – something that the most skilled radiologists can’t do.

Now, in a new study appearing in Nature, the same research team has found that the models that are most accurate at making demographic predictions also show the biggest “fairness gaps” – that is, the largest drops in diagnostic accuracy for patients of particular races or genders. The findings suggest that these models may be using “demographic shortcuts” when making their diagnostic evaluations, which lead to incorrect results for women, Black people, and other groups, the researchers say.

“It’s well-established that high-capacity machine-learning models are good predictors of human demographics such as self-reported race or sex or age. This paper re-demonstrates that capacity, and then links that capacity to the lack of performance across different groups, which has never been done,” says senior author Marzyeh Ghassemi, an MIT associate professor of electrical engineering and computer science.

The researchers also found that they could retrain the models in a way that improves their fairness. However, their approaches to “debiasing” worked best when the models were tested on the same types of patients they were trained on, such as patients from the same hospital. When these models were applied to patients from different hospitals, the fairness gaps reappeared.

“I think the main takeaways are, first, you should thoroughly evaluate any external models on your own data because any fairness guarantees that model developers provide on their training data may not transfer to your population. Second, whenever sufficient data is available, you should train models on your own data,” says Haoran Zhang, an MIT graduate student and one of the lead authors of the new paper.

Removing bias

As of May 2024, the FDA has approved 882 AI-enabled medical devices, with 671 of them designed to be used in radiology. Since 2022, when Ghassemi and her colleagues showed that these diagnostic models can accurately predict race, they and other researchers have shown that such models are also very good at predicting gender and age, even though the models are not trained on those tasks.

“Many popular machine learning models have superhuman demographic prediction capacity – radiologists cannot detect self-reported race from a chest X-ray,” Ghassemi says. “These are models that are good at predicting disease, but during training are learning to predict other things that may not be desirable.”

In this study, the researchers set out to explore why these models don’t work as well for certain groups. In particular, they wanted to see if the models were using demographic shortcuts to make predictions that ended up being less accurate for some groups. These shortcuts can arise in AI models when they use demographic attributes to determine whether a medical condition is present, instead of relying on other features of the images.

Using publicly available chest X-ray datasets from Beth Israel Deaconess Medical Center (BIDMC) in Boston, the researchers trained models to predict whether patients had one of three different medical conditions: fluid buildup in the lungs, collapsed lung, or enlargement of the heart. Then, they tested the models on X-rays that were held out from the training data.

Overall, the models performed well, but most of them displayed “fairness gaps” – that is, discrepancies between accuracy rates for men and women, and for white and Black patients.

The models were also able to predict the gender, race, and age of the X-ray subjects. Additionally, there was a significant correlation between each model’s accuracy in making demographic predictions and the size of its fairness gap. This suggests that the models may be using demographic categorisations as a shortcut to make their disease predictions.

The researchers then tried to reduce the fairness gaps using two types of strategies. For one set of models, they trained them to optimise “subgroup robustness,” meaning that the models are rewarded for having better performance on the subgroup for which they have the worst performance, and penalised if their error rate for one group is higher than the others.

In another set of models, the researchers forced them to remove any demographic information from the images, using “group adversarial” approaches. Both strategies worked fairly well, the researchers found.

“For in-distribution data, you can use existing state-of-the-art methods to reduce fairness gaps without making significant trade-offs in overall performance,” Ghassemi says. “Subgroup robustness methods force models to be sensitive to mispredicting a specific group, and group adversarial methods try to remove group information completely.”
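
As a concrete illustration of the first strategy, here is a minimal sketch of a worst-group (“group DRO”-style) objective. It assumes per-sample losses and integer group labels, and shows the general idea rather than the paper’s exact training procedure.

```python
# Minimal sketch of a subgroup-robustness ("group DRO"-style) objective:
# optimise the worst-performing demographic group rather than the average.
# This illustrates the general idea, not the MIT team's exact procedure.
import torch

def worst_group_loss(per_sample_loss: torch.Tensor, groups: torch.Tensor) -> torch.Tensor:
    """Mean loss of the worst-performing group present in the batch."""
    group_means = [per_sample_loss[groups == g].mean() for g in groups.unique()]
    return torch.stack(group_means).max()

# Usage inside a training step (the criterion must use reduction="none"):
#   losses = torch.nn.functional.cross_entropy(logits, labels, reduction="none")
#   loss = worst_group_loss(losses, group_ids)
#   loss.backward()
```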

Not always fairer

However, those approaches only worked when the models were tested on data from the same types of patients that they were trained on, eg from BIDMC.

When the researchers tested the models that had been “debiased” using the BIDMC data to analyse patients from five other hospital datasets, they found that the models’ overall accuracy remained high, but some of them exhibited large fairness gaps.

“If you debias the model in one set of patients, that fairness does not necessarily hold as you move to a new set of patients from a different hospital in a different location,” Zhang says.

This is worrisome because in many cases, hospitals use models that have been developed on data from other hospitals, especially in cases where an off-the-shelf model is purchased, the researchers say.

“We found that even state-of-the-art models which are optimally performant in data similar to their training sets are not optimal – that is, they do not make the best trade-off between overall and subgroup performance – in novel settings,” Ghassemi says. “Unfortunately, this is actually how a model is likely to be deployed. Most models are trained and validated with data from one hospital, or one source, and then deployed widely.”

The researchers found that the models that were debiased using group adversarial approaches showed slightly more fairness when tested on new patient groups than those debiased with subgroup robustness methods. They now plan to try to develop and test additional methods to see if they can create models that do a better job of making fair predictions on new datasets.

The findings suggest that hospitals that use these types of AI models should evaluate them on their own patient population before beginning to use them, to make sure they aren’t giving inaccurate results for certain groups.

Using AI, Scientists Discover High-risk Form of Endometrial Cancer

Dr Ali Bashashati observes an endometrial cancer sample on a microscope slide. Credit: University of British Columbia

A discovery by researchers at the University of British Columbia promises to improve care for patients with endometrial cancer, the most common gynaecologic malignancy.  Using artificial intelligence (AI) to spot patterns across thousands of cancer cell images, the researchers have pinpointed a distinct subset of more stubborn endometrial cancer that would otherwise go unrecognised by traditional pathology and molecular diagnostics.

The findings, published in Nature Communications, will help doctors identify patients with high-risk disease who could benefit from more comprehensive treatment.

“Endometrial cancer is a diverse disease, with some patients much more likely to see their cancer return than others,” said Dr Jessica McAlpine, professor at UBC. “It’s so important that patients with high-risk disease are identified so we can intervene and hopefully prevent recurrence. This AI-based approach will help ensure no patient misses an opportunity for potentially lifesaving interventions.”

AI-powered precision medicine

The discovery builds on work by Dr McAlpine and colleagues in the Gynaecologic Cancer Initiative, who in 2013 helped show that endometrial cancer can be classified into four subtypes based on the molecular characteristics of cancerous cells, with each posing a different level of risk to patients.

Dr McAlpine and team then went on to develop an innovative molecular diagnostic tool, called ProMiSE, that can accurately discern between the subtypes. The tool is now used across parts of Canada and internationally to guide treatment decisions.

Yet, challenges remain. The most prevalent molecular subtype, encompassing approximately 50% of all cases, is largely a catch-all category for endometrial cancers lacking discernible molecular features.

“There are patients in this very large category who have extremely good outcomes, and others whose cancer outcomes are highly unfavourable. But until now, we have lacked the tools to identify those at-risk so that we can offer them appropriate treatment,” said Dr McAlpine.

Dr McAlpine turned to long-time collaborator and machine learning expert Dr Ali Bashashati, an assistant professor of biomedical engineering and pathology and laboratory medicine at UBC, to try to further segment the category using advanced AI methods.

Dr Bashashati and his team developed a deep learning AI model that analyses images of tissue samples collected from patients. The AI was trained to differentiate between different subtypes, and after analysing over 2300 cancer tissue images, pinpointed the new subgroup that exhibited markedly inferior survival rates.

“The power of AI is that it can objectively look at large sets of images and identify patterns that elude human pathologists,” said Dr Bashashati. “It’s finding the needle in the haystack. It tells us this group of cancers with these characteristics are the worst offenders and represent a higher risk for patients.”

Bringing the discovery to patients

The team is now exploring how the AI tool could be integrated into clinical practice alongside traditional molecular and pathology diagnostics.

“The two work hand-in-hand, with AI providing an additional layer on top of the testing we’re already doing,” said Dr McAlpine.

One benefit of the AI-based approach is that it’s cost-efficient and easy to deploy across geographies. The AI analyses images that are routinely gathered by pathologists and healthcare providers, even at smaller hospital sites in rural and remote communities, and shared when seeking second opinions on a diagnosis.

The combined use of molecular and AI-based analysis could allow many patients to remain in their home communities for less intensive surgery, while ensuring those who need treatment at a larger cancer centre can do so.  

“What is really compelling to us is the opportunity for greater equity and access,” said Dr Bashashati. “The AI doesn’t care if you’re in a large urban centre or rural community, it would just be available, so our hope is that this could really transform how we diagnose and treat endometrial cancer for patients everywhere.”

Source: University of British Columbia