
AI Models that can Identify Patient Demographics in X-rays are Also Unfair

Photo by Anna Shvets

Artificial intelligence models often play a role in medical diagnoses, especially when it comes to analysing images such as X-rays. But these models have been found not to perform as well across all demographic groups, usually faring worse on women and people of colour.

These models have also been shown to develop some surprising abilities. In 2022, MIT researchers reported that AI models can make accurate predictions about a patient’s race from their chest X-rays – something that the most skilled radiologists can’t do.

Now, in a new study appearing in Nature Medicine, the same research team has found that the models that are most accurate at making demographic predictions also show the biggest “fairness gaps”, ie discrepancies in how accurately they diagnose images of people of different races or genders. The findings suggest that these models may be using “demographic shortcuts” when making their diagnostic evaluations, which lead to incorrect results for women, Black people, and other groups, the researchers say.

“It’s well-established that high-capacity machine-learning models are good predictors of human demographics such as self-reported race or sex or age. This paper re-demonstrates that capacity, and then links that capacity to the lack of performance across different groups, which has never been done,” says senior author Marzyeh Ghassemi, an MIT associate professor of electrical engineering and computer science.

The researchers also found that they could retrain the models in a way that improves their fairness. However, their approaches to “debiasing” worked best when the models were tested on the same types of patients they were trained on, such as patients from the same hospital. When these models were applied to patients from different hospitals, the fairness gaps reappeared.

“I think the main takeaways are, first, you should thoroughly evaluate any external models on your own data because any fairness guarantees that model developers provide on their training data may not transfer to your population. Second, whenever sufficient data is available, you should train models on your own data,” says Haoran Zhang, an MIT graduate student and one of the lead authors of the new paper.

Removing bias

As of May 2024, the FDA had approved 882 AI-enabled medical devices, 671 of them designed for use in radiology. Since 2022, when Ghassemi and her colleagues showed that these diagnostic models can accurately predict race, they and other researchers have shown that such models are also very good at predicting gender and age, even though the models are not trained on those tasks.

“Many popular machine learning models have superhuman demographic prediction capacity – radiologists cannot detect self-reported race from a chest X-ray,” Ghassemi says. “These are models that are good at predicting disease, but during training are learning to predict other things that may not be desirable.”

In this study, the researchers set out to explore why these models don’t work as well for certain groups. In particular, they wanted to see if the models were using demographic shortcuts to make predictions that ended up being less accurate for some groups. These shortcuts can arise in AI models when they use demographic attributes to determine whether a medical condition is present, instead of relying on other features of the images.

Using publicly available chest X-ray datasets from Beth Israel Deaconess Medical Center (BIDMC) in Boston, the researchers trained models to predict whether patients had one of three different medical conditions: fluid buildup in the lungs, collapsed lung, or enlargement of the heart. Then, they tested the models on X-rays that were held out from the training data.

Overall, the models performed well, but most of them displayed “fairness gaps” – that is, discrepancies between accuracy rates for men and women, and for white and Black patients.
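To make the idea of a fairness gap concrete, here is a minimal Python sketch. It assumes the gap is measured as the largest difference in diagnostic accuracy between demographic groups; the study may have used other metrics (for example, gaps in AUROC or in error rates), and the function names, group labels and toy values below are hypothetical, not the researchers' code or data.

```python
from collections import defaultdict

def group_accuracies(y_true, y_pred, groups):
    """Accuracy of binary predictions computed separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

def fairness_gap(y_true, y_pred, groups):
    """Assumed metric: best-group accuracy minus worst-group accuracy."""
    acc = group_accuracies(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())

# Hypothetical labels for one condition (1 = present), invented for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
group  = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(group_accuracies(y_true, y_pred, group))
print(fairness_gap(y_true, y_pred, group))
```

A model can score well overall while this gap stays large, which is why the researchers report both.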

The models were also able to predict the gender, race, and age of the X-ray subjects. Additionally, there was a significant correlation between each model’s accuracy in making demographic predictions and the size of its fairness gap. This suggests that the models may be using demographic categorisations as a shortcut to make their disease predictions.
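The correlation described here is computed across models: each model contributes one data point, pairing how well it predicts a demographic attribute with the size of its fairness gap. The Python sketch below assumes Pearson's r as the correlation measure and uses invented numbers for five hypothetical models; it illustrates the calculation, not the study's results.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical per-model values, invented for illustration only.
demographic_auc = [0.70, 0.80, 0.85, 0.90, 0.95]  # how well each model predicts, e.g., race
fairness_gaps   = [0.02, 0.03, 0.05, 0.06, 0.08]  # each model's diagnostic fairness gap

r = pearson_r(demographic_auc, fairness_gaps)
print(f"Pearson r = {r:.2f}")
```

A strongly positive r would mean that models which encode demographics more accurately also tend to be less fair, the pattern the researchers read as a sign of shortcut use.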

The researchers then tried to reduce the fairness gaps using two types of strategy. One set of models was trained to optimise “subgroup robustness”, meaning that the models are rewarded for improving performance on the subgroup on which they perform worst, and penalised if their error rate for one group is higher than for the others.
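One well-known subgroup robustness method is group distributionally robust optimisation (Group DRO), which trains on a weighted sum of per-group losses and shifts weight towards whichever group currently has the highest loss. The Python sketch below shows only that bookkeeping, assuming per-example losses are already available; it is not necessarily the exact method or settings used in the study, and all names and values are hypothetical.

```python
import math
from collections import defaultdict

def group_losses(losses, groups):
    """Mean loss within each group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for loss, g in zip(losses, groups):
        sums[g] += loss
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}

def worst_group_loss(losses, groups):
    """Loss of the group on which the model currently performs worst."""
    return max(group_losses(losses, groups).values())

def update_group_weights(weights, per_group_loss, step_size=0.1):
    """One exponentiated-gradient step, as in Group DRO: upweight high-loss groups."""
    raw = {g: w * math.exp(step_size * per_group_loss[g]) for g, w in weights.items()}
    total = sum(raw.values())
    return {g: w / total for g, w in raw.items()}

def robust_objective(weights, per_group_loss):
    """Weighted sum of group losses that the model would be trained to minimise."""
    return sum(weights[g] * per_group_loss[g] for g in weights)

# Hypothetical per-example losses for two groups (not study data).
losses = [0.2, 0.4, 0.9, 1.1]
groups = ["A", "A", "B", "B"]

per_group = group_losses(losses, groups)
weights = update_group_weights({"A": 0.5, "B": 0.5}, per_group)
print(per_group, worst_group_loss(losses, groups))
print(weights, robust_objective(weights, per_group))
```

Because group B has the higher loss, its weight grows, so the objective the model minimises leans towards its worst-served group rather than the overall average.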

In another set of models, the researchers forced them to remove any demographic information from the images, using “group adversarial” approaches. Both strategies worked fairly well, the researchers found.
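A common way to write a group adversarial objective is as a min-max game between an image encoder and an adversary that tries to recover the demographic group from the encoder's features. The formulation below is an assumed, generic version (often implemented with a gradient reversal layer), not necessarily the study's exact loss: f is the encoder with parameters θ_f, h_y the disease classifier, h_a the adversary, x an X-ray, y its disease label, g the patient's demographic group and λ > 0 a trade-off weight.

```latex
\min_{\theta_f,\,\theta_y}\;\max_{\theta_a}\;
  \mathcal{L}_{\mathrm{disease}}\big(h_{\theta_y}(f_{\theta_f}(x)),\, y\big)
  \;-\; \lambda\, \mathcal{L}_{\mathrm{group}}\big(h_{\theta_a}(f_{\theta_f}(x)),\, g\big)
```

The adversary effectively minimises its own group-prediction loss (maximising the negated term), while the encoder is pushed to make that loss large, that is, to discard the demographic information the adversary could exploit, without giving up accuracy on the disease label.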

“For in-distribution data, you can use existing state-of-the-art methods to reduce fairness gaps without making significant trade-offs in overall performance,” Ghassemi says. “Subgroup robustness methods force models to be sensitive to mispredicting a specific group, and group adversarial methods try to remove group information completely.”

Not always fairer

However, those approaches only worked when the models were tested on data from the same types of patients that they were trained on, eg from BIDMC.

When the researchers tested the models that had been “debiased” using the BIDMC data to analyse patients from five other hospital datasets, they found that the models’ overall accuracy remained high, but some of them exhibited large fairness gaps.

“If you debias the model in one set of patients, that fairness does not necessarily hold as you move to a new set of patients from a different hospital in a different location,” Zhang says.

This is worrisome because in many cases, hospitals use models that have been developed on data from other hospitals, especially in cases where an off-the-shelf model is purchased, the researchers say.

“We found that even state-of-the-art models which are optimally performant in data similar to their training sets are not optimal – that is, they do not make the best trade-off between overall and subgroup performance – in novel settings,” Ghassemi says. “Unfortunately, this is actually how a model is likely to be deployed. Most models are trained and validated with data from one hospital, or one source, and then deployed widely.”

The researchers found that the models that were debiased using group adversarial approaches showed slightly more fairness when tested on new patient groups than those debiased with subgroup robustness methods. They now plan to try to develop and test additional methods to see if they can create models that do a better job of making fair predictions on new datasets.

The findings suggest that hospitals that use these types of AI models should evaluate them on their own patient population before beginning to use them, to make sure they aren’t giving inaccurate results for certain groups.

Is AI a Help or Hindrance to Radiologists? It’s Down to the Doctor

New research shows AI isn’t always a help for radiologists

Photo by Anna Shvets

One of the most touted promises of medical artificial intelligence tools is their ability to augment human clinicians’ performance by helping them interpret images such as X-rays and CT scans with greater precision to make more accurate diagnoses.

But the benefits of using AI tools on image interpretation appear to vary from clinician to clinician, according to new research led by investigators at Harvard Medical School, working with colleagues at MIT and Stanford.

The study findings suggest that individual clinician differences shape the interaction between human and machine in critical ways that researchers do not yet fully understand. The analysis, published in Nature Medicine, is based on data from an earlier working paper by the same research group released by the National Bureau of Economic Research.

In some instances, the research showed, use of AI can interfere with a radiologist’s performance and reduce the accuracy of their interpretation.

“We find that different radiologists, indeed, react differently to AI assistance – some are helped while others are hurt by it,” said co-senior author Pranav Rajpurkar, assistant professor of biomedical informatics in the Blavatnik Institute at HMS.

“What this means is that we should not look at radiologists as a uniform population and consider just the ‘average’ effect of AI on their performance,” he said. “To maximize benefits and minimize harm, we need to personalize assistive AI systems.”

The findings underscore the importance of carefully calibrated implementation of AI into clinical practice, but they should in no way discourage the adoption of AI in radiologists’ offices and clinics, the researchers said.

Instead, the results should signal the need to better understand how humans and AI interact and to design carefully calibrated approaches that boost human performance rather than hurt it.

“Clinicians have different levels of expertise, experience, and decision-making styles, so ensuring that AI reflects this diversity is critical for targeted implementation,” said Feiyang “Kathy” Yu, who conducted the work while at the Rajpurkar lab and is co-first author on the paper with Alex Moehring of the MIT Sloan School of Management.

“Individual factors and variation would be key in ensuring that AI advances rather than interferes with performance and, ultimately, with diagnosis,” Yu said.

AI tools affected different radiologists differently

While previous research has shown that AI assistants can, indeed, boost radiologists’ diagnostic performance, these studies have looked at radiologists as a whole without accounting for variability from radiologist to radiologist.

In contrast, the new study looks at how individual clinician factors – area of specialty, years of practice, prior use of AI tools – come into play in human-AI collaboration.

The researchers examined how AI tools affected the performance of 140 radiologists on 15 X-ray diagnostic tasks – how reliably the radiologists were able to spot telltale features on an image and make an accurate diagnosis. The analysis involved 324 patient cases with 15 pathologies: abnormal conditions captured on X-rays of the chest.

To determine how AI affected doctors’ ability to spot and correctly identify problems, the researchers used advanced computational methods that captured the magnitude of the change in performance between using AI and not using it.
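The basic quantity behind this analysis is the change in each radiologist's performance with AI assistance relative to without it. The researchers used more sophisticated methods than this; the Python sketch below simply takes the raw difference in accuracy per radiologist, with invented numbers for three hypothetical readers, to show how an average effect can hide readers who are helped and readers who are hurt.

```python
def ai_effects(acc_without, acc_with):
    """Per-radiologist change in accuracy: with AI minus without AI."""
    return {r: acc_with[r] - acc_without[r] for r in acc_without}

def split_by_effect(effects, tol=1e-9):
    """Return the sets of readers whose accuracy rose or fell with AI."""
    helped = {r for r, e in effects.items() if e > tol}
    hurt = {r for r, e in effects.items() if e < -tol}
    return helped, hurt

# Hypothetical accuracies for three radiologists (not study data).
acc_without = {"R1": 0.70, "R2": 0.85, "R3": 0.75}
acc_with    = {"R1": 0.80, "R2": 0.78, "R3": 0.75}

effects = ai_effects(acc_without, acc_with)
mean_effect = sum(effects.values()) / len(effects)
helped, hurt = split_by_effect(effects)
print(f"mean effect: {mean_effect:+.3f}")
print("helped:", sorted(helped), "hurt:", sorted(hurt))
```

Here the mean effect is small and positive even though one reader improves and another gets worse, which is the kind of variation an average alone would hide.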

The effect of AI assistance was inconsistent and varied across radiologists: the performance of some radiologists improved with AI, while that of others worsened.

AI tools influenced human performance unpredictably

AI’s effects on human radiologists’ performance varied in often surprising ways.

For instance, contrary to what the researchers expected, factors such as how many years of experience a radiologist had, whether they specialised in thoracic, or chest, radiology, and whether they’d used AI readers before did not reliably predict how an AI tool would affect a doctor’s performance.

Another finding that challenged the prevailing wisdom: Clinicians who had low performance at baseline did not benefit consistently from AI assistance. Some benefited more, some less, and some none at all. Overall, however, lower-performing radiologists at baseline had lower performance with or without AI. The same was true among radiologists who performed better at baseline. They performed consistently well, overall, with or without AI.

Then came a not-so-surprising finding: More accurate AI tools boosted radiologists’ performance, while poorly performing AI tools diminished the diagnostic accuracy of human clinicians.

While the analysis was not done in a way that allowed researchers to determine why this happened, the finding points to the importance of testing and validating AI tool performance before clinical deployment, the researchers said. Such pre-testing could ensure that inferior AI doesn’t interfere with human clinicians’ performance and, therefore, patient care.

What do these findings mean for the future of AI in the clinic?

The researchers cautioned that their findings do not explain why and how AI tools seem to affect performance differently across human clinicians, but noted that understanding why would be critical to ensuring that AI radiology tools augment human performance rather than hurt it.

To that end, the team noted, AI developers should work with physicians who use their tools to understand and define the precise factors that come into play in the human-AI interaction.

And, the researchers added, the radiologist-AI interaction should be tested in experimental settings that mimic real-world scenarios and reflect the actual patient population for which the tools are designed.

Apart from improving the accuracy of the AI tools, it’s also important to train radiologists to detect inaccurate AI predictions and to question an AI tool’s diagnostic call, the research team said. To achieve that, AI developers should ensure that they design AI models that can “explain” their decisions.

“Our research reveals the nuanced and complex nature of machine-human interaction,” said study co-senior author Nikhil Agarwal, professor of economics at MIT. “It highlights the need to understand the multitude of factors involved in this interplay and how they influence the ultimate diagnosis and care of patients.”

Source: Harvard Medical School

Portable Ultrasound Works Just as Well in Diagnosing Forearm Fractures in Kids

Photo by cottonbro studio

Portable ultrasound devices could provide an alternative to x-ray machines for diagnosing forearm fractures in children, which could alleviate waiting times for families in hospital emergency departments (ED).

Griffith University researchers Professor Robert Ware and Senior Lecturer Peter Snelling compared functional outcomes in children given an ultrasound and those who received an x-ray on a suspected distal forearm fracture. Dr Snelling said the ultrasounds were performed by nurses, physiotherapists and emergency physicians at four south-east Queensland hospitals.

“They treated 270 children, aged between five and 15 years, during the randomised trial, which included a check-up 28 days later and another check-in at eight weeks,” Dr Snelling said. “The findings show the majority of children had similar recoveries and returned to full physical function.”

Fewer than one-third of the children who were given an ultrasound needed a follow-up x-ray and care at an orthopaedic clinic. Those who didn’t have a buckle fracture or fractured arm were discharged from hospital without the need for further imaging.

Professor Ware said children who had an ultrasound initially had fewer x-rays and shorter stays in the ED. “Families were also more satisfied with the treatment they received,” he said. “The results are promising and have wider implications beyond in-hospital diagnosis and follow-up care.

“By using a bedside ultrasound, this frees up the x-ray machine for patients who really need it and can potentially be a cost-cutting measure for hospitals as they reduce the number of x-rays without compromising the safety of patients.

“It also would be extremely beneficial in rural or remote areas, eliminating the need for children and their families to travel to a larger hospital for an x-ray.”

Source: Griffith University

X-Ray Images With Vastly Lower Radiation Doses

A new scintillation material developed by KAUST scientists can bring significant improvements to X-ray imaging in medicine, industry and security. Credit: KAUST

Scientists have successfully produced an exceptionally efficient, robust and flexible scintillation film to bring significant improvements in X-ray imaging, enabling much lower radiation doses to be used.

Scintillation materials release visible light, or “scintillate,” in response to absorbing high-energy X-ray photons, enabling an image to be captured.

Researchers are continually exploring ways to make scintillation technology more sensitive, efficient and readily adaptable. The researchers, led by Omar F Mohammed, Associate Professor of Chemical Sciences at King Abdullah University of Science and Technology (KAUST), sought to come up with an improved scintillation screen.

“Currently used materials suffer from several drawbacks, including complex and high-cost fabrication processes, radioluminescence afterglow and nontunable scintillation,” said Yang Zhou, a postdoc in Prof Mohammed’s lab.

Materials called lead halide perovskites have attracted considerable attention and shown significant promise. Novel perovskites are a category of materials that share the same crystal structure as the natural perovskite mineral calcium titanium oxide, but they include a variety of different atoms that replace all or some of those found in natural perovskite.

To avoid toxicity problems and reduce cost, the researchers explored the use of elements besides lead. The newly developed screens are described in ACS Energy Letters.

The flexible scintillation screens the team developed can detect X-rays at ultralow levels, “approximately 113 times lower than a typical standard dose for X-ray medical imaging,” said Omar Mohammed, leader of the research group.

“Another vital advance is that the X-ray spatial resolution reported in this study is the highest achieved to date for powder-based screens,” said Dr Zhou.

“The physical flexibility of our films is also very important,” added Prof Mohammed. He explained that highly efficient, flexible scintillation screens are urgently needed so that X-rays can be used to better analyse objects with awkward shapes.

The team plans to commercialise the advance and hopes to refine its fabrication techniques.

Source: EurekAlert!

Discrepancies in Radiology Interpretation

Source: National Cancer Institute

Researchers who analysed nearly six million acute examinations suggest that imaging practice leaders reconsider efforts to match the interpretation of subspecialty examinations with radiologists’ fellowship training in the acute community setting.

Pointing out that major and minor discrepancy rates were not higher for acute community setting examinations outside of interpreting radiologists’ fellowship training, “discrepancy rates increased for advanced examinations,” acknowledged lead investigators Suzanne Chong from Indiana University in Indianapolis and Tarek Hanna of Emory University. The study was published in the American Journal of Roentgenology.

Using the databank of a large US teleradiology company, Chong, Hanna, and colleagues analysed 5 883 980 acute examinations that were preliminarily interpreted by 269 teleradiologists with fellowship training in neuroradiology, abdominal radiology, or musculoskeletal radiology. When providing final interpretations, client on-site radiologists voluntarily submitted quality assurance (QA) requests if the preliminary and final interpretations were discrepant; the teleradiology company’s own QA committee categorised discrepancies as major (n=8444) or minor (n=17 208).

Among initial teleradiology interpretations of acute community setting examinations, major and minor discrepancy rates for common examinations were not significantly different when the examination was concordant versus discordant with the radiologist’s fellowship training. However, for advanced examinations, discrepancy rates were higher when concordant with the radiologist’s fellowship training (relative risk = 1.45 for major and 1.17 for minor discrepancies).
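Relative risk compares the rate of an outcome in one group with the rate in a reference group. The Python sketch below first computes crude overall rates from the totals reported above, assuming each recorded discrepancy corresponds to one examination; because QA requests were voluntary, these are rates of reported discrepancies only. It then computes a relative risk from invented counts chosen purely to show what a value of 1.45 means; the study's per-group counts are not given here.

```python
def rate(events, total):
    """Proportion of examinations with the outcome."""
    return events / total

def relative_risk(events_exposed, total_exposed, events_ref, total_ref):
    """Rate in the exposed group divided by the rate in the reference group."""
    return rate(events_exposed, total_exposed) / rate(events_ref, total_ref)

# Crude overall rates from the reported totals.
TOTAL_EXAMS = 5_883_980
major_rate = rate(8_444, TOTAL_EXAMS)
minor_rate = rate(17_208, TOTAL_EXAMS)
print(f"major: {major_rate:.2%}, minor: {minor_rate:.2%}")

# Hypothetical split (invented counts): concordant vs discordant advanced exams.
rr = relative_risk(145, 100_000, 100, 100_000)
print(f"RR = {rr:.2f}")
```

A relative risk of 1.45 means the discrepancy rate in the first group is 45% higher than in the reference group, even when both absolute rates are well under 1%.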

Noting that their findings support multispecialty radiologist practice in acute community settings, “efforts to match examination and interpreting radiologist subspecialty may not reduce diagnostic discrepancies,” the article authors cautioned.

A supplement to the published article is available as a PDF.

Source: American Roentgen Ray Society