[The following notes were generated by Douglas McKell, MS, MSc, and Rebecca Rowe, PhD]
The Fall 2023 IAMSE WAS Seminar Series, “Brains, Bots, and Beyond: Exploring AI’s Impact on Medical Education,” began on September 7, 2023, and concluded on October 5, 2023. Over these five sessions, topics ranged from the basics of AI to its use in teaching and learning essential biomedical science content.
The co-presenters for the third session were Dr. Michael Paul Cary Jr. and Ms. Sophia Bessias. Dr. Cary is an Associate Professor and Elizabeth C. Clipp Term Chair of Nursing at the Duke University School of Nursing. Ms. Bessias is the evaluation lead for the Algorithm-Based Clinical Decision Support (ABCDS) Oversight program, providing operational support and peer review for clinical decision support software proposed for use within the Duke University Health System (DUHS). She holds master’s degrees in analytics and public health from NC State University and the University of Copenhagen.
Dr. Cary listed four objectives of the session:
- Establishing Context and Recognizing Challenges
- Operationalizing Bias Mitigation through AI Governance
- Navigating the Terrain of Large Language Models (LLMs)
- Equipping Educators for AI-Driven Healthcare Technologies
The session was divided into four sections, each discussing one of the above Session Objectives.
Objective 1: Establishing Context and Recognizing Challenges
Dr. Cary began by establishing the context of the promises and perils of AI in healthcare. AI has the potential to revolutionize healthcare by:
- Improving patient care and the clinician experience
- Reducing clinician burnout
- Creating operational efficiencies
- Reducing costs
He then highlighted several potential perils that need to be taken into consideration, such as:
- Non-adoption or over-reliance on AI
- No impact on outcomes
- Technical malfunction
- Violation of government regulations
- Non-actionable or biased recommendations that could exacerbate health disparities
Dr. Cary posed a fundamental question: “Why is Identity Bias in algorithms so important?” He discussed a 2019 study by Obermeyer et al.1 demonstrating that a widely used algorithm assigned Black patients the same risk scores as White patients even though the Black patients had 26.3% more chronic disease, systematically excluding Black patients from needed care management services. The reason was that the algorithm assigned risk scores based on past healthcare spending, and Black patients tend to have lower spending than White patients for a given level of health. This kind of error, in which developers use a flawed proxy label in place of the outcome they actually want to predict, is called label bias. Once the algorithm was corrected, the percentage of Black patients automatically enrolled in the care management program rose from 17.7% to 45.5%.1
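As a rough illustration of this mechanism, the minimal sketch below simulates the pattern on entirely hypothetical data (the variable names, group sizes, and spending figures are invented for illustration and are not the actual algorithm or data from Obermeyer et al.): two groups with identical health needs are ranked by a spending-based proxy, and the lower-spending group ends up under-enrolled.

```python
# Minimal sketch of the label-bias mechanism described above, using simulated
# (hypothetical) data rather than the algorithm or data studied by Obermeyer et al.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

group = rng.integers(0, 2, n)          # two groups with identical health needs
chronic = rng.poisson(3.0, n)          # ideal target: chronic-condition burden
# Spending tracks need, but group 1 spends less for the same level of health
spending = chronic * np.where(group == 1, 700, 1000) + rng.normal(0, 300, n)

# Proxy label: rank patients by spending and auto-enroll the top 10%
threshold = np.quantile(spending, 0.90)
for g in (0, 1):
    in_group = group == g
    enrolled = in_group & (spending > threshold)
    print(f"group {g}: share enrolled = {enrolled.sum() / in_group.sum():.3f}, "
          f"mean chronic conditions among enrolled = {chronic[enrolled].mean():.2f}")

# Group 1 patients must be sicker to clear the same spending threshold, so fewer
# of them are enrolled despite equal need. Ranking on chronic-condition count
# (the ideal prediction target) instead removes the disparity.
```

With this toy setup, the lower-spending group receives a smaller share of enrollments, and its enrolled patients carry a heavier chronic-disease burden, mirroring the pattern the study reported.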
Dr. Cary reviewed four areas of evolving government regulation of AI. The first is the FDA’s 2022 final guidance on Software as a Medical Device, which will regulate software in medical devices, including AI-powered devices. The second is the AI Bill of Rights, which aims to protect individuals from the potential harms of AI, such as label bias, other forms of bias, and discrimination. Third, AI is increasingly being regulated at the state level, with state attorneys general beginning to regulate AI in their states; in 2022, the Attorney General of California sent a letter to the CEOs of all hospitals in the state asking for an account of the algorithms used in their hospitals, the potential biases of those algorithms, and their plans to mitigate those biases. Finally, the Department of Health and Human Services (DHHS) announced a proposed rule under Section 1557 of the Patient Protection and Affordable Care Act (PPACA) stating that covered entities (health care systems and providers) must not discriminate against any individual through the use of clinical algorithms in decision making and must develop a plan to mitigate that possibility. Dr. Cary stated that while this is a huge step forward, the proposed rule needed to go further in specifying what covered entities must do to reduce bias. Still, it did solicit comments on best practices and strategies that can be used to identify bias and minimize any discrimination resulting from using clinical algorithms.
Dr. Cary and his team determined that the covered entities referenced in Section 1557 of the PPACA would need to know how to examine their clinical algorithms to ensure compliance with the proposed rule. They conducted a scoping review of 109 articles to identify strategies for mitigating bias in clinical algorithms, with a focus on racial and ethnic bias, and summarized a large number of mitigation approaches to inform health systems about how to reduce bias arising from the use of algorithms in decision-making. While Dr. Cary outlined the literature search, study selection, and data extraction, he could not show or discuss the results of the review before its official publication. He noted that the scoping review would be published in the October 2023 issue of Health Affairs at www.healthaffairs.org.
Dr. Cary then discussed some of the most pressing challenges facing the use of AI in healthcare. The first is the lack of an “equity lens”: when AI algorithms are trained on biased or unrepresentative data sets, they can exacerbate existing healthcare disparities, and the resulting decision-making systems fail to provide equitable care.
The second challenge is the need for AI education and training for healthcare professionals and health professions educators. Very few of us have the necessary AI training, which leaves a gap in the knowledge and skills required for the successful integration of AI in healthcare. As a result, healthcare professionals may struggle to understand the capabilities and limitations of AI tools, leading to a lack of trust, underuse, and improper use. Lastly, there is little to no governance of the design or use of data science and AI tools, which raises ethical and privacy concerns.
Objective 2: Operationalizing AI Governance Principles
Ms. Bessias began her presentation by sharing how Duke AI Health and the Duke University Health System are attempting to overcome some of these challenges. In 2021, the Dean, Chancellor, and Board of Trustees charged Duke University Health System leadership with forming a governance framework for any tool that could be used in patient care, specifically any algorithm that could affect patient care directly or indirectly. The outcome of this charge was the formation of the Algorithm-Based Clinical Decision Support (ABCDS) Oversight Committee. ABCDS Oversight is a people, process, and technology effort that provides governance, evaluation, and monitoring of all algorithms proposed for clinical care and operations at Duke Health. The committee comprises leaders from the health system and the school of medicine, clinical practitioners, regulatory affairs and ethics experts, equity experts, biostatisticians, and data scientists; it takes all of these perspectives working jointly to adequately assess the risks and benefits of using algorithms in health care.
The mission of the ABCDS Oversight Committee is to “guide algorithmic tools through their lifecycle by providing governance, evaluation, and monitoring.” ABCDS Oversight has two core functions. The first is registering all electronic algorithms that could impact patient care at Duke Health. The second is evaluating these algorithms as high, medium, or low risk. High-risk algorithms include all data-derived decision-making tools, whether home-grown or from vendors; in either case, the process investigates how they were developed and how they are proposed to be used. Medium-risk algorithms are knowledge-based, clinical consensus algorithms in which clinicians pool their expertise to create a rubric. Low-risk algorithms include medical standard-of-care algorithms that are well integrated into clinical practice and frequently endorsed by relevant clinical societies. The specific type of risk evaluation used varies depending on the details of any given use case.
Ms. Bessias then took us through a detailed review of the ABCDS Evaluation Framework, which consists of a series of stages an algorithm must complete before proceeding to the next. It is based on a software development life cycle. There are four stages in the evaluation process:
- Stage 1: Model development
- Stage 2: Silent evaluation
- Stage 3: Effectiveness evaluation
- Stage 4: General deployment
Each of these stages is separated by a formal gate review that evaluates the stage against a series of quality and ethical principles, including transparency and accountability, clinical value and safety, fairness and equity, usability, reliability and adoption, and regulatory compliance. The intent is to ensure that when AI algorithms are deployed, patients see the maximum benefit while any unintended harm is limited. At each gate, these quality and ethical principles are translated into specific evaluation criteria and requirements.
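To make the idea of gate criteria concrete, the sketch below shows one kind of fairness and equity check a silent evaluation could include: comparing model performance across subgroups before the tool can influence care. The data, column names, threshold, and metrics here are illustrative assumptions, not Duke’s actual ABCDS criteria.

```python
# Hypothetical subgroup check of silent-phase predictions against observed
# outcomes; the data, column names, and threshold are illustrative only.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.DataFrame({
    "subgroup":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "risk_score": [0.9, 0.2, 0.7, 0.6, 0.4, 0.8, 0.1, 0.3],
    "outcome":    [1,   0,   1,   0,   1,   0,   0,   1],
})

# Discrimination (AUC) by subgroup; large gaps flag a potential equity issue
for name, grp in df.groupby("subgroup"):
    auc = roc_auc_score(grp["outcome"], grp["risk_score"])
    print(f"subgroup {name}: AUC = {auc:.2f} (n = {len(grp)})")

# Sensitivity at a fixed alert threshold, by subgroup
alert_threshold = 0.5
for name, grp in df.groupby("subgroup"):
    positives = grp["outcome"] == 1
    flagged = grp["risk_score"] >= alert_threshold
    sensitivity = (flagged & positives).sum() / positives.sum()
    print(f"subgroup {name}: sensitivity at {alert_threshold:.1f} = {sensitivity:.2f}")
```

In a real silent evaluation, this kind of stratified comparison would be run on much larger prospective data, but the principle is the same: subgroup performance gaps are surfaced and addressed before general deployment.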
Duke AI Health’s goal is to anticipate, prevent, and mitigate algorithmic harms. In 2023, the team introduced a new bias mitigation tool to help development teams move from a reactive mode to a more proactive, anticipatory way of thinking about bias. One of the most critical aspects of their process is linking the algorithm with its implementation: ABCDS tool = Algorithm(s) + Implementation.
Bias can be introduced anywhere in the life cycle of an algorithm and needs to be considered during each stage. To frame this, Duke AI Health drew on a 2021 publication by Suresh and Guttag,2 a framework for understanding the sources of harm throughout the machine learning life cycle that illustrates how bias can be introduced. The seven types of bias are societal (historical), label, aggregation, learning, representation, evaluation, and human use bias, and a template helps teams identify and address each type. Historical, representation, and label biases are introduced during data generation. Ms. Bessias discussed three of these biases in detail: societal bias (training data shaped by present and historical inequities and their underlying causes), label bias (use of a biased proxy target variable in place of the ideal prediction target), and human use bias (inconsistent user responses to algorithm outputs for different subgroups), giving an example of each as well as ways to address and mitigate it.
Objective 3: Navigating the Terrain of Large Language Models (LLMs)
Everyone is thinking about how to navigate the terrain of generative AI in health care, especially large language models. Ms. Bessias addressed how these tools and frameworks can be applied to LLMs. There are a large number of proposed applications of generative AI in healthcare, ranging from low risk to very high risk: generating billing information, drafting administrative communications, automating clinical notes, drafting EHR inbox responses, providing direct medical advice, offering mental health support, and more. There are limitations and ethical considerations as well. LLMs are trained to generate plausible output that is not necessarily factual or accurate. Explainability (how the algorithm produces an output) and transparency (accessible communication about the sources of data behind the outputs) are a second major consideration. This leads to the ethical question of what happens when an algorithm provides misleading or incorrect information: what options are available to address algorithmic harm, and who has recourse to them? Another important question concerns access versus impact when considering equity: how are the risks and benefits of generative AI distributed across the population? These considerations were discussed using automated clinical notes as the example application. Ms. Bessias stated that there are many questions and few answers, but these are the things that need to be considered as healthcare moves toward deploying generative AI technologies. To end the session, Ms. Bessias shared a reflection from Dr. Michael Pencina, chief data scientist at Duke Health and Vice Dean for Data Science at the Duke University School of Medicine, from an op-ed he wrote on how to handle generative AI:
“Ensure that AI technology serves humans rather than taking over their responsibilities or replacing them. No matter how good an AI is, at some level, humans must be in charge.”
Objective 4: Equipping Educators for AI-Driven Healthcare Technologies
Dr. Cary then discussed the fourth webinar objective, the competencies needed by health care professionals and health care educators, as published by Russell et al. in 2023.3 The data for this publication were collected by interviewing 15 experts across the country, who identified six competency domains:
- Basic Knowledge of AI – factors that influence the quality of data
- Social and Ethical Implications of AI – impact on Justice, Equity, and Ethics
- Workflow Analysis for AI-based Tools – impact on Workflow
- AI-enhanced Clinical Encounters – Safety, Accuracy of AI tools
- Evidence-based Evaluation of AI-based Tools – Analyze and adapt to changing roles
- Practice-Based Learning and Improvement Regarding AI-based Tools
By developing these competencies, healthcare professionals can ensure that AI Tools are used to improve the quality and safety of patient care.
For the past year, Dr. Cary and Duke University have partnered with North Carolina Central University, a historically Black university with a deep understanding of the challenges faced by underrepresented, underserved communities. Through this partnership, they developed a proposed set of competencies for identifying and mitigating bias in clinical algorithms:
- Trainees should be able to explain what AI/ML algorithms are in the context of healthcare.
- Trainees should be able to explain how AI governance and legal frameworks can impact equity.
- Trainees should be able to detect and mitigate bias in AI algorithms across the algorithms’ life cycle.
Dr. Cary ended the session by presenting the audience with several training opportunities and resources offered by Duke University, including short courses and workshops, formal programs, and virtual seminar series offered in the fall and spring semesters and open to anyone worldwide. In March 2024, Dr. Cary will present at the first-ever Duke University Symposium on Algorithmic Equity and Fairness in Health.
Lastly, Dr. Cary invited all webinar attendees to join Duke University in its commitment to advancing health equity and promoting responsible AI through a Call to Action for Transforming Healthcare Together.
References:
1. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453.
2. Suresh H, Guttag J. A framework for understanding sources of harm throughout the machine learning life cycle. In: Equity and Access in Algorithms, Mechanisms, and Optimization. 2021:1-9.
3. Russell RG, et al. Competencies for the use of artificial intelligence-based tools by health care professionals. Academic Medicine. 2023;98(3):348-356.