- Can AI algorithms improve the clinical outcomes for sepsis? How will we integrate AI into clinical practice? And why is it imperative to have a regulatory framework to uphold the standard of care? I'm Dr. Kirsten Bibbins-Domingo, Editor-in-Chief of JAMA and the JAMA Network. This conversation is part of a series of videos and podcasts hosted by JAMA, in which we explore the issues surrounding the rapidly evolving intersection of artificial intelligence and medicine. Today, Dr. Suchi Saria and I discuss the ways in which AI can be used for diagnostic prediction and how it can improve healthcare outcomes. Dr. Saria is the John C. Malone Associate Professor of Computer Science at the Whiting School of Engineering and Associate Professor of Statistics and Health Policy at the Bloomberg School of Public Health at Johns Hopkins University. She directs the Machine Learning and Healthcare Lab and is the founding research director of the Malone Center for Engineering and Healthcare, both at Hopkins. Welcome, Dr. Saria.
- Thank you for having me.
- Wonderful, I hope we can do first names, if you don't mind.
- So you are a computer scientist, but you have really established yourself as someone thinking and innovating in healthcare. We work in these very different sectors, and although AI in healthcare seems like a natural fit, the people with the skills to do AI and the people with the skills in healthcare are often speaking different languages; they have very different ways of approaching problems. What's needed to actually optimize so that you have the best innovation and it's actually applicable in the real-world healthcare setting?
- One of the trickiest problems I encountered in the first several years, which I've now come to truly appreciate, is that engineers are very good at solving problems once they know what problem to solve. But the problem is that engineers don't have deep understanding or training in the complexities of medicine, the data, or the problems, so it's hard to know what problems to solve. When I was getting started in this area in 2008, '09, '10, there were very few clinicians who had a deep appreciation for what AI could do, and very limited exposure to AI. So as a result, you couldn't walk into a clinician's office, describe an idea, and have it just make sense to them, right? 99 out of 100 clinicians would look at you strangely and say, "What are you talking about?" And when I say clinicians, I mean clinical researchers, practicing clinicians, health services researchers, the gamut. I think the ability to define and identify the right problems is very important and is at the core. If you can't even start with the right problem, you're not going to go anywhere in terms of a solution. But to define the right problems, you really need a deep appreciation for the domain. Now, you could flip it and say, "Maybe if you go to clinical researchers who understand it well enough to define the right problems, then engineers can come solve them." But what I've found is that if you don't have any appreciation for the technology, what it can do well and what it does poorly, when it works and when it doesn't, it's very hard to define the right problems. 'Cause some things are very hard to do and some things are easy to do; some things are very valuable to do and some things are not possible to do. So really, to be good and able to function at this intersection, you have to develop a deep appreciation for two or three languages. There's the language of the problem domain, and there's the language of the solution domain. And then, more importantly, in the last five to six years I've found there's the language of implementation science and the operations of healthcare. 'Cause you could have perfect knowledge of the biology or the disease, but if you don't understand how healthcare is practiced, you'll bring a solution into the world that may never be adopted, because for the people who are using your solution there are incentive barriers, the way it's implemented is not practical or usable, and you're not going to see an outcome or any improvement.
- So does that mean working with teams, or is it literally that you're immersing yourself in different settings? Or have you just been doing this for so long that you know how?
- I think all of the above. The number one thing I had to learn was humility, right? When you're an engineer, you think of the world in very logical ways, and things have clean solutions. You can prove things, you can explain, "This works, this doesn't work." When you're entering such a murky area, where there are so many different languages you have to synthesize, you have to be very patient, you have to figure out how to iterate, and learn, and be open-minded. And I learned to identify other people who are open-minded and partner with them, collaborate with them, iterate, iterate, iterate. So that's been one very big key part. On many of the projects I do now, I probably have upwards of 100 collaborators in different clinical areas and disciplines: trialists who design trials, people on the regulatory side who think about evidence generation, people who do implementation, and collaborators at different health systems with whom we're partnering to deploy solutions. So by virtue of being focused and having worked in the discipline for so long, it's been really fun to collect lots and lots of experiences, and amazing team members and collaborators along the way who've been very fun to learn with.
- Wonderful. So I've read your work in predictive AI, including some published in JAMA Network Open on hypoglycemia. But one of the areas that I know you are very passionate about and have worked in for a while is sepsis. When I talk with people about the potential for AI to really improve a healthcare outcome, usually after some things about imaging, they tell me sepsis is the one where we could really make a difference. Why is sepsis the condition where people see the promise of AI to really change trajectories?
- There's a three-part answer to that. The first thing is, I really do think there's enormous, enormous potential in sepsis. But sepsis is not alone; a lot of other conditions fit the same template. Let's start with why sepsis. First of all, when a patient has severe sepsis or septic shock, mortality rates are still around 30%, so there's so much opportunity for impact. Second, we know that one of the most promising avenues for improving outcomes is early identification, because when we can identify sepsis early, we do have tools to do things differently. But the hard part is being able to identify sepsis early enough. Now, the landscape has changed a lot. When I first started working on this, around 2012, sepsis was not something people thought AI had anything to do with. When we did that early work, we started to see that a number of things make sepsis very well-suited to AI. One, early detection matters. Two, you already have so many routinely collected signals that you can leverage. Three, the way sepsis presents varies across patients, depending on their clinical context and their history. So this problem is very well-suited to AI because, based on context, it can learn far more precise patterns, signals, and markers than you get by bringing human experts together to come up with simple rules-based criteria, like whether the temperature is elevated or the white blood cell count is elevated. When we use those simple rules, lots of other conditions mimic them, and you end up confused about whether the patient is truly septic, or what the risk of sepsis is. But there are almost 200, 300, 400 variables that can be measured; on any given patient you may only measure 12 or 13 of them, and which ones varies by patient. If you can leverage that rich clinical context and model, based on context, what this patient has, you can learn far more precise markers than you can otherwise. And that's what we showed in some of our early research in 2015, where we showed that ML and AI may be very well-positioned to help us solve this problem of early identification broadly, and sepsis in particular. Since then, we've shown the ability to generalize to more difficult populations. First, these were very large studies, at five sites, academic and community hospitals; most studies in AI have been done with retrospective data as opposed to prospective implementations. Second, it was over a 2 1/2-year period, where we had to show generalization not just across sites but also as we went from no COVID to COVID, which was a surprise, and we had to show how the system's performance generalized. Third, most studies report diagnostic performance, like sensitivity, specificity, and early detection rates; but when you're putting this in the real world, the question is: Will clinicians adopt it? So one of these studies was focused entirely on real-world implementation and adoption, and we measured adoption quantitatively and qualitatively. And fourth, we could show dramatic improvements in early detection rates with very high sensitivity and specificity, we could show adoption, and then we did a pragmatic prospective study on outcomes, looking at mortality, morbidity, length of stay, and a host of other outcomes, and showed improvements in those outcomes.
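To make the rules-versus-context contrast above concrete, here is a minimal sketch in Python on fully synthetic data. The variables, thresholds, and model choice (scikit-learn's HistGradientBoostingClassifier, which tolerates missing values natively) are illustrative assumptions, not the system discussed in the interview.

```python
"""Minimal sketch: fixed-threshold screen vs. a context-aware model.
Synthetic data; variables and thresholds are hypothetical."""
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Synthetic vitals/labs; NaN mimics a variable not measured on that patient.
temp = rng.normal(37.0, 0.8, n)                      # temperature, deg C
wbc = rng.normal(9.0, 3.0, n)                        # white blood cell count
hr = rng.normal(85.0, 15.0, n)                       # heart rate
lactate = np.where(rng.random(n) < 0.6, np.nan,      # measured on ~40% only
                   rng.normal(1.5, 1.0, n))
X = np.column_stack([temp, wbc, hr, lactate])

# Hypothetical ground truth: risk depends on context, not one cutoff.
logit = (0.8 * (temp - 37.0) + 0.15 * (wbc - 9.0)
         + 0.04 * (hr - 85.0) + 0.9 * np.nan_to_num(lactate - 1.5))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(logit - 2.0)))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) Rules-based screen: flag if temperature or WBC crosses a threshold.
rule_flag = ((X_te[:, 0] > 38.0) | (X_te[:, 1] > 12.0)).astype(float)

# 2) Context model: gradient boosting handles the NaNs natively.
model = HistGradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
prob = model.predict_proba(X_te)[:, 1]

print(f"rules-based screen AUROC: {roc_auc_score(y_te, rule_flag):.2f}")
print(f"context model AUROC:      {roc_auc_score(y_te, prob):.2f}")
```

On this toy data the context model scores markedly higher, which is the point being made: many weak, sparsely measured signals combined by context beat a small fixed rule set.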
- Wonderful. It's amazing to hear you've been working on sepsis for 10 years, and sepsis is one of those areas where we now know so much more. Even just in the last month, the CDC came out with guidelines for sepsis, and we're anticipating another set of guidelines; with each release, it's always striking how high the mortality still is. So what I hear you saying is that the need is great, there's lots of individual variation, and we already measure lots of data points, so having machines help us sort through and understand, for an individual, whether there's a high likelihood of sepsis, for early intervention, is really the opportunity. And you think about the full scope of the science, because one of the challenges we have with all new technologies in healthcare is whether doctors will actually adopt them, or do something different once they've adopted them. So it's really important to see the outcomes. What do you think the big remaining questions are in the field of sepsis and the potential for AI?
- To me, one of the big opportunities is being able to take tools like this and scale them nationally. Furthermore, when you're deploying AI-based tools, you have to think a lot about performance in the real world, which means, just like we showed in our studies, drifts and shifts can happen: new diseases or syndromes we hadn't seen before, or drifts in how people are collecting data and what they're collecting because of policy changes. Maybe there was a CDC change that now has everybody measuring lactate twice in the ED, for example, and you have to show the ability to generalize. So I'm fascinated by the idea of operationalizing AI in the real world. I see a lot of different ideas for tools that could be successful, but going from tools in the lab setting, where you're working with retrospective data and showing what's possible, to implementing them in a practical clinical setting in the real world, where they really are high-quality, validated, trustworthy, easy to use, and really allow you to influence care in a positive way, is where I think a lot of new research needs to happen. I've been really passionate about this for the last five years. And in computer science, this happens often; computer science as a field has reinvented itself every five years, right? So now it's federated learning, machine learning. Every five years, we go through this change of technology stack, which is pretty radical. And the goal then is not just to invent the idea, but to scale the idea. What I've found fascinating is that in research, very often we take state-of-the-art research and translate it to the real world in the form of spin-outs and companies that then scale it. And in scaling it, you learn a lot about where the barriers are going to be. That becomes the nature of the research: tackling those barriers, addressing them, fixing them, till it becomes something commonplace, in everyday use. I think AI is ready for primetime in a lot of different areas. Look at transitions of care: a patient is in the hospital setting, going home or to the next site of care. The way we handle that process today has so many leaky corridors; it's messy, it's terrible for patients, it's terrible for their families, and it's difficult and inefficient even for the care team. We could do a lot to streamline care if we had a better hold of the clinical data and the social determinants of health, a bigger, fuller picture, and then used that with AI in real time to influence the next best step. So that's an example, but there are hundreds. Another area is pressure ulcers, again a leading cause of complications and a huge opportunity. Today, we're in a nursing shortage. Wouldn't it be great if we could streamline workflow in a way that focuses attention on the things that really need to happen? That means identifying high-risk patients, making sure they get the preventive prophylactic bundles we already have, and providing the right care to them in a streamlined way. What we found in our work is that in a 400- or 500-bed hospital, you can free up 12 to 15 FTEs' worth of time, giving that time back so people can do more high-value work, and simultaneously you can improve outcomes. JAMA now, in the AI call, publishes a huge number of papers on this topic.
To me, the next five years are going to be all about operationalizing this in a way that improves the frontline experience, that actually makes the job for practicing clinicians easier, not harder.
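One concrete piece of the operationalizing work described above is monitoring for the drifts and shifts she mentions. Here is a minimal sketch in Python, on synthetic data, using a two-sample Kolmogorov-Smirnov test as one common, illustrative choice (not necessarily the method used in the studies discussed) to flag when a deployed model's inputs have drifted from the training baseline.

```python
"""Minimal drift check: compare live feature distributions against a
training-time baseline with a two-sample KS test. Synthetic data;
feature names and the alert threshold are hypothetical."""
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
FEATURES = ["lactate", "wbc", "heart_rate"]

# Baseline: feature values observed during model development.
baseline = {f: rng.normal(0.0, 1.0, 10_000) for f in FEATURES}

# Live data: suppose a policy change shifted how lactate is collected.
live = {f: rng.normal(0.0, 1.0, 2_000) for f in FEATURES}
live["lactate"] = rng.normal(0.5, 1.2, 2_000)  # simulated drift

ALPHA = 0.01  # alert threshold; tuned in practice to a tolerable alarm rate
for f in FEATURES:
    stat, p = ks_2samp(baseline[f], live[f])
    flag = "DRIFT?" if p < ALPHA else "ok"
    print(f"{f:>10}: KS={stat:.3f}  p={p:.1e}  {flag}")
```

A real deployment would layer this with checks on missingness rates, label delay, and downstream performance, but the shape of the loop (compare live inputs to a locked baseline, alert, investigate) is the same.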
- Well, I like what you're saying, because of course we all focus on and care about improving health outcomes for patients, but the opportunity to scale speaks to the ability to increase access, and the ability to offload some tasks from practicing clinicians, so that they can focus on the highest and best use of their time and talents, is also part of the opportunity. And you've given us some great examples. So, one of the reasons everybody's talking about AI in medicine right now is, of course, the advent of generative AI. You've mostly been telling us about the possibilities, and what you've already shown, with predictive AI. How do you see these two modes in which AI could help us in healthcare? Do they play together well? Are there ways in which generative AI makes prediction more challenging? How should I think about these two modes together?
- The more honest, complicated answer is that as a researcher, I don't see it as a dichotomy, A versus B. Instead, I see it as a continuum. What I mean by that is, over the last decade there have been a number of new foundational inventions in AI: the ability to train very large models, the ability to ground AI, the ability to create transparency and explainability, the ability to find strategies for mitigating bias, the ability to do multimodal reasoning by taking in data of many different types, and the ability to model unstructured data. I think of gen AI as an easy thing people can hold in their head in the form of a demo, like ChatGPT, that's very easy to grok, right? It's the idea that you can predict the next word given historical context. To do that, they train these very large models, called transformer models, using large amounts of data from the internet. Our ability to train very large models is one area where we've made a huge amount of progress over time; the ability to learn from human feedback is another. By combining those capabilities, we have this beautiful, almost magical ChatGPT demo that has completely captured people's imagination, which has been so fun to watch. It's a continuum, not a dichotomy. There are all these fundamental innovations; think of them almost like LEGO blocks, new blocks that were combined to build the ChatGPT demo. But you can unbundle, re-block, and retool them for a whole new type of application. So I think the opportunity in medicine that's ready now is predictive AI applications that reuse many of the same fundamental building blocks: the ability to do multimodal reasoning, to train on large data, to have richer, bigger models, to use bias mitigation, to ground and create explainability. Some of these blocks are much more valuable in the medicine context than they were in the ChatGPT context. ChatGPT hallucinates. Why? Because it was created as a fun conversational toy. The idea was not for it to be informative, so the architecture, in its own right, is not set up for grounded conversations. On the flip side, it's able to generate poetry because it's learned from reading millions of blogs and other things on the internet. So I think it's beautiful; it's just fascinating that a demo brought so many people's imagination to life about what AI can do. A common mistake I see people make is to want to use the box as is. Many health system leaders around the country are doing this: the board is asking them, "What is the AI strategy?" And some, not all, but some are mistakenly treating that as, "What is my gen AI ChatGPT strategy?" That's not an AI strategy, because they're not thinking in terms of the fundamental building blocks: What are the problems of interest and high value within my organization? What problems can be solved, and how will these building blocks help me solve them? Instead, they're asking, "What is my gen AI strategy?" which is very narrow. And I think that's partly because of industry.
There's a lot of marketing very heavily focused on ChatGPT and gen AI, and that gets people thinking, "Oh, okay, what am I gonna do with gen AI?" But there are lots and lots of applications, clinical, administrative, and operational, where we can bring AI to bear using all of these fundamental building blocks. The important part is really understanding the problem, building solutions that actually solve that problem, and really focusing on measurement: Are you rigorously measuring whether it's working? Funnily enough, in digital health this idea of measurement is a little novel. In all other parts of medicine, we measure things all the time; in fact, we take it to a whole new level and do RCTs. RCTs in digital health are very hard; for some interventions it's easy, for others it's hard. But even with pragmatic studies, we can get very far. And when you're thinking of operational applications, you need to really focus on making sure: Is it working in your environment? How do you know it's working? Can you measure it?
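For the "predict the next word given historical context" idea mentioned above, here is a toy sketch in Python using a simple bigram count model. Real systems use transformers trained on internet-scale data; this only makes the training objective concrete, and the tiny corpus is invented for illustration.

```python
"""Toy sketch of next-word prediction: a bigram count model.
Illustrates the objective only; real systems use transformers."""
from collections import Counter, defaultdict

corpus = ("the patient is stable . the patient is septic . "
          "the patient is improving .").split()

# Count how often each word follows each context word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = following[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

print(predict_next("patient"))  # ('is', 1.0): 'is' always follows 'patient'
print(predict_next("is"))       # ('stable', 0.33...): one of three options
```

Swapping the one-word context for thousands of tokens, and the count table for a transformer trained with gradient descent, gives the family of models behind the ChatGPT demo she describes.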
- Those are exactly the types of studies I imagine we'd like to see: studies that are really measuring and showing us what the outcomes are. So, you are part of the National Academies committee that's looking at the Code of Conduct. How do you think about why there should be guardrails, or a Code of Conduct, in this space?
- I think it's clear that AI can be applicable in many different areas, but in some sense, governance accelerates adoption, right? It accelerates responsible adoption. There's the National Academies AI Code of Conduct, a very multidisciplinary task force of industry, academia, and practitioners, created to build a rubric, at multiple levels of altitude, for what good AI governance looks like. The idea is not to come up with ideas from scratch, because many, many groups have individually created guidelines and rubrics; there are probably, at this point, upwards of 20 such blueprints, guardrails, and rubrics. The purpose of the NAM AI Code of Conduct task force is to align them. From the perch of the NAM, there's an opportunity to reconcile, align, and provide a bit more clarity when there are so many places to look. On the federal side, one of the areas I started getting really excited about is the opportunity for AI to be overseen by groups like the FDA. Most people think that when federal agencies get involved and regulators come in, it almost kills innovation, and I think that's somewhat true. On the flip side, if they can do it well, there's an opportunity to accelerate adoption, because it builds trust, right? When a new person walks in with a new AI tool, how does a clinician know to trust it? There's a natural question in their head about malpractice risk. When you see regulated tools, the good news is there's an element of "this is now accepted standard of care." That mitigates concern and risk around malpractice, so people are more willing to adopt. There's also a stamp of approval that it's gone through rigorous validation. So that's where I see opportunity with federal agencies coming in and regulating. I've been really fortunate to work with the FDA directly as a researcher over the last three to four years on research pertaining to the monitoring, assessment, and evaluation of these tools, in particular the kinds of issues you'd see in AI and predictive tools that you wouldn't see in traditional medical devices. It's been really interesting to hear how they view the world, where the gaps are in the current framework, and where there's opportunity; they're also very willing to modernize. Last year in December, they came out with a new guideline on what they call PCCPs, Predetermined Change Control Plans. What that allows device makers to do is submit, as part of the application, a plan that says, "I can update and improve my device; as long as it's within these parameters and it's been evaluated this way, I don't need to come back to go through another submission." That's huge, because suddenly that allows AI-based predictive software to really tune to the environment, learn, and improve over time, and still be within a framework that's practical.
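As a purely illustrative sketch of the logic behind a PCCP, here is a toy Python check in which a model update is accepted only if its metrics on a locked validation set stay within pre-specified bounds. The numbers, field names, and function are hypothetical; the real regulatory process involves far more than a threshold check.

```python
"""Toy illustration of the update-within-parameters idea behind a
Predetermined Change Control Plan (PCCP). Illustrative only; the real
FDA process is much richer than this."""

# Pre-specified acceptance envelope (hypothetical numbers).
PCCP_BOUNDS = {"auroc_min": 0.85, "sensitivity_min": 0.80, "max_auroc_drop": 0.02}

def update_allowed(current: dict, candidate: dict) -> bool:
    """Check a candidate model's locked-set metrics against the envelope."""
    return (candidate["auroc"] >= PCCP_BOUNDS["auroc_min"]
            and candidate["sensitivity"] >= PCCP_BOUNDS["sensitivity_min"]
            and current["auroc"] - candidate["auroc"] <= PCCP_BOUNDS["max_auroc_drop"])

current = {"auroc": 0.90, "sensitivity": 0.86}
retrained = {"auroc": 0.91, "sensitivity": 0.84}
print(update_allowed(current, retrained))  # True: within the pre-agreed envelope
```

The point is the shape of the agreement: evaluation procedure and limits are fixed in advance, so improvement within them does not require a new submission.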
- Sure, sure. And it sounds like, with this rapidly evolving technology, you need a governance structure that also evolves, to be responsive to the environment we're in. Well, it's been great to talk with you and to hear your excitement about the potential for AI in healthcare, and also to hear what I hear you saying: the need for real data and measurement to understand the effects of these new technologies. This has been such a wonderful conversation. I hope you'll come back and tell us more about the next exciting phase for AI and medicine. Thank you so much for joining me today, Suchi.
- Well, thank you so much for having me. It's been such a pleasure and a real honor and very, very fun.
- Thank you for watching and listening. We welcome comments on this series. We also welcome submissions in response to JAMA's AI and medicine call for papers. Until next time, stay informed and stay inspired. We hope you'll join us for future episodes of the "AI and Clinical Practice Series," where we will continue to discuss the opportunities and challenges posed by AI. Subscribe to the JAMA Network YouTube channel and follow JAMA Network Podcasts wherever you get your podcasts.