Helping Doctors Make Better Decisions With Data: UC Berkeley’s Ziad Obermeyer

Topics

Artificial Intelligence and Business Strategy

The Artificial Intelligence and Business Strategy initiative explores the growing use of artificial intelligence in the business landscape. The exploration looks specifically at how AI is affecting the development and execution of strategy in organizations.

In collaboration with

BCG

Ziad Obermeyer, UC Berkeley

Dr. Ziad Obermeyer works at the intersection of machine learning and health. He is an associate professor and the Blue Cross of California Distinguished Professor at the University of California, Berkeley; a Chan Zuckerberg Biohub Investigator; and a faculty research fellow at the National Bureau of Economic Research. His papers have appeared in a wide range of journals, including Science, Nature Medicine, and The New England Journal of Medicine; his work on algorithmic bias is frequently cited in the public debate about artificial intelligence. He is a cofounder of Nightingale Open Science, a nonprofit that makes massive new medical imaging data sets available for research, and Dandelion, a platform for AI innovation in health. Obermeyer continues to practice emergency medicine in underserved communities.

Read more about our show and follow along with the series at https://dev03.mitsmr.io/aipodcast.

Subscribe to Me, Myself, and AI on Apple Podcasts, Spotify, or Google Podcasts.

Transcript

Sam Ransbotham: Currently, machine learning researchers have to beg and plead for health care data. This scarcity fundamentally limits our progress. What could change when we get open, curated, interesting data? Find out on today’s episode.

Ziad Obermeyer: I’m Ziad Obermeyer from Berkeley, and you’re listening to Me, Myself, and AI.

Sam Ransbotham: Welcome to Me, Myself, and AI, a podcast on artificial intelligence in business. Each episode, we introduce you to someone innovating with AI. I’m Sam Ransbotham, professor of analytics at Boston College. I’m also the AI and business strategy guest editor at MIT Sloan Management Review.

Shervin Khodabandeh: And I’m Shervin Khodabandeh, senior partner with BCG and one of the leaders of our AI business. Together, MIT SMR and BCG have been researching and publishing on AI since 2017, interviewing hundreds of practitioners and surveying thousands of companies on what it takes to build and to deploy and scale AI capabilities and really transform the way organizations operate.

Sam Ransbotham: Today Shervin and I are thrilled to have Ziad Obermeyer joining us. Ziad, thanks for being here. Welcome.

Ziad Obermeyer: Thank you. It’s wonderful to be here.

Sam Ransbotham: I got to know Ziad at the NBER conference in Toronto, where he was talking about some of his health data platforms work, so maybe let’s start there. Ziad, tell us a little bit about what you’re doing, what this exciting platform is about. Tell us about Nightingale.

Ziad Obermeyer: Absolutely. I can tell you maybe a little bit about the backstory, which is that all of my research is in some way or another applying machine learning or artificial intelligence to health care data. And even though I say I do research in this area, actually what I spend a lot of my time on is pleading for access to the data that I need to do that research — pleading, wheeling and dealing, using all of the networks and contacts that I’ve accumulated over the years. And it’s still just incredibly hard and frustrating.

And so, given how much time I was spending on that, one of my coauthors — Sendhil Mullainathan, who’s at the University of Chicago — and I decided that we were probably not alone in this pain. And so a few years ago, thanks to the support from Schmidt Futures, Eric Schmidt’s foundation, we were able to launch a nonprofit called Nightingale. Nightingale Open Science — that’s its full name — is a nonprofit that uses philanthropic funding to build out interesting data sets in partnership with health systems.

We work with health systems to understand the things that are high-priority and interesting problems for them to work on, and we build data sets that take massive amounts of imaging — so chest X-rays, electrocardiogram waveforms, digital pathology, biopsy specimens — and we pair those images with interesting outcomes from the electronic health record and sometimes from Social Security data when we want mortality. And we create data sets that are aimed at answering some of the most interesting and important questions in health and medicine today: Why do some cancers spread and other cancers don’t? Why do some people get a runny nose from COVID and other people end up in the ICU? So all of these questions are areas where machine learning can really help — not just help doctors make better decisions, but help drive forward some of the science. But those data sets are in very short supply.

We create those data sets with health systems, and then we de-identify them and we put them on our cloud platform, where we make them available to researchers around the world for free. And I think our inspiration for a lot of that work was the enormous progress in other areas of machine learning, driven by the availability of not just data sets but open, curated, interesting data sets that take aim at important problems and that are made available for people who want to drive performance forward on some of those tasks. That’s one of the health data platforms that I’ve been working on for the past few years.

Sam Ransbotham: So give us some examples of that. What are some analogies? What are the other platforms you’re referring to?

Ziad Obermeyer: I think the most famous one of these is called ImageNet. This was put together a number of years ago by essentially getting a bunch of images from the internet and then getting people to caption those images. So, you know, we get a photo. It’s people playing Frisbee on the beach. And then, once we’ve got millions and millions of those images, we can train algorithms that map from the collection of pixels in that image to the caption that a human would assign that image.

There are many data sets like that: There’s a handwriting-recognition data set; there’s a facial-recognition data set. And those data sets, as we’ve seen time and time again, have just been instrumental in driving forward progress in machine learning. So people form teams. They collaborate; they compete with each other, all trying to do the best at these tasks. And that’s just been a huge engine of growth that, along with computational power and the hardware side, has really pushed the innovation forward on the software side.

Shervin Khodabandeh: This is really fascinating. Maybe we take a few steps back. Ziad, tell us a bit about your research and what you’re aiming to do with machine learning [and] data. You’re a physician by training. You also are a scientist and associate professor. So maybe give us a little bit about your background, how you ended up where you are — this amazing unicorn of multiple disciplines and skill sets.

Ziad Obermeyer: That’s a very kind description. Thank you, Shervin. I studied history in university, and then I did a master’s in history and philosophy. And I was really interested in science and studying how science got made: how new fields formed, how scientists kind of sorted into these factions and knowledge was socially constructed.

After a brief stint in management consulting, I went to medical school, and I was actually a research assistant for Chris Murray, who ran the Global Burden of Disease project and still does really, really amazing work in that world of quantifying the burden of disease globally. I learned a lot of how to do research from Chris. And I think that’s a recurring theme — that because I’m extremely lucky and privileged, I managed to spend time around some really, really smart people who invested a lot of time and effort in teaching me stuff.

I trained in emergency medicine. Being in the ER is … a fascinating and interesting and stressful experience, because you’re constantly faced with the limits of your own ability to think and understand problems. So I did medical school and residency, and then I started practicing.

When I started practicing was when I started seeing all of these things about medicine that were so difficult, and that would, you know, literally keep me up at night. Like, I’d go home after a shift and I’d just lie in bed, and I’d be super stressed about this one patient that I’d sent home, because I’d remember something about her or there was the test I should have ordered and I didn’t order it. So I turned a lot of that stress into research, which — I don’t know what Dr. Freud would say about it, but there’s probably some issues I’ll need to explore later with my therapist.

I think that no matter how good you are at that job, if you’re paying attention, you’re always making mistakes. I think that’s an experience that almost every doctor has. And I think the thing that I realized at some point was that the kinds of mistakes that doctors are most likely to make are the kinds of problems that machine learning would be really, really good at. So one of the hardest things that doctors need to do — really, a fundamental activity in medicine — is diagnosis.

So, what is diagnosis? Diagnosis is looking at a patient, looking at all of the test results, the X-rays, the laboratory data, how they look, all these things, and distilling that down to a single variable. Like, does this person have pneumonia? Do they have congestive heart failure or something like that? So mapping this very high-dimensional data set at the patient level to a probability of having disease one, disease two, disease three — [it’s a] great machine learning task.
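As an illustration of the framing above (this was not discussed on the show), here is a minimal sketch of mapping a patient's findings to a probability for each candidate disease. The disease list, the feature names, and the weights are all made up for the example; a real model would learn its weights from data and would use far more inputs.

```python
import math

# Hypothetical labels and weights, invented purely for illustration.
DISEASES = ["pneumonia", "congestive_heart_failure", "neither"]
WEIGHTS = {
    "pneumonia": {"fever": 1.2, "cough": 0.9, "leg_swelling": -0.3},
    "congestive_heart_failure": {"fever": -0.4, "cough": 0.3, "leg_swelling": 1.5},
    "neither": {"fever": 0.0, "cough": 0.0, "leg_swelling": 0.0},
}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def diagnose(patient):
    """Map a dict of patient findings to one probability per disease."""
    scores = [
        sum(WEIGHTS[d].get(name, 0.0) * value for name, value in patient.items())
        for d in DISEASES
    ]
    return dict(zip(DISEASES, softmax(scores)))


probs = diagnose({"fever": 1.0, "cough": 1.0, "leg_swelling": 0.0})
for disease, p in probs.items():
    print(f"{disease}: {p:.2f}")
```

The point of the sketch is only the shape of the task: many inputs per patient go in, and one probability per candidate diagnosis comes out.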

So medicine is just full of these problems that (a) doctors do very poorly on many different measures of performance and (b) if algorithms are built thoughtfully and carefully around those problems, they could really improve the quality of decision-making. And so that’s the genesis of my entire research program: building algorithms that (a) help doctors make better decisions and (b), optimistically, also try to push forward the science underlying some of those decisions around who has sudden cardiac death, who develops complications from COVID, who develops metastatic cancer and who doesn’t.

Shervin Khodabandeh: I think another thing that possibly exists in medicine versus other fields where humans and machines work together to make machines better and make humans better is probably the fact that the expert’s opinion that’s correcting the machine [has] probably gone through a lot more diligence, because there are playbooks and guidelines and, by virtue of you becoming a physician and licensed and having board certification, it’s unlikely that the level of disagreement or variance between experts in medicine would be more so than, let’s say, in the field of marketing or, you know, credit underwriting, or … I would assume that that makes the training part of the algorithm a bit more standardized or less subject to one expert’s opinion.

Ziad Obermeyer: It’s a really, really interesting set of questions. I knew a lot about medicine. I knew some basic things about how to do research. But I started working with a health economist at Harvard, David Cutler, and then with one of his colleagues, Sendhil Mullainathan, and that’s where I started investing a lot of time in learning some of the technical skills that, in combination with those clinical skills and clinical knowledge, was the basis for the research that I’m doing today.

Sometimes we try to solve this problem by saying, “Well, we’re not just going to have one radiologist. We’re going to have five radiologists. And then we’re going to take the majority vote.” But are we really practicing medicine by majority vote? So it’s one of these interesting places where doing machine learning in medicine is very, very different from other areas, because we fundamentally have a more complicated relationship with the ground truth. And human opinion, as trained as these experts are and as much practice as they’ve gotten over years of residency and training — we can’t consider that the truth.
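The majority-vote labeling described above can be sketched in a few lines. This is an illustration only, with made-up reads from five hypothetical radiologists; it shows that the vote yields a consensus label and an agreement rate, neither of which is the same thing as the truth about the patient.

```python
from collections import Counter


def majority_vote(reads):
    """Return the most common label and the share of readers who chose it."""
    label, votes = Counter(reads).most_common(1)[0]
    return label, votes / len(reads)


# Hypothetical reads of a single X-ray by five radiologists.
reads = ["normal", "arthritis", "normal", "normal", "arthritis"]
label, agreement = majority_vote(reads)
print(label, agreement)
```

Here three of five readers agree, so the training label becomes "normal" even though two trained experts disagreed. The vote measures consensus among the readers, not whether the consensus is correct.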

I’ll tell you about one paper that we wrote a few years ago. This was led by my colleague Emma Pierson, who’s a computer scientist at Cornell. And what we showed is that radiologists systematically miss things on X-rays — in this case, knee X-rays — that disproportionately caused pain in Black patients. So when you go back to the history of what we know about arthritis in medicine, a lot of the original studies that generated the scoring systems that doctors still use today were developed on coal miners in Lancashire, England, in the 1940s and ’50s. So it’s not at all surprising that knowledge built up in that very specific time and place would not necessarily map onto the populations that doctors see in their offices today in England or in the U.S.

And the way we showed that was actually by training an algorithm not to do the thing that most people would have trained an algorithm on, which is to go from the X-ray to what the radiologist would have said about the X-ray. If we train an algorithm that just encodes the radiologist’s knowledge in an algorithm, we’re going to encode all of the errors and biases that that radiologist has. So, what we did instead is we trained an algorithm to predict not what the radiologist said about the knee but what the patient said about the knee.

So we trained the algorithm to basically predict, is this knee a painful knee or not? And that’s how we designed an algorithm that could expose that bias in — not the radiologist so much as in medical knowledge. And it really provided a path forward … that algorithms, even though a lot of my work has shown that they can reinforce and even scale up racial biases and other kinds of biases in medicine, there’s also this pathway by which they can do things that humans can’t. They can find signal in these complex images and waveforms that humans miss, and they can also be forces for justice and equity, just as easily as they can be forces that reinforce all of the ugly things about our health care system and society.
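The design choice described above comes down to which column is used as the training target. The sketch below is an illustration, not the study's code: the records, the field names, and the grade cutoff are assumptions, and `xray_id` stands in for the image pixels a real model would consume.

```python
# Hypothetical records; every value here is invented for illustration.
RECORDS = [
    {"xray_id": "k1", "radiologist_grade": 0, "patient_reports_pain": True},
    {"xray_id": "k2", "radiologist_grade": 3, "patient_reports_pain": True},
    {"xray_id": "k3", "radiologist_grade": 0, "patient_reports_pain": False},
    {"xray_id": "k4", "radiologist_grade": 1, "patient_reports_pain": True},
]

SEVERE_GRADE = 2  # assumed cutoff for "the radiologist sees significant disease"


def training_pairs(records, target):
    """Build (input, label) pairs; only the choice of label differs."""
    if target == "radiologist":
        return [(r["xray_id"], r["radiologist_grade"] >= SEVERE_GRADE) for r in records]
    if target == "patient":
        return [(r["xray_id"], r["patient_reports_pain"]) for r in records]
    raise ValueError(f"unknown target: {target}")


radiologist_pairs = training_pairs(RECORDS, "radiologist")
patient_pairs = training_pairs(RECORDS, "patient")

# Knees that hurt but that the radiologist's grade does not flag.
unexplained = [
    xray_id
    for (xray_id, flagged), (_, pain) in zip(radiologist_pairs, patient_pairs)
    if pain and not flagged
]
print(unexplained)
```

A model trained on the radiologist labels can at best reproduce the radiologist, including what the radiologist misses. A model trained on the patient labels is pushed to find whatever in the image is associated with pain, which is how the unexplained cases become visible.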

Shervin Khodabandeh: The problem of training is actually much harder because you don’t have ground truth. And you’re not only exposing … or you’re not only trying to correct the models’ biases or inaccuracies but also the physicians’ as part of the training.

Ziad Obermeyer: Yeah, perfectly put. The challenge isn’t just building the algorithm. It’s not just the same challenge we have in any other field. The fundamental challenge in health is creating the data set that speaks to that ground truth. The good news is that thanks to the enormous success of electronic health records, the ability to link data sets from hospitals to state Social Security data and other interesting sources of truth from elsewhere, there is a wealth of information that lets us stitch together and triangulate that ground truth. But it’s a very difficult clinical problem, not just a machine learning problem. I think that’s one of the big reasons, besides the lack of data.

The other reason that we haven’t seen machine learning transform the practice of medicine in the same way that it’s transformed other industries is because these problems require a certain bilingual skill set. You need to understand how to do useful things with data. But you also need to really understand the clinical medicine side of these problems to be effective because you can’t just swap in the radiologist’s judgment for the judgment of whether there’s a cat or not in this image. It’s a much, much harder problem.

Sam Ransbotham: That bilingual thing seems really tough, though. And it makes me think about your background. You’re obviously in a position where you’ve ended up with both of these languages. I’m also curious, do we have to have that?

Ziad Obermeyer: Without minimizing the difficulty of this problem, let me point to an example where something like this has worked really well, which is a little bit more in your world, Sam, than mine, but behavioral economics, I think, is a really, really great example of a fundamentally new field that requires exactly the same kind of bilingualism. And so, what you needed for behavioral economics as a field was, you needed, first of all, economics to start taking human behavior seriously, beyond just a simple function of incentives. But you also needed psychologists to make pretty big investments in learning the technical basis for demonstrating what is a bias and what’s an error, what’s not an error, things like that. And so I think there’s a really nice analogy to this world, where the doctors are playing the role of the psychologists, and the computer scientists, but also the economists, are playing the role of the economists in the other world.

I think that one of the reasons that these things are hard — to echo some reasons for pessimism — is that we’re not very good inside of academia, despite everyone saying how interesting and wonderful multidisciplinary work is. None of the incentives are really set up to promote that kind of work. And so if you’re a computer scientist and you need to get your paper into some conference proceedings, whether you do a great, A+ job in getting to a ground truth label or whether you do a really bad job doesn’t really matter for your probability of getting that paper into your favorite conference proceedings. And I think if you’re a doctor, whether you’re making big investments in machine learning or not is not really going to affect your likelihood of getting that NIH grant that you’re applying for.

And so, to go back to one of the reasons Sendhil and I started Nightingale, I think we need these kinds of institutions that help build that community of people who are taking these kinds of things seriously. So if you’re a Ph.D. student, you need data to work in the field. I have a lot of Ph.D. students at Berkeley who come to me and say, “Oh, I’d really love to apply the thing that I’m good at in machine learning to health.” And I say, “Great, we’ll add you onto the Data Use Agreement, and then you’ll have to do all of the training, and then we’ll amend the IRB [institutional review board proposal]. …” And by the time that’s done, they’ve already got a job at Facebook or wherever, and it’s all over. And so I think that building up these public goods in this area is a really good place to start to build the community of people who can do the work and be a set of collaborators and peer reviewers and things like that. But it’s a process.

Shervin Khodabandeh: The other thing that your comments lit up in my head is the possibility that the experiment design that we would do today, to do a diagnosis of whatever it is, given that data is so much more abundant now than it was maybe 50, 30, 20 years ago, where data was really scarce, I wonder if that actually changes even the guidelines for how we think about a positive diagnosis, because some of these correlations that you’re talking about may not have even been possible, so maybe nobody even thought about, “You for sure have this disease if these two things happen.” Well, maybe there’s a third thing that would happen two months later that you don’t even know would happen, because nobody collected data on it or nobody could correlate the data.

Ziad Obermeyer: Yeah, great point, and I think it really highlights one of the huge advantages of doing this work today, when we have longitudinal data from electronic health records that’s linkable to a lot of other data from a lot of other different places. The value of the data, as it expands in scale and scope and linkages, just increases exponentially. And exactly as you said, it opens up a ton of new possibilities to learn that were not afforded to us previously.

Sam Ransbotham: You set us up with a Hobson’s choice here, like, “OK, we can either wait three days for the culture test, or someone has to make a call right now.” And this is pointing out just how much better we can measure all sorts of things and maybe measure things that we weren’t even thinking about measuring now … that might tell us what was going to happen in three days, and we don’t … like you say, I think we’re still very early in that process.

You mentioned some of the things, like negative news, that may poison the well. And when I think about that … by analogy, I just taught this week in class, the [module on] handwriting recognition. And in class, I’m able to take a group of students, and we’re able to perform with algorithms what would have won contests 15 years ago. And we can do that in class on laptops.

Well, by analogy, with these data sets you’re putting together right now, what are those sorts of wins that we can expect? I mean, the way to offset the poisoned well is the miracle cure. I don’t want to get too snake-oily here, but what kinds of things can we hope for? What kind of successes are you seeing so far with making these data sets available?

Ziad Obermeyer: I can tell you a little bit … when I think about the output for an organization like Nightingale Open Science, I think the output there is knowledge and papers and computational methods that are developed on these data, but I think there’s also another way to think about what the output is.

I can tell you a little bit about another platform that I’ve been working on, which is called Dandelion Health. Dandelion is a for-profit company, and what that company does is, we first have agreements with a handful of very large health systems across the U.S., and through those agreements, we get access to all of their data. And when I say all of their data, I really mean all of their data. So not just the structured electronic health records, but also the electrocardiogram waveforms, the in-patient monitoring data when someone’s in the hospital, the digital pathology, the sleep monitoring. Everything.

This company is designed to help solve that bottleneck and help people get these products into the clinic faster. And the way … you know, I grappled with this a lot, and I think the way I think about it is that there are clearly downsides to using health data for product development, and I think that there are real risks to privacy and a lot of things that people care about. And I think those risks are real, and they’re very salient to us. There’s another set of risks that are just as real but a lot less salient around not using data.

I think there are also a number of applications to what people think of as life sciences and to clinical trials. There’s a whole set of conditions today, something like Alzheimer’s, and we [recently] saw some sad news from yet another promising Alzheimer’s drug. It’s been pretty sad news for decades in this area. And one of the reasons is this weird fact that I hadn’t thought of until I started seeing some of these applications, which is that if you want to run a trial for a drug for Alzheimer’s, you have to enroll people who have Alzheimer’s. But that means the only drugs that you can develop are the ones that … they basically have to reverse the course of a disease that’s already set in.

So now imagine you had an Alzheimer’s predictor, that with some lead time could find people who are at high risk of developing Alzheimer’s but don’t yet have it. Now you can take those people and enroll them in a clinical trial. And now you can test a whole new kind of drug, a drug that could prevent that disease, instead of having to reverse it or slow it down. So, that’s, I think, really, really exciting too.

Sam Ransbotham: We’re probably getting close on time. Shervin, are you the five-question person or am I today?

Shervin Khodabandeh: I can do it.

Sam Ransbotham: We’ll explain this, Ziad. It’s not as onerous as it sounds. We have a standard way of closing out the episodes.

Shervin Khodabandeh: So, Ziad, we have a segment where we will ask you a series of rapid-fire questions, and you just tell us whatever comes to your mind.

Ziad Obermeyer: OK.

Shervin Khodabandeh: What’s your proudest AI or machine learning moment?

Ziad Obermeyer: I’m working on a paper right now that is in collaboration with a cardiologist in Sweden, where we’re linking all of the electrocardiogram waveforms that were ever done in that region with death certificates. And we’ve developed an algorithm that can actually forecast with a surprising degree of accuracy who is going to drop dead from sudden cardiac death in the year after that ECG. And I think, in addition to being super interesting scientifically, this opens up this huge, huge social value of being able to find people before they drop dead so that you can investigate them and even put in a defibrillator that could prevent this catastrophic thing that happens hundreds of thousands of times every year and that doctors don’t understand and can’t predict.

Shervin Khodabandeh: That’s the kind of thing to be proud of. Wow. What worries you about AI?

Ziad Obermeyer: I think the work that I’ve done on algorithmic bias has made me update negatively on how much harm these algorithms can do. We studied one algorithm that is unfortunately probably still deployed in many of the biggest health systems in the country. By the estimates of the company that makes it, it’s being used for 70 million people every year to screen them and give them access or not to extra help with their health. And I think that those kinds of products, these are not theoretical risks. These are real products that are actually deployed in the health care system, affecting decision-making every day. They’re doing an enormous amount of harm. In the long term, I worry that those kinds of things are going to cause reactions that would be entirely justified in shutting down a range of things that could ultimately be very positive.

Shervin Khodabandeh: Your favorite activity that involves no technology?

Ziad Obermeyer: No technology? Um, I’m going to interpret that liberally and assume that a surfboard does not involve technology, even though it takes a lot of …

Shervin Khodabandeh: I realize, to an academic, that’s an extremely ill-posed question.

Ziad Obermeyer: I really like skiing, and my wife, despite being from Sweden, does not like snow or anything to do with snow. And so our compromise was learning how to surf together, and that’s become really one of my favorite things to do. And one of the things I like about it, even over skiing, is how little technology there is. There’s no lift. There’s no boots. There’s no, like, all these things that you need for skiing. Surfing, you just need one plank, and then you just get out there. And it’s wonderful.

Shervin Khodabandeh: Yes, and thanks for challenging that question. I think we need to rephrase that question. What’s the first career you wanted? What did you want to be when you grew up?

Ziad Obermeyer: When I was in grade school and high school was when there was this enormous explosion of interest and optimism around human genetics, and I was just fascinated by that and by biology. And in retrospect, I’m very glad that I didn’t do that, because of the stuff that I’m working on now. I’m going to make a prediction that many people will disagree with, but I think that machine learning applied to data that has nothing to do with genetics, like ECGs and images, is going to have a far larger impact on health and medicine far sooner than human genetics.

Shervin Khodabandeh: And finally, what’s your greatest wish for AI in the future?

Ziad Obermeyer: When I look around today at the uses to which AI is being put, I think the proportion of things that are generating large amounts of social value is unfortunately fairly small. I think there’s a lot of ad-click optimization, and I don’t mind that. I mean, I benefit a lot from it. I’m not someone who opts out of all those things; I want personalized ads. I buy a lot of things that are targeted to me on Instagram. I think it’s great. I’m not … no critique of ad personalization, but the opportunity cost of ad-click personalization, given the talent and technology and money that’s being put into it, I think is large relative to other things, like health and medicine and these other areas where AI has huge potential to improve society as a whole. I hope that in 10 or 20 years, there’s going to be a much higher proportion of people working on those kinds of questions than there is today.

Shervin Khodabandeh: Thank you for that.

Sam Ransbotham: You made a point here that really resonated with me, and that’s the opportunity cost of not doing things with health care data. And as you were talking about some of the negatives of health care, I was kind of shaking my head no, that I think we have so much opportunity out there to … will I trade some of my data for 20 more healthy years, 30 more healthy years? Sign me up. And so I’m hoping that maybe some of our listeners — that resonates with [them]. Thank you for bringing up some of these things and raising awareness about a fascinating set of initiatives that you’re just all over the place with. I think we called you bilingual, trilingual — I’m not sure how many linguals we can go up to. But thank you so much for taking the time to talk with us today.

Ziad Obermeyer: It was such a pleasure to talk to both of you.

Sam Ransbotham: Thanks for listening. Next time, Shervin and I talk with Eric Boyd, AI platform lead at Microsoft. Talk to you then.

Allison Ryder: Thanks for listening to Me, Myself, and AI. We believe, like you, that the conversation about AI implementation doesn’t start and stop with this podcast. That’s why we’ve created a group on LinkedIn specifically for listeners like you. It’s called AI for Leaders, and if you join us, you can chat with show creators and hosts, ask your own questions, share your insights, and gain access to valuable resources about AI implementation from MIT SMR and BCG. You can access it by visiting mitsmr.com/AIforLeaders. We’ll put that link in the show notes, and we hope to see you there.

About the Hosts

Sam Ransbotham (@ransbotham) is a professor in the information systems department at the Carroll School of Management at Boston College, as well as guest editor for MIT Sloan Management Review’s Artificial Intelligence and Business Strategy Big Ideas initiative. Shervin Khodabandeh is a senior partner and managing director at BCG and the coleader of BCG GAMMA (BCG’s AI practice) in North America. He can be contacted at shervin@bcg.com.

Me, Myself, and AI is a collaborative podcast from MIT Sloan Management Review and Boston Consulting Group and is hosted by Sam Ransbotham and Shervin Khodabandeh. Our engineer is David Lishansky, and the coordinating producers are Allison Ryder and Sophie Rüdinger.
