Conversations on Applied AI

Nick Heller - Machine Learning for Medical Image Analysis

January 24, 2021 | Justin Grammens | Season 1, Episode 12

Super excited to share the conversation I had with Nicholas Heller. Nick is a PhD student in the Department of Computer Science and Engineering at the University of Minnesota -- Twin Cities.

Nick's research focuses on the development and validation of clinical prediction models for risk stratification and treatment planning in genitourinary cancer, especially renal cell carcinoma. In particular, he's interested in the use of deep learning to incorporate tumor appearance into prediction models in more expressive and objective ways while maintaining transparency and biological plausibility.

Nick shares some really good books that I have listed in the notes below. Please reach out to Nick via his personal UMN page.

If you are interested in learning about how AI is being applied across multiple industries, be sure to join us at a future Applied AI Monthly meetup and help support us so we can put on future Emerging Technologies North non-profit events!

Resources and Topics Mentioned in this Episode
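
Andrew Ng's Machine Learning course on Coursera
Neural Networks and Deep Learning by Michael Nielsen
Pattern Recognition and Machine Learning by Christopher Bishop
An Introduction to Statistical Learning and The Elements of Statistical Learning by Hastie, Tibshirani, et al.
Deep Medicine by Eric Topol
The BraTS, LiTS, and KiTS medical image segmentation challenges
Kaggle competitions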

Enjoy!
Your host,
Justin Grammens

Nicholas Heller  0:01  
Because I actually wanted to do theoretical physics. And I started in an observational lab, which is more on the applied side. But it's still, like you said, you know, you're trying to sort of learn about the world, which is really cool. But then also, I don't know, when you get to college, you see all these problems, like, I want to get my hands dirty and really solve something that has a big impact right away, right? Maybe that just shows I'm impatient. But

AI Announcer  0:23  
Welcome to the Conversations on Applied AI podcast, where Justin Grammens and the team at Emerging Technologies North talk with experts in the fields of artificial intelligence and deep learning. In each episode, we cut through the hype and dive into how these technologies are being applied to real-world problems today. We hope that you find this episode educational and applicable to your industry, and connect with us to learn more about our organization at appliedai.mn. Enjoy.

Justin Grammens  0:53  
Welcome to the Conversations on Applied AI podcast. Today on the program, we have Nick Heller. Nick is a PhD student in computer science and engineering. His research area is machine learning for medical image analysis. In particular, he studies how image analysis can help with diagnosis and treatment planning for renal tumors. He has a ton of relevant experience in computer vision and machine learning, especially applied to cross-sectional medical imaging, and working with various deep learning frameworks, primarily TensorFlow and PyTorch. So thanks so much for being on the podcast, Nick.


Nicholas Heller  1:30
Yeah, happy to be here.

Justin Grammens
Cool. So I gave a little bit of an intro about where you are today. I'd be curious to get a little background on yourself: maybe how you got to where you are, how you got into technology, and specifically studying AI.

Nicholas Heller
Sure. So I came to college actually wanting to do physics, and I started working pretty early on in an observational cosmology lab. They were doing some cool imaging of the sky; they had this balloon that flew around Antarctica with a telescope on it. And I didn't know much about the science at the time, but I did know how to program, so they put me on data analysis and visualization, basically. I fell in love with that and ended up switching my major to computer science. I switched labs and started working in this robotics lab with Nikos, who's my advisor now, and I was working on some computer vision stuff for him. Then, as I was getting to senior year and thinking about graduate programs, I started working with this guy, Panos, who was running an NIH grant to do some medical imaging stuff. I helped him a little bit to write that up, and we submitted it. The lab at that time had sort of been in and out of medical imaging; occasionally they would have somebody working on it, but not really full time, and not as the primary focus. But the grant got funded, and I had a few options for programs to go into, but I really wanted to see that project through. So I stayed on and started my PhD here at Minnesota as well, and I've been working on it ever since.

Justin Grammens
That's awesome. So how far are you into it? How much do you have left?

Nicholas Heller
I'm just starting my fourth year. And the thing about PhD programs is no one ever knows when they're going to be done, right? I'm hoping something like two years left, but you never really know.

Justin Grammens
Sure, sure. You know, you mentioned getting into medical imaging. Do you think that's indicative of Minnesota, having all these medical companies around us? Do you feel like the University of Minnesota in particular is a really hot space for that?

Nicholas Heller
Yeah, I think it is, in a way. We definitely have a lot of companies in the area, so there's a lot of interest. We have a great biomedical engineering department, and in the med school we have a lot of people who are really willing to collaborate. It's awesome at Minnesota, because the medical school is right across the street from the computer science building, so we really just have to walk across the road and we can have meetings together. There are some schools where the medical school is in a different city, right? So it would be harder at a school like that.

Justin Grammens
Sure, sure. And do you find yourself then working with a lot of doctors, or maybe focused in on one or two, as you're going through this program? How is that structured?

Nicholas Heller
There's not really any specific structure to it, but I've really gravitated to the medical side, really just trying to get my hands dirty and understand the problems well. A lot of times I'll shadow doctors in the OR and in the clinic just to understand better what they're doing and what the problems are that we can work on. And they're so busy, you know, that they don't really have time to come over and meet with us on our side of the street, so we always go to them.

Justin Grammens
Sure. Yeah, I liked how you brought up physics as your initial undergrad. Me myself, I majored in math and physics, but I think I was really drawn to the applied side. I joke with people that solving for x isn't really interesting to me, but solving for x to calculate a velocity or something like that is a lot cooler. So that's kind of what drew me into the application side. It seems like maybe you're into that as well: technology for the sake of tech, or learning something, doesn't really scratch the itch; you actually want to apply it.

Nicholas Heller
Yeah, that was a big reason, because I actually wanted to do theoretical physics. And I started in an observational lab, which is more on the applied side, but it's still, like you said, trying to sort of learn about the world, which is really cool. But then also, I don't know, when you get to college, you see all these problems, and you're like, I want to get my hands dirty and really solve something that has a big impact right away, right? Yep, maybe that just shows I'm impatient, but I don't know.

Nicholas Heller  4:59  
For sure.

Justin Grammens  5:00  
Sure, well, you and me both, for sure. You know, one of the things I ask some of the people that come on the show is, how would you define AI? It's such a big and broad topic. I've been involved in the Internet of Things over the past decade or so, doing a lot of stuff in IoT, and IoT has just become this term that means something different to everybody. And so, you know, I don't know if you have a short elevator description. When somebody says, what do you do on a day-to-day basis, instead of saying, you know, I do artificial intelligence, what do you tell them? How do you describe what you do?

Nicholas Heller 5:34
Yeah, I guess if I had to give a definition for AI, I like the textbook definition, right? That it's automated rationality. You have some measure of being rational, which you have to define very precisely. Being rational in this context might mean classifying images correctly, or acting in a way that replicates human speech, like language models. So you have some way to measure: this is right, this is what we expect in this situation. And then AI is the engineering that goes into producing that automatically. But yeah, I mean, if someone's in an elevator and asks what I do, I just say I work in medical imaging.


Justin Grammens 6:07
So what are some cool things going on in the field of medical imaging? You spoke at our last Applied AI meetup, specifically on medical image analysis, and you talked a lot about the really cool things that are going on. For people that maybe missed the meeting, you had a lot of different tools and techniques and ways that you're using AI. Maybe you could describe what a day in the life of what you're doing right now looks like, touching on some of that stuff, and also some of the unique ways you're using AI in the field.


Nicholas Heller  6:15
So there are so many opportunities right now for AI in medical imaging. I mean, just in terms of classification, which is a very narrow part of AI, you can apply that to a ton of problems in medical imaging, whether it's chest X-rays or CT scans. Take renal tumors: you could classify what subtype they are, what grade they are, what stage they are. There are so many different problem formulations that you could come up with that would have a really immediate impact if you did them well. What I work on most of the time is segmentation, which is where you're trying to take every voxel in the image (like a pixel, but in 3D) and classify each one as belonging to, let's say, tumor or kidney or background. And if you've done that well, you have a really awesome quantitative geometric description of the tumor and the surrounding anatomy, and from that you can measure all sorts of things. You can come up with texture values for how rough the tumor looks; you can look at the border between the tumor and the kidney and see, okay, how sharp is the boundary there, and what does that correlate with? So just getting those really objective and precise measurements, I think, is at least a good bridge into applying AI in medicine. But yeah, the potential for doing great things goes so far beyond that.

Justin Grammens
Sure. So you call this area segmentation, is that correct?

Nicholas Heller
Yeah. So there are sort of two versions of segmentation that people usually talk about. There's semantic segmentation, which is where you're just classifying every voxel, and that's usually what I do. And then there's also instance segmentation, which is where you first do detection of all the things that you care about in the image. So say there's more than one tumor, or more than one kidney: you first detect those, and then within each of those detections, you do segmentation. So if you were doing instance segmentation of people versus semantic segmentation, and you had a big crowd, semantic segmentation would give you one big blob of people, but instance segmentation would give you individual small blobs: this person, this person, this person, each with their own boundaries.

Justin Grammens
Yeah. So, I think you had a picture of a road during the meetup, and you were able to identify, oh, this is a person, this is a car, this is a streetlight, all that type of stuff. In that case, what you're saying is, a group of people sitting there would basically be one blob, which is the first kind of segmentation, I guess. And then with the specific kind, it would be, hey, I want to pick out that specific person in that crowd. And I guess, would it be able to then decide, oh, this is a kid versus a woman versus a man, something like that? And correlating this back to what it would look like in a medical image, it's like, this type of tumor versus this type of tumor? Is that kind of what you're trying to get down to?

Nicholas Heller
Yeah. So it all depends on how you set up your problem, right? If you have a different class for kid versus man versus old man or whatever, then you could do that segmentation, and you'd have a different set of classes that you're classifying those voxels into, or pixels in this case. But yeah, the way I see segmentation, at least right now, is really just a way to get those great measurements, like tumor volume, tumor size, and the boundary, like I was talking about. Classification is sort of something else: you could do classification and segmentation, but they're two branches, right? You'd be working in parallel to do that.
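A minimal sketch of the semantic-versus-instance distinction from this exchange, using a toy 2D mask rather than real CT voxels; connected-component labeling via scipy.ndimage.label stands in for the detect-then-segment pipeline Nick describes, and the class names are hypothetical:

```python
# Semantic vs. instance segmentation on a toy 2D "crowd" mask.
import numpy as np
from scipy import ndimage

# Semantic mask: 1 = "person", 0 = background. Two separate people.
semantic = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
])

# Semantic segmentation stops here: every pixel has a class label,
# but the crowd is one undifferentiated "blob of people".

# Instance segmentation also separates the blobs. Connected-component
# labeling is a simple stand-in for detect-then-segment.
instances, num_instances = ndimage.label(semantic)
print(num_instances)  # 2: this person, this person
print(instances)      # each blob gets its own integer id
```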

Justin Grammens  10:00
So how are you then training the model to do this? Are you ripping through millions of images, or is there a little bit of human in the loop? How does this work? Let's dig into it.

Nicholas Heller  10:01  
Yeah. So I mean, that's sort of the major advancement that lets us do this these days: machine learning. We show a bunch of examples of, this is the CT scan, and this is what we want you to produce, which is those segmentation masks, and the deep learning models, which are the things we build in PyTorch and TensorFlow, figure out how to learn that mapping. So it builds a mathematical function which takes the image as an input and produces the mask as an output. So yeah, we spend tons of time trying to create datasets of segmented images that we can send to these systems to learn from. I work with, at the moment, something like 20 to 25 medical students, and I try to build tools for them to go in and be efficient when they're annotating these images. And we're trying to move a lot of the work to Amazon Mechanical Turk, because a lot of it is just kind of menial stuff, right? Medical students are busy, too; they don't want to spend tons of time on things that just any stranger on the internet could do. So yeah, it's an interesting problem. I think we need more people, at least in the AI community, to study how to get good labels for these things, and how much the quality matters. That's a question I'm really interested in: if you have a bunch of errors in the labels, how much is that really going to affect the performance of your systems later on?
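As a rough sketch of the image-to-mask mapping described above: a single PyTorch training step on placeholder data. The tiny three-class model (background, kidney, tumor) and the random tensors are illustrative assumptions, not the architecture or data actually used:

```python
# Minimal sketch: learning a mapping from CT volume to segmentation mask.
import torch
import torch.nn as nn

# Toy "segmenter": 3D convolutions mapping a volume to per-voxel logits
# for 3 hypothetical classes (background, kidney, tumor).
model = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(16, 3, kernel_size=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # per-voxel classification loss

# Stand-ins for one annotated case: an image and its label mask.
image = torch.randn(1, 1, 16, 64, 64)        # (batch, channel, D, H, W)
mask = torch.randint(0, 3, (1, 16, 64, 64))  # voxel-wise class ids

optimizer.zero_grad()
logits = model(image)         # (1, 3, 16, 64, 64)
loss = loss_fn(logits, mask)  # compare prediction to annotation
loss.backward()
optimizer.step()
```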

Justin Grammens  11:17  
Yeah, and I guess maybe you don't know until it happens. Is that fair to say?

Nicholas Heller  11:21  
Yeah, I think most of the time. I'd like to have some way to estimate it, but right now you're pretty much in the dark; it really depends on what problem you're working on and where your errors are. So in classification, it's been shown that if even 90% of your labels are just totally missed or randomly perturbed, the network will still learn to classify things really well. But if you're systematic in the way that you decide which are mislabeled, and which direction they're mislabeled in, like the hard cases, for example, if there's one that's ambiguous and you consistently mislabel that, then that will reduce the performance of the system. And what's interesting about segmentation is that all the errors that occur naturally are on the borders of regions, at least the vast majority of them, because you just kind of color outside the lines by accident. And those, just by their nature, are the hard cases. So if you put that into a system, and you've got all these errors in the hard cases on the border, then the network doesn't have a very easy time learning those borders, because it's only seeing errors. So I think in segmentation the problem is really acute, and we need to figure out either how to produce good labels or how to come up with ways during training to minimize this problem.
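A toy sketch of the two noise regimes contrasted here: labels flipped uniformly at random versus errors concentrated systematically on the region border (coloring outside the lines). The mask and noise rates are made up for illustration:

```python
# Random label noise vs. systematic border noise on a toy mask.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
mask = np.zeros((64, 64), dtype=int)
mask[16:48, 16:48] = 1  # toy "tumor" region

# Random noise: flip a fraction of labels uniformly at random.
# Classifiers have been shown to tolerate a lot of this.
flip = rng.random(mask.shape) < 0.10
random_noisy = np.where(flip, 1 - mask, mask)

# Border noise: consistently dilating the region puts every error
# exactly on the ambiguous boundary, i.e. on the hard cases.
border_noisy = ndimage.binary_dilation(mask, iterations=2).astype(int)

print((random_noisy != mask).mean())  # errors spread everywhere
print((border_noisy != mask).mean())  # fewer errors, all on the border
```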

Justin Grammens  12:29  
Yeah, you mentioned Mechanical Turk. I've done some projects with that, and in our case, it actually worked out really well. We weren't working on anything like medical imaging; it was more just OCR. We had Mechanical Turk doing one side, and we also had Google Vision doing the other side, and we were trying to build our own algorithm to pick the best of all these documents that needed to be scanned. And so, yeah, I found the right Turk workers can do some really good stuff. But in this particular case, would they need to have any medical background, or is it literally just drawing bounding boxes and stuff like that? Is that the type of solution where you're trying to get people to work in that capacity?

Nicholas Heller  13:04  
Yeah, so we're trying to get them to draw lines around things, which can be really, really difficult. I mean, even radiologists struggle with this sometimes, because the boundaries are just ambiguous: you don't know what's tumor and what's kidney in the gray areas, and you have to make educated guesses. So what we try to do is bring in as much of the information as we can from the medical experts through these things we're calling pins. You put down a couple of pins in an ambiguous area, and then a couple of other ones outside, to make sure they're drawing the contour. I like to break errors into two types, where one is unbiased and the other is biased. Unbiased errors are ones where, if you were to take the average of a whole bunch of times that you drew your squiggles, all the errors would average out and the result would be perfect. But a biased error is one that you would repeat if you annotated the same case again; it's an error of intention. What medical experts are great at is having no bias: they know exactly what they're trying to annotate, almost all the time. Mechanical Turkers, if you just gave them a raw medical image, would have tons of biased errors, but you can get so many of them, because it's sort of a commodity, that you can average out the unbiased errors. So what we try to do is get the medical experts to come in and put down pins to minimize biased errors later. They put these markers down basically saying, you should annotate here, here, and here. And then we send that to Turkers, and they do it a bunch of times, and then we take the average, or release it to the public; how to aggregate them is sort of a whole other story.

Justin Grammens
Yeah. Well, is there an open-source or an open dataset at all around some of this stuff?

Nicholas Heller
There are some really good open datasets in medical imaging right now. One that I like to use a lot is BraTS (B-R-A-T-S), the brain tumor segmentation challenge, and it's sort of where I came from. This has been going on since 2011, I think, from the University of Pennsylvania and a bunch of institutions that are collaborating with them, and they've got maybe 500 to 600 cases where they've annotated brain tumors in really high detail: they've got the edema and the proliferative core and a couple of other classes around the tumor. It's a really difficult segmentation challenge, and the community has really rallied around it and tried to solve the problem as well as they can. They run the challenge again every year and see where we are. It's kind of like a Kaggle challenge, if you've used those at all in the past, but it's associated with a conference, so it's people that are already publishing in this community that like to participate, and they write a big paper together about how it went at the end. So yeah, that's one that's used quite a lot. There's another one called the LiTS challenge, the liver tumor segmentation challenge. I think that one happened just once, in 2017, but that was pretty great, with another few hundred cases. And then there's one that we created called the KiTS challenge, for kidney tumor segmentation. That's, again, a few hundred cases that we annotated and released to the public to let them go nuts on. And right now, what's taking up the bulk of my time is creating the next iteration of that challenge, KiTS21, which is going to happen next year.
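A toy sketch of the averaging idea from a few turns back: several annotators who make independent, unbiased mistakes, combined by pixel-wise majority vote. As Nick notes, the real aggregation strategy is a whole other story; this just shows why unbiased errors cancel:

```python
# Averaging several noisy annotations so unbiased errors cancel out.
import numpy as np

rng = np.random.default_rng(1)
truth = np.zeros((32, 32), dtype=int)
truth[8:24, 8:24] = 1  # the region the pins say to annotate

# Five synthetic "Turkers", each making independent 5% mistakes.
annotations = []
for _ in range(5):
    flip = rng.random(truth.shape) < 0.05
    annotations.append(np.where(flip, 1 - truth, truth))

# Pixel-wise majority vote: average the masks and threshold.
consensus = (np.mean(annotations, axis=0) >= 0.5).astype(int)

print((annotations[0] != truth).mean())  # one annotator's error rate
print((consensus != truth).mean())       # far lower after voting
```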

Justin Grammens  16:09  
Nice. So you're sort of forming this hub around all this data and all the work going on in that specific space?

Nicholas Heller  16:16  
Yeah, I mean, it sort of has to be that way, because just creating the dataset takes so much time and effort. If you wanted to be a research group that both created the data and trained your algorithms and did all this other stuff, it becomes too much: you're sort of going vertical, and your expertise really dilutes. So you have one group that really specializes in the clinical side of the problem: collecting data, making sure that data is good and has quantified uncertainty, and that you're using a good metric, all these sorts of bookkeeping things that are really important but kind of boring. And then you offload the actual deep learning to other research groups, who have these methods that they propose and are itching for problems where they can demonstrate that their method works the best, right? And so it's kind of this symbiotic relationship, where we work on the medical side and create data, and then they just sort of eat it up.

Justin Grammens  17:11  
You mentioned all these datasets and basically building these algorithms. Is there any sort of, I guess I'll use the term transfer learning, going on, where you could actually train a model looking for something in the liver, and then it can be applied in the brain, for example? Is that something that you see happening or going on right now?

Nicholas Heller  17:26  
So yeah, that's a huge strategy for classification, in medical imaging and beyond. I mean, people really often pre-train on ImageNet, right, which is sort of the standard problem. That was a challenge back from, what, 2009 to like 2015, 2016, with 1,000 different classes and millions of images. It's a big dataset that you can get good weights on, and then you can transfer to all kinds of different problems, whether they're medical or microscopic; it doesn't have to be natural images. Which is an interesting phenomenon: networks that are trained on one type of image will transfer well to another type of image, even when they look nothing alike. So that's cool. In segmentation, it doesn't work as well. Actually, the team that won our challenge, a group from the German Cancer Research Center, didn't use transfer learning at all; they just used our 210 training cases and trained something that did extremely well. So it's an interesting phenomenon, that it works well in classification but not segmentation. And why is that, right? I don't have the answer, but it's an interesting thing to study, for sure.
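The classification recipe Nick describes looks roughly like this in torchvision. A minimal sketch: the four-class head is a hypothetical stand-in (say, renal tumor subtypes), and freezing the backbone is just one common variant of the recipe:

```python
# Sketch of ImageNet transfer learning for a medical classifier.
import torch.nn as nn
from torchvision import models

# Start from weights learned on ImageNet's 1000-class problem...
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# ...and replace the classification head for the new task
# (4 hypothetical classes, e.g. renal tumor subtypes).
model.fc = nn.Linear(model.fc.in_features, 4)

# Optionally freeze the pretrained backbone and train only the head.
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False
```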

Justin Grammens  18:32  
Do you hope someday, or do you predict maybe someday, that it might be able to be applied in segmentation?

Nicholas Heller  18:38  
Yeah, I hope it will. I mean, anything that we can do to reduce the burden of creating huge datasets would help, because it does take so much time and effort. I think something that's also going to be really important is being able to put datasets together and create a single model that will make predictions for both, right? So using the LiTS dataset and the KiTS dataset to train a model that will segment both the liver tumors and the kidney tumors. I think that could be really important, because it's just so hard to create a single dataset that has everything. So if you can put together datasets that have individual components, then that could save a lot of time.

Justin Grammens  19:12  
Cool. So you might have mentioned this before, but are these images 3D images? Can you take a look at these tumors from any direction?

Nicholas Heller  19:20  
Yeah, so these are cross-sectional images. They're usually captured in the axial plane, or really, they're always captured axial, which is a slice right through your body, perpendicular to the spine. And then they're captured at different slice thicknesses, as they're called. So it can be anywhere from half a millimeter apart, which is really thin, to five millimeters or 7.5 millimeters apart, which is really thick. So the voxels are sort of rectangular, right? They're usually the same width apart within a single axial image, but the distance between axial images varies quite a lot. So you can do interpolation and things like that to generate different views, but they sometimes don't turn out very well if you had a really thick capture.
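A sketch of the interpolation he mentions, assuming a hypothetical volume with 5 mm slice spacing and 0.75 mm in-plane spacing; linear interpolation with scipy.ndimage.zoom resamples it to roughly isotropic voxels so other views can be cut, though with thick slices the result may still look poor:

```python
# Resampling an anisotropic CT volume to view it from other angles.
import numpy as np
from scipy import ndimage

volume = np.random.rand(40, 128, 128)  # (axial slices, height, width)
spacing = (5.0, 0.75, 0.75)            # mm between voxels on each axis

# Resample to isotropic 0.75 mm voxels with linear interpolation.
zoom_factors = tuple(s / 0.75 for s in spacing)  # (6.67, 1.0, 1.0)
isotropic = ndimage.zoom(volume, zoom_factors, order=1)

# Now a slice along another axis is geometrically sensible,
# though quality still depends on the original slice thickness.
coronal = isotropic[:, 64, :]
print(isotropic.shape)  # roughly (267, 128, 128)
```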

Justin Grammens  20:01  
Gotcha. It just seems like maybe having that third dimension creates another order of magnitude of difficulty. I mean, picking out a cat or a dog, or whatever it is, in some of these sort of hello-world-type things, once you have thousands of images, it's not too hard to do. And I guess the corollary in that case is you've got to look at the dog from an angle and the cat from an angle, and these algorithms have been trained really well, to be pretty good, I would say almost as good as humans in a lot of ways, at picking out these animals. But again, I don't know if the way that you're capturing these images with these slices makes it even harder, because it's not only just, hey, I took a snapshot of this tumor from this angle; you need to actually look at where it is inside of it. And it feels to me like there's just a lot of data that needs to be run through in those types of images, beyond just a bunch of pictures of cats and dogs.

Nicholas Heller  20:52  
Yeah, it is interesting, adding that third dimension. I think the most immediate problem it causes is that you run out of GPU memory really fast, because you want to create a 3D network that looks at the whole image at once and makes a good segmentation or classification, and when you have that third dimension, you just have tons of voxels that you're dealing with. But beyond that, I think it actually helps. You can actually run this experiment: you can do 2D segmentation, just in any order on the images, and then put them together, and then run that against a single 3D segmentation model, which has a 3D convolutional neural network that's basically the same in every way, except there's a third dimension, right? And the 3D model will win almost every time. And I think it's because there's actually just more information there for it to use; it doesn't have to make guesses about the context for how this slice was captured. It can actually see that this is part of something that looks kind of spherical, and tumors tend to be spherical, so it's more likely to be tumor, where it might just be a weird artifact in a 2D image; you don't really know that context. I mean, there's definitely more information to sift through, so you'd think, you know, maybe we need bigger datasets for this, and that might be true to an extent. But since the images are captured so close to each other, there's a lot of autocorrelation going on, right? If you see one slice, you know sort of what the one next to it is going to look like. And deep neural networks are really good at condensing information, so it doesn't turn out that you need too much more data in practice to segment something like that.
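A sketch of the 2D-versus-3D comparison: two tiny networks that are identical except for the extra kernel dimension. The layer sizes are arbitrary; the point is that the 3D version sees across-slice context and carries roughly 3x the kernel weights, which is also why GPU memory fills up fast:

```python
# Same tiny network built per-slice (Conv2d) and per-volume (Conv3d).
import torch.nn as nn

def block(conv, in_ch, out_ch):
    return nn.Sequential(conv(in_ch, out_ch, kernel_size=3, padding=1),
                         nn.ReLU())

# 2D: sees one axial slice at a time, no context between slices.
net2d = nn.Sequential(block(nn.Conv2d, 1, 16), block(nn.Conv2d, 16, 2))

# 3D: identical except for the third kernel dimension, so it can tell
# "part of something spherical" from a flat in-plane artifact.
net3d = nn.Sequential(block(nn.Conv3d, 1, 16), block(nn.Conv3d, 16, 2))

params_2d = sum(p.numel() for p in net2d.parameters())
params_3d = sum(p.numel() for p in net3d.parameters())
print(params_2d, params_3d)  # 3D has about 3x the weights here
```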

Justin Grammens  22:24  
When you're dealing with these, I'm assuming everything's anonymized in some fashion. I mean, is there a lot of concern about giving up these images, just dealing with medical data?

Nicholas Heller  22:32  
Yeah, there is a lot of concern, and for good reason, right? We want to protect the identities of people who contribute their data for our research. It's hard with images, especially, because it's not only just stripping off the tags that were meant to be identifying; there can be things that are unintentionally identifying, too. Like, sometimes with some of the older scanners, the PHI that would show up overlaid on the image gets burned into the pixels when they were digitized, so you have to physically inspect every single image and make sure there's nothing like that showing up. And then there are other things: with X-rays in particular, sometimes there's unusual jewelry or abnormal anatomy, and that would narrow it down to a really small patient population. And if you know the institution where that person got that capture, you could basically pinpoint exactly who it is. I mean, if you were to pair really any of these with what you can collect from a Fitbit or a smartphone, where somebody is over time, you can make good guesses based on that about their BMI and gender and age and things like that. Anonymity is just really hard to preserve. And with some of the really big datasets in this space that have been released, the biggest ones are always X-ray, because they're 2D, so they're easier to annotate, and they're smaller just in terms of disk space, so they're easy to release. So there's a really big chest X-ray dataset that I think came out of Stanford, might have been a collaboration with MIT or some of these big institutions, and they released tens of thousands of patients. And they had very strict guidelines on how they were anonymizing these images, and even they had five or six cases that ended up breaching anonymity. So it's really, really hard to get every single case. And we don't really know what to do with that as a community, because it's so valuable to have these datasets available to the public, so we don't want to stop releasing them, but we also want to be sensitive to these issues. I mean, there are tons of reasons, of course, that you don't want to breach anonymity.

Justin Grammens  24:32  
Sure. Well, I guess the big question in my mind right now is: how does this affect doctors in the future? A lot of the stuff you're doing, in some ways, could be augmenting, but it could also potentially be replacing. How do you see this technology being used going forward?

Nicholas Heller  24:52  
Yeah, I think we're a long way off from these sorts of algorithms being used without supervision, directly on patients. It'll be faster in some areas than others. Like, there's, what is it, IDx-DR or something; it's a local company that's doing retinal imaging for diabetic retinopathy and other things, and I think they're actually already FDA approved and operating in some states. And where did I hear about that... it was on this podcast, wasn't it?

Justin Grammens  25:18  
Yeah, yeah, it was.

Nicholas Heller  25:21  
Yeah. So I mean, things like that, which are just sort of a slam-dunk classification problem, right? These images are captured in a pretty standard way, and we've shown, in multiple studies from across the world, that the classification works and can be relied upon; things like that could be fast. But I also get very wary about automated systems when it comes to things like screening, because there are so many problems right now with overdiagnosis: people that are just healthy come in for a screening, and then it turns into patients being subjected to all sorts of tests, and it turns out it was just going to be benign for the rest of their life. This is a big problem in kidney cancer right now. If you look at the epidemiology curves, there's one for incidence, which is new cases per year, and there's one for mortality, just how many people are dying from this every year. Mortality is just dead flat; I mean, it hasn't moved for like 20, 30 years. But incidence is skyrocketing, and it feels like the incidence directly correlates with the proliferation of more CTs, more cross-sectional imaging. So we're catching tons more kidney cancer. And the trope has always been that if you see a kidney tumor on imaging, there's like an 80% chance it's malignant, so you'd better just take it out. So all of these are getting taken out, more and more every year, and it's very likely that many of these patients would have died with it, without ever even knowing that they had this lesion on their kidney. So I mean, I think it's a good example where AI can help, because we could maybe get better at deciding which ones are going to be dangerous and which aren't. But in the shorter term, it's an example where AI can hurt, because we're going to be catching even more of these than we used to, maybe even smaller and smaller ones if the AI is really good, and we're going to be treating those too, because we just don't know what to do about them. You know, even if it's a 2% chance that it's cancer and it's going to kill you, you still want to have the surgery, even if it's probably fine. And the stakes are very high. So I think it needs to be very careful and very intentional. I think there are a lot of questions that need to be answered in terms of how we regulate: when are we ready to put these into practice? The FDA, from what I've seen, has been pretty lax in terms of giving approvals. They've got this, what is it, the 510(k) pathway, the substantially-similar one: if you can show your system is substantially similar to something that's already FDA approved, then the standards are much lower for getting that approval. And it's funny, because the thing that you showed it was substantially similar to could also have been approved under the same process; it was approved because it was substantially similar to something else that was ten years earlier. And so what you're seeing is that some of these totally deep-neural-network, AI-based systems being approved today are being approved on the basis of being substantially similar to something from like 1980 that was trying to find the peak of an echocardiogram or something, which is a totally different problem, right? So, you know, regulation can be a pain in the ass for people like us sometimes, but there's a reason for it. So I don't know what the standards should be, but it seems to me they maybe should be higher than they are now.

Justin Grammens  28:22  
Gotcha. Yeah, I mean, anytime you're dealing with humans and human life, the scrutiny just needs to be on a different level. Because...

Unknown Speaker  28:30  
Yeah.

Justin Grammens  28:31  
I think about Well, a couple things. I mean, I guess one is, is a lot of those genetic tests that people get done, they uncover things that they would have never even known about and probably would have never died from but yet they want to try and act on them. And when you're dealing with such large data sets, such as is like you're saying, even I would think if even if there's some noise in there, now all of a sudden, it's just exacerbated by by Oh, hey, there's just one marker that's off here. Well, now you've got a million images, or whatever you're going to have now all of a sudden, it's gonna be amplified. I guess, in some ways, would that be true?

Nicholas Heller  28:59  
Yeah, it totally is. I mean, we do all these studies in medicine, right, where we have a cohort of patients that have been in the hospital for some reason; some have some indicator, some don't, and then we look at the outcomes in each group. And as soon as you start talking about healthy people and screening, you're bringing in a whole bunch of people that never would have been in the hospital, or the clinic or whatever, to begin with. They're totally living their lives, and they're not going to go on to have an adverse event, or at least only very rarely. So the standards need to be very high in terms of being precise about which of those people this is actually going to affect. I mean, if you get a really big population, you can usually show statistical significance, but is that clinically meaningful? Is that difference actually enough that people actually care about it? A polygenic risk score might tell you that your risk of this particular cancer just doubled, but it probably won't tell you that your risk is going from 0.0001 to 0.0002, right?
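A worked version of that last point, with made-up numbers: a relative risk can double while the absolute change stays negligible:

```python
# Relative vs. absolute risk with illustrative numbers.
baseline_risk = 0.0001      # 1 in 10,000
doubled_risk = 2 * baseline_risk

relative_increase = doubled_risk / baseline_risk  # 2.0, "risk doubled!"
absolute_increase = doubled_risk - baseline_risk  # 0.0001, 1 in 10,000
print(relative_increase, absolute_increase)
```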

Justin Grammens  30:01  
Doubling, yeah. You've got to figure what it's doubling from, right. Right. So...

Nicholas Heller  30:05  
Yeah, I mean, it'll be very interesting. I think there are tons of opportunities, and it probably will help in the long term, but you have to sift through the people who are just trying to capitalize and make money off of this versus those who really want to make outcomes better and reduce costs, things like that. There's a difference, and it's hard to tell. But yeah, in terms of medical specialties, the likely suspects for being replaced are probably radiology and pathology, right, because you can think of them as kind of big signal processors. They come into work and see their computer screen, for a radiologist, or their microscope, for a pathologist, and then patients come in in the form of scans or tissue that was collected, and they produce a report about what they saw, and then they go on to the next patient. So, ostensibly, you think, oh, this is a great candidate for automation, we can just replace them entirely. But there's a lot they're doing that's really nuanced and very difficult, and they train for a very long time in order to do it well. So we have to be very careful, and I think we should start very narrowly, with replacing very particular tasks. Like I said, diabetic retinopathy: you can replace a visit to the ophthalmologist; it's very narrow, and you can do it well. Reading a chest X-ray is a different story. I mean, there's any number of things that can show up there, probably thousands. So are you going to classify into each one of those thousands of categories? And how many patients do you need to validate something like that and show it's not just going to exacerbate overtreatment? It's going to take a lot, and it's just very nuanced, very hard. I think it's going to take a lot longer than a lot of people would like. But it's worth taking that time, because the stakes are high, and the cost is already very high.

Justin Grammens  31:46  
Yeah, good point. I mean, it doesn't seem like there's really this artificial general intelligence, where you can throw an image at it and say, find me anything. Have there been any cases where you're looking for a specific type of tumor, because it feels very narrow, right? It's, I'm looking for this type of tumor in this specific region of the body, and all that type of stuff. Has there been any case yet where the AI was smart enough to say, you know what, I actually found this over here; you didn't know about that, but I found it? And I'm guessing we're nowhere near that.

Nicholas Heller  32:14  
Yeah. I mean, so with KiTS, which is this kidney tumor segmentation dataset, any model that you train on that to do kidney tumor segmentation is only meant to be run on data that comes from the same distribution the KiTS data came from, right? And we restricted the KiTS data to people that had already been diagnosed with a renal mass. So we're relying on a radiologist to make that diagnosis, and then send it to a model to get segmented. You could also run other models on that data. For instance, you could train a model on the LiTS dataset for liver tumors, and then run that on the KiTS dataset. But LiTS was restricted to people who had already been diagnosed with liver masses, right? So the predictions that you'll get from your LiTS model on the KiTS data are going to have tons of false positives, because it's never seen cases that haven't been diagnosed with liver cancer. So yeah, you have to be really careful that the data you're running inference on is representative of the data the model was trained on, or at least the data it was validated on, so you can get good estimates of the performance.

Justin Grammens  33:25  
So do you have any advice, I guess, for people entering the field? You've been doing this for a number of years here; it sounds like you started in computer science and then were really lucky enough to get involved in this grant and be able to dive deep into this. What are some of the books that you've read, or advice on classes? If you could think back, what classes might you take, what conferences would you attend, anything like that? How would people dive into this area if they were new to it?

Nicholas Heller  33:51  
Yeah, I mean, there are so many good resources online right now. I actually started in machine learning with Andrew Ng's Coursera course, way back in the day, probably sophomore year of college, and I was still a physics major at the time, so I wasn't even looking at classes at the U. For neural networks, there was this book by Michael Nielsen, I think, who was a physicist; he wrote a book about deep neural networks and the math behind backpropagation and training and things like that, and I thought it was really good. It's very accessible for people who just have some math knowledge and really no exposure to machine learning. In terms of textbooks in this space, I really like Bishop's textbook, Pattern Recognition and Machine Learning. If you're a stats person, Tibshirani and Hastie have those great books, The Elements of Statistical Learning and An Introduction to Statistical Learning; those are both really good. Elements has a little bit steeper learning curve, so maybe start with the intro if you're really just getting into it, but they're really great books. I mean, there's just a wealth of knowledge in there. For courses at the U, we've got quite a few now. There's a computer vision course, which I really like, and then we've got several machine learning courses. Really, any course in signal processing would be helpful, because machine learning was really born out of that, or at least its origins have a tight tie to that. In terms of popular literature, there have been some cool books written even in the medical space, like, I think it's called Deep Medicine, by Eric Topol. He's a big-name physician, sort of towards the end of his career, but he really likes the idea of how AI can bring us closer to our patients as doctors: using something like Alexa to record the encounter between a doctor and a patient, and letting the doctor get out from behind the computer and actually interact more on a human level, and letting AI take over the parts that they don't like, which is clicking boxes in Epic and other EHRs, which is a huge pain for them. But yeah, I mean, there are just so many sources, and Kaggle competitions are a lot of fun: you get in and very quickly benchmark your method against the state of the art, and that can be very instructive. And a lot of that stuff's open source, so you can go see it and see how other people did so well, which is pretty amazing. Sometimes you can have something that you think is just perfect, and then somebody else blows it out of the water. Some of those winning solutions have just been so finely tuned.

Justin Grammens  36:12  
You had mentioned the one that you're going to be heading up, built around these new datasets. Is that something mainly for PhD students and other people that are already in the field? And if you have a link to it, I guess maybe you can give it to me after the program, if it makes sense to share with people.

Nicholas Heller  36:26  
Yeah, I can definitely share that. There are definitely no prerequisites to competing. You know, we get people from all over: we had some high school students compete last year, all the way up to experienced people in industry; IBM had a team that competed, and NVIDIA had a team that competed. So yeah, that's open to everyone, and we love having people from all over, in different disciplines.

Justin Grammens  36:49  
Well, where do you see yourself going, I guess, once you get done with this PhD program? What's sort of your ideal trajectory here, in your career going forward?

Nicholas Heller  36:58  
Yeah, that's a good question. I don't know; you get so focused on just finishing that PhD thesis that anything beyond that is kind of a blur. I definitely want to stay in medicine. I've just spent so much time and energy getting up to speed with the interesting problems in this space, and learning how to work with doctors, and building connections, that I just don't want to leave. I think, ideally, I'd like to stay in academia, but you know, I'm keeping my options open at this point, I think.

Justin Grammens  37:27  
Good, yeah. That's very smart to do, absolutely. The technology is always changing, but I think if you have experience and obviously know the toolset, you can be very adaptable. In my career, I mean, I've been doing software for close to 25 years now, but it's such a broad term, right, software: it can be applied in data science, it can be applied in IoT and devices, it can be applied in the cloud, it can be applied on mobile phones. So there are all sorts of areas. You obviously have to have a passion for whatever you do, and not be doing it just for the money, but it sounds like you're putting in the time and effort to master your craft, and I think you'll do well going forward, for sure. How do people reach out to you? How do people find you?

Nicholas Heller  38:12  
Email's always fine. I've had my address a long time, and I have no plans to get rid of it, so you can always send me stuff there. There's LinkedIn, and I've got a personal homepage that should always have my up-to-date contact info, so maybe that's the best way. But yeah, I mean, anyone who has interest, or wants to work together, or just wants to talk, definitely shoot me a message.

Justin Grammens  38:35  
Because I cut you off there: was there something else you wanted to mention as well?

Nicholas Heller  38:39  
I was just going to say, in terms of advice for people getting into this space, I would advocate for not just going deep into technology, but having a different passion, too. I mean, I didn't intentionally go into medicine; it just kind of fell in my lap. But since I started, you know, it's so valuable. I don't want to say I have deep knowledge, because I'm not there yet, but to build deep knowledge in a different space as well as technology is just amazing, because you learn about all these cool problems, and you have the toolset to address them quickly. And you'll always work better with these sort of interdisciplinary teams, which you need, right? If you're going to work on a problem, you need to know people who know the problem well, as well as the solution. Otherwise, you know, during undergrad, a lot of times I felt like a solution looking for a problem, right, and that doesn't go well. So yeah, I would definitely advocate for having some other interest area that you'd like to dive deep into, and then use technology to do cool stuff there.

Justin Grammens  39:34  
Very well said, Nick. Cool. Well, I appreciate you being on the program. This has been a great conversation, and I wish you all the best going forward.

Nicholas Heller  39:43  
I really appreciate the invite. Thanks for setting this up.

Justin Grammens  39:45  
All right. Take care, Nick. Thanks.

AI Announcer  39:48  
You've listened to another episode of the Conversations on Applied AI podcast. We hope you are eager to learn more about applying artificial intelligence and deep learning within your organization. You can visit us at appliedai.mn to keep up to date on our events and connect with our amazing community. Please don't hesitate to reach out to Justin at appliedai.mn if you are interested in participating in a future episode. Thank you for listening.