Conversations on Applied AI

Cal Al-Dhubaib - Secure, Fair, Transparent & Auditable Models

December 06, 2022 Justin Grammens Season 2 Episode 30

The conversation this week is with Cal Al-Dhubaib. Cal is a data scientist, entrepreneur, and Chief Executive Officer of Pandata, where his team empowers organizations to use artificial intelligence competitively. Cal has an emphasis on designing and building trustworthy AI, and is especially passionate about the ethics of AI and how organizations can build the right talent to support their AI initiatives. Cal has been listed on the Crain's Cleveland 20 in their 20s, and Cleveland Smart 50. He is the first data science graduate from Case Western Reserve University and an advocate for careers and educational pathways in data science through regional workforce development initiatives.

If you are interested in learning about how AI is being applied across multiple industries, be sure to join us at a future AppliedAI Monthly meetup and help support us so we can continue producing future events through the Emerging Technologies North non-profit!

Resources and Topics Mentioned in this Episode

Enjoy!

Your host,
Justin Grammens

Cal Al-Dhubaib  0:00  

One, let's talk about the need for trustworthy AI and what we mean by it. So trustworthy AI is more secure, it's more fair, it is transparent and auditable. Each of those pillars is really, really important to consider, especially what it looks like in the absence of it. So when we talk about secure, right, we're often building these applications and these models on large scales of data. We're talking millions and billions of records in some cases. Think of how much personally identifiable information is there. So how do you build these models in a way that preserves user privacy? Or how do you deal with the data, most fundamentally, in a way that is ethical and fair to the people it represents?


AI Announcer  0:42  

Welcome to the Conversations on Applied AI podcast, where Justin Grammens and the team at Emerging Technologies North talk with experts in the fields of artificial intelligence and deep learning. In each episode, we cut through the hype and dive into how these technologies are being applied to real-world problems today. We hope that you find this episode educational and applicable to your industry and connect with us to learn more about our organization at appliedai.mn. Enjoy.


Justin Grammens  1:13  

Welcome everyone to the Conversations on Applied AI Podcast. Today we're talking with Cal Al-Dhubaib. Cal is a data scientist, entrepreneur, and Chief Executive Officer of Pandata, where his team empowers organizations to use artificial intelligence competitively. Cal has an emphasis on designing and building trustworthy AI, and is especially passionate about the ethics of AI and how organizations can build the right talent to support their AI initiatives. Cal has been listed on the Crain's Cleveland 20 in their 20s, and Cleveland Smart 50. He is the first data science graduate from Case Western Reserve University and an advocate for careers and educational pathways in data science through regional workforce development initiatives. Thank you, Cal, for being on the Applied AI podcast today.


Cal Al-Dhubaib  1:54  

My pleasure, Justin, good to meet you all.


Justin Grammens  1:56  

Well, we love talking about artificial intelligence and applications of it, and you're living right in that sweet spot today with your company and everything you've been doing in your career. I mentioned a little bit about you being the CEO at Pandata, which is awesome, and I know we'll talk about what you guys do there, but maybe you could give a little bit of a background on how you got to where you are today, maybe the trajectory of your career after finishing up at Case Western Reserve and what you've been doing.


Cal Al-Dhubaib  2:19  

Well, it was a little less after finishing up, and more while trying to finish. I like to joke that it was the best seven years of my life.


Justin Grammens  2:28  

Sometimes things take longer than you expect.


Cal Al-Dhubaib  2:31  

But I think that's the point here, right? So my journey was one of figuring it out as I went along. Originally I was fascinated with computational neuroscience, this idea of the mathematics of the brain, how do we quantify the human state, and I got involved in medical research. I was doing data science before it got cool; this was way back in 2012, which is funny when you look back at it now. I was dealing with, you know, patient health datasets and questions like what keeps a healthy population healthy. And, of course, as we know, there's no such thing as clean data. I found myself falling in love with the field of data science, and I kept hearing over and over again people saying we have data and we don't know what to do with it. So it was really cool at the time getting to be a pioneer in what was a field that was starting to become strong in Silicon Valley but hadn't quite yet been exported to other regions. In my final year at Case, they had announced this data science major. To my dad's disappointment, I said, I'm going to change my major one more time. So that's how that came about. I really was doing data science: I was looking at applied mathematics, systems engineering, and programming, and I had a field of practice, which was healthcare data science. In my final year of undergrad I started Pandata. At the time there were fewer than 150 data scientists in all of Northeast Ohio, and about a third of them worked for either IBM, the Cleveland Clinic, or Progressive Insurance. And so that kind of tells you the scale of what was left for the rest. Over the years Pandata grew by word of mouth, and as the field has become a little bit more established, our focus today at Pandata is machine learning and artificial intelligence. And specifically we like to say we do both design and development.


Justin Grammens  4:21  

Gotcha. Well, so maybe explain a little bit, let's pick away at that. What's design, and then what's development, in your mind?


Cal Al-Dhubaib  4:28  

I think we're all familiar with traditional computer programming for hire, traditional software engineering for hire: I have an application I want to build, I've dreamt it up, this is the functionality, I want this widget over here, I want this user experience over here. And once you have the specs, you can hand them over to any skilled software engineering team. Of course results will vary a little bit, but you can take that requirement that you're able to create yourself, hand it over to a team, and they can build it, no problem. This doesn't work as well when it comes to building anything in the machine learning or AI space. This is because instead of dealing with deterministic problems, I want this user to click here and have this result, we're dealing with probabilistic problems: I'm trying to classify, I'm trying to guess, I'm trying to predict an intention, I'm trying to augment a process in some way using data. And I like to define AI, before we get too far into this, as software that does two things really well: recognize complex patterns at scale, and react to those complex patterns at scale. And so it's this art of designing systems around patterns that you could possibly recognize. Right now that might be regular, boring spreadsheet data, that can be transactional data, but it could also be things like qualitative text, customer feedback, it could be audio, video recordings, etc. You learn to recognize, or build systems that can recognize, these patterns, and then do something useful with it, or react to it in a useful way. And that takes a lot of co-creation. It's very hard, in my experience, for organizations to come up with those requirements in advance; they might have some sort of idea of the problem they're trying to solve, but they're not able to formulate those requirements. And so that's where the design part of what we do comes in. And then development is pretty much what it's always been: once you know what model you're building, what tools you're using, what workflow you're fitting into, it's building a model, training it on data, and you kind of follow that typical engineering process.


Justin Grammens  6:31  

Gotcha. Do you guys have any particular tool sets that you like to focus around, or are you pretty agnostic?


Cal Al-Dhubaib  6:37  

We're pretty agnostic. We like to play in the open source realm; we've seen deployments across AWS, Azure, and Google, and of course multi-cloud is becoming more and more common. So our emphasis today is building tools and open source libraries or tool sets that integrate and play nice with one another. We do a lot of work with H2O.ai and their Driverless AI product, which helps accelerate the time to building useful, trustworthy models, and we're starting to see more tools like that become commonplace in practice.


Justin Grammens  7:11  

Gotcha. Now, the datasets really matter. I know when you and I talked prior to this, you guys really like to focus on, what is it, highly sensitive data, or data that has a lot of regulation around it. And I definitely want to talk with you about the types of data, because, you know, I have a couple of friends who work for UnitedHealth Group or Optum here in the Twin Cities, and they're huge in that space. One guy said, you know, the reason he's been so successful in his career is because he can predict whether your grandma takes her blood pressure medication or not, and he's pretty good at it. He says that kind of tongue in cheek, but there is something about this data that is highly sensitive. Maybe you want to talk a little bit about, you know, how you guys are working in that space, because that is one of your core focuses, right?


Cal Al-Dhubaib  7:55  

Well, let me talk a little bit first about commoditization, which is a big trend shaping the machine learning and AI world today. In the past, building these truly intelligent systems that can deal with things like documents and notes and images at scale was the exclusive domain of big tech; you had to be at Google, Facebook, Amazon, working with massive compute budgets, and you had teams of teams working on different parts of the problem. And just to explain how complex this was, let's take an example of customer feedback. You'd have to have some sort of engineering team that is focused on being able to read qualitative text from whatever source that may be and get that stored in a format you can do things with. You'd have to have a team that is very, very specialized in language models, dealing with things like grammatical syntax, or recognizing words or phrases. You'd have to build up large amounts of labeled examples: this is an example of a customer talking about pricing, this is a customer talking about time to respond. And you'd have to do all of this before you could even get to any modeling. Today, there are a lot of pre-existing models and tools that you can use almost like building blocks. I don't need to train a language model anymore to recognize pricing; I can use a Hugging Face few-shot learning approach based on a large language model that's already been trained at scale, and then refine it to my specific problem. So commoditization is rapidly transforming the nature of the work in the AI design and development space. In regulated industries, though, it is still very complex. This is either because the data and systems you're working with are protected behind firewalls, and it makes it harder to use these open source tools, or harder to use these APIs that require moving the data outside of those boundaries, or because there are restrictions on how you might use these systems, and you need to have models that are easier to interpret, easier to document with an audit trail. So there's still a need for a lot of bespoke modeling and data science work under the hood in these organizations, whereas a lot of AI-enabled opportunities in marketing, for example, can be solved pretty easily with tools that are off the shelf.
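
As a rough illustration of the off-the-shelf building blocks Cal is describing, here is a minimal sketch, not from the episode, of tagging customer feedback by topic with the Hugging Face zero-shot classification pipeline. The model name, the feedback text, and the candidate labels are illustrative assumptions, not anything Pandata necessarily uses.

```python
# A minimal sketch of using a pre-trained model as a building block, rather
# than training a language model from scratch. Model name, feedback text, and
# labels below are illustrative assumptions only.
from transformers import pipeline

# Load a general-purpose zero-shot classifier built on a pre-trained model.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

feedback = "I waited three days for a reply, and the new pricing is confusing."
labels = ["pricing", "response time", "product quality"]

# Score the feedback against each candidate topic with no task-specific training data.
result = classifier(feedback, candidate_labels=labels, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```

From there, a handful of labeled examples could be used to refine or calibrate the model to the specific problem, which is the few-shot refinement Cal mentions.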


Justin Grammens  10:18  

For sure, you're right. There's not a public dataset or a public model that they can just sort of pull from, and then they're lacking the internal experience, I guess; they don't have the data scientists on staff, or they're not planning to train them long term. So how are you seeing yourself engage with these companies?


Cal Al-Dhubaib  10:33  

Yeah, no, it's a fair question. Funny enough, we used to think it was a threat whenever there was an established data science team within a client, like, oh no, we're gonna lose this opportunity. And what we're finding is actually our most successful clients are ones with existing data science teams. What ends up happening is data science teams within organizations take on a handful of different models, but they're often assigned to these operational problems: hey, readmissions are so big for us, we need an entire team focused on predicting readmissions, and that's all they do day in, day out. Right, your friend, your colleague that you talked about at Optum, he's predicting, what did you call it, consistency of medication. Right, right. So you have these teams that are solving their specific problems and trying to get just a little bit better at that problem. You don't necessarily have dedicated teams working on the moonshot activities, or in some cases there's a need to do bursts of development. And so working with contractors on these types of projects is really common. Data science talent scarcity is still there, so even if teams exist, there may not be enough within that organization as well.


Justin Grammens  11:36  

Yeah, for sure. And so you guys are located in the Ohio area, I guess, around Cleveland, right? Yes, sir. And I talked in the intro about educational pathways in data science, and I know there's OhioX and some of these other initiatives going on. I'm based in Minnesota, but we probably have a lot of similar type things. Maybe you could talk a little bit about what the ecosystem is like, because it seems like you're very interested in growing that.


Cal Al-Dhubaib  12:03  

Really cool that you mentioned OhioX, big fan of the organization. Chris Berry started it a couple of months before COVID took place, and, you know, the shutdowns happened right after. I was like, wow, what a time to get started. And yet in spite of that, for the first time, instead of having a tech bubble in Cleveland that's focused on healthcare and manufacturing, a tech bubble in Columbus that's focused on, you know, fintech and venture capital with Rev1 and the types of innovations they're bringing there, and then Youngstown and Cincinnati with their own ecosystems, Youngstown with, like, the automotive and electric battery work, you have these little specialized pockets not talking to each other, almost all going to the coasts to raise money and bring in deals from outside to our individual regions. Groups like OhioX are creating more connectivity here, especially with remote work being a lot more common. I mean, today what we're seeing is, even though we've opened our roles up nationwide, we get a lot of candidates from Columbus, from Cincinnati, from Toledo applying to work with us in Cleveland, because we have that, you know, Ohio base; it still feels, you know, connected. And we're now seeing more of that happening here. Also, local universities are producing more data science talent that's staying here at the undergraduate level. CSU, Cleveland State University, just launched an undergraduate program in their computer science department, John Carroll's had one for some time, and I was a part of the original cohort at Case.


Justin Grammens  13:35  

That's awesome. Ya know, it seems like the industry needs more talented people in it. I teach a graduate course actually at the University of St. Thomas, and I will say, over the years it's seen a lot of foreign students come in, take courses, and then leave, right? So they come in and learn data science, and then they go back to their country of origin, what have you. So it's very exciting to see people stick around, and in my opinion it really helps the local economy.


Cal Al-Dhubaib  14:00  

Right? Well, in the past, there wasn't a critical mass of job opportunity, right? When you talk about there being 150 data scientists in Northeast Ohio when I started Pandata in 2016, that's not enough, right? If one job goes away, where are you going to look? That company might have had one or two openings, and there were the big three employers. Now that's starting to shift. But if you broaden the scope of that a little bit to say, let me look at the tech centers in Columbus, Cleveland, Toledo, Cincinnati, etc., you start seeing more opportunity to stay here and move from job to job. We've seen this for quite some time with software developers, and organizations, even competitors, will form alliances, because the only thing worse than losing a person to your next-door neighbor is losing them to outside the region.


Justin Grammens  14:43  

Right. Yeah, exactly. I mean, are you finding it tough to find qualified candidates? It just seems like, in technology, the pool has become quite thin, I guess, in a lot of ways.


Cal Al-Dhubaib  14:57  

We've been really blessed, actually; we haven't had challenges finding individuals. I think it's because we do a couple of things differently. But also, if you look at job satisfaction rates in the data science market, they're still relatively low and churn is high, because the field is maturing. A lot of data scientists end up joining upstart data science organizations within more established companies, but their models aren't necessarily making it to production, or they're not getting to work on things, right? There's still this statistic that you might hear somewhere that 80% or 90% of AI projects fail to yield a return. That results in a lot of organizational shifts, people being reorganized from one group to another, lack of satisfaction in that position, so churn is still high. And for us, the appeal for folks that want to work at Pandata is getting to work on a variety of different projects. That kind of scratches their itch.


Justin Grammens  15:55  

I would say that's exactly it. I mean, I have a services organization called Lab651, and those are really the types of people that we look for. They don't want to sit in a cube and do the same thing over and over again, year after year after year; they're looking to really be able to change it up.


Cal Al-Dhubaib  16:09  

Yeah. So I mean, it's an interesting trend. But again, our organization is 10 people strong, we have three openings now, but it's not like we're trying to place 20. You hear horror stories, like one bank in particular tried to hire for 100 data science slash data engineer roles, and after six months they were only able to place six of them. Oh, wow. So you hear stories like that, and it is actually happening.


Justin Grammens  16:35  

Yeah. And I think the same thing, you know, that's what I've seen at Optum and UnitedHealth Group and places like that here, too. They're planning on hiring hundreds, but at the end of the day, you're right, they pull in as many as they can, and then they also work with consultancies, you know, firms like yours or mine or what have you, to help them at least move things forward, right? The work still has to get done. And speaking of the work getting done, what is a day in the life of you as the CEO of your company?


Cal Al-Dhubaib  17:03  

Great question. It's evolving as the company is growing. Historically, I was very, very involved in the projects, and I'd work strategically with clients on educating their teams on what AI is, raising the bar on what I call AI literacy. I think that's a key competitive advantage for any organization to invest in, and helping them identify, hey, what are the use cases, or opportunities to build intelligently augmented processes.


Justin Grammens  17:29  

And then over the course of time here, it's sort of shifted and changed.


Cal Al-Dhubaib  17:33  

And changing. I'm still somewhat involved with that, but, you know, we have our data science consultants who are now a little bit more involved in the design process: hey, let's co-create the requirements with the client, and then actually determine what needs to be built. And then it's the traditional data science process after that: find the data, label it, build a model, test it, get some user validation, and then get it into production.


Justin Grammens  17:57  

Yeah. Do you find that AI is a bit of a loaded term in comparison to, like, machine learning, for example? I mean, what's your gut sense? What do you feel like the industry is saying about AI?


Cal Al-Dhubaib  18:09  

This is a great question. I've gone back and forth on this so many times, and for a while I was feeling guilty about using the term AI. I recently heard someone at a conference say this, and she's a well-respected thought leader in the AI industry. She goes, let's all agree that AI is marketing, right? And, you know, also, let's agree that when we say AI, we mean machine learning these days. I think for a while professionals like myself would try to say AI equals this, machine learning equals that. I don't think it matters anymore. By and large, the approach used to power AI, the software, is machine learning. So if you want to get really, really technical, I'll say AI is the software; that's what you interface with when you have it in production, and it can include things that are not machine learning. It can include rules, it can include simulations, it can include blah, blah, blah. But machine learning by far is the method we most often use as a building block in AI systems. But these days, I'll say machine learning, I'll say AI, and I'm talking about the same thing.


Justin Grammens  19:19  

Well, I mean, I've been in this IoT space here for more than a decade, before it was even called IoT, and that is a straight-up marketing term. I mean, it's like, well, Internet of Things, what the heck does that mean? People kept saying IoT was gonna change the world, right? And so, you know, in some ways you just kind of embrace it, because that's what the industry is using. But it can be a little bit detrimental, because everyone has their own spin on what AI means, or what IoT means, or what machine learning means, and then there's deep learning, right? So those are the techniques going on. It's a little bit of vocabulary soup in some ways, but I'm curious to know what you think. I personally kind of struggled with what I was going to name this podcast, right, whether I was gonna go with artificial intelligence or machine learning or data science or whatever. And AI, in my opinion, is very broad, so I can have guests that are talking about, you know, voice, I can have guests that are talking about computer vision, I can have guests talking about stuff all over the place. Are you seeing a lot of machine learning happen all the time, or do you guys come into certain situations where you're like, you know, a standard regression, statistical modeling, will work just fine, so let's just do that?


Cal Al-Dhubaib  20:23  

Well, I mean, is regression even technically machine learning? That's debatable, right? But no, you're right. I was at a conference recently; I do attend a lot of conferences to stay up to date. So I guess, to answer your question about what I do on a daily basis, it's trying to put myself out there, absorb, meet people, unpack some of these ideas. And it was a roundtable of AI experts. I still feel weird calling myself an expert, but that's what we were. We started this conversation by everybody introducing themselves and what they do, and you get all these varying pitches: I run this division at this company, and blah, blah, blah. Then this one person raises their hand and goes, hi, my name is Sansa, and I work at this company, and I use regression to predict fraud. And it was the most authentic, honest, straightforward answer. Frankly, yeah, in many cases advanced machine learning is still overkill, and we do often have to walk clients off the ledge: okay, let's start here, start with your basic toolbox. If it's good enough and you get there in five days, that's better than trying to prototype something that's going to take a few months before we can even test it. And you can always build and improve from there.
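
To make the "start with your basic toolbox" point concrete, here is a minimal sketch, my own illustration rather than anything from the episode, of a plain logistic-regression baseline for a fraud-style prediction problem using scikit-learn. The features and labels are synthetic placeholders.

```python
# A minimal baseline sketch: plain logistic regression before reaching for
# anything more advanced. The data and features here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 3))             # e.g., amount, hour of day, recent transaction count
y = (rng.random(5000) < 0.05).astype(int)  # ~5% positive "fraud" labels, synthetic

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# class_weight="balanced" is a simple way to handle the rare positive class.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]
print("Test AUC:", round(roc_auc_score(y_test, scores), 3))
```

If a baseline like this is good enough, you ship it; if not, it still gives you a yardstick to beat before investing months in something heavier.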


Justin Grammens  21:34  

Yeah, cool. Well, that makes sense. I mean, you're walking customers through this, and sometimes they've never deployed anything to production, right? It feels to me like the last mile can be the hardest: how do you actually put this in production and have it usable, and be able to make sure that the model is not drifting too much, this whole MLOps piece? It's something I've been talking with a lot of people about, and it can be very difficult to staff up, train for, and make sure everyone's doing the right stuff.


Cal Al-Dhubaib  22:05  

One big mistake that happens is overcomplicating the initial design process, right? If you have a very powerful model that takes very, very specialized experts to monitor for drift, and has to be retrained very frequently with expert care, is that really feasible to deploy in production? Or what happens when you have 10 such models, or 100 such models, for various different functions, tasks, capabilities, teams? At what point does the overhead of maintaining these tools that have been built start becoming more costly than the result, or the reward, that you get from them is worth?
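
As one small example of what routine drift monitoring can look like, here is a sketch, my own example rather than Pandata's method, that compares a feature's live distribution against its training-time distribution with a Kolmogorov-Smirnov test from SciPy. The data, feature, and alert threshold are assumptions.

```python
# A minimal drift-check sketch: compare a feature's recent production values
# against the values seen at training time. Data and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_values = rng.normal(loc=50, scale=10, size=10_000)  # feature values at training time
live_values = rng.normal(loc=58, scale=12, size=2_000)    # recent production values

stat, p_value = ks_2samp(train_values, live_values)
print(f"KS statistic={stat:.3f}, p-value={p_value:.4f}")

# The alert threshold is a judgment call; here anything this unlikely triggers a review.
if p_value < 0.01:
    print("Possible drift detected: flag the model for review or retraining.")
```

Multiply a check like this across every feature of 10 or 100 models and the maintenance overhead Cal describes becomes very tangible.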


Justin Grammens  22:47  

Yeah, absolutely. I think of the total cost of ownership, I guess, to build these things.


Cal Al-Dhubaib  22:51  

I've been using this catchphrase a lot, and it was inspired by a Wall Street Journal article titled something like, in order for AI to be used more, it needs to become boring, by Jennifer Strong. And so now I flip that on its head and say it's never been a more exciting time to make AI boring. You've got change management, you've got user testing, you've got total cost of ownership; none of this stuff is new. It's just the fact that we're shifting from deterministic programming, if this, then this, to now dealing with probabilistic-type programming: when I see this pattern, I might want to do one of these things.


Justin Grammens  23:27  

Yes, yep. Exactly. And you're right, it's all based on percentages. There's a likelihood it's gonna go this way, but it's not 100%.


Cal Al-Dhubaib  23:35  

Yeah, you're not guaranteed 100%. You know, you've got to set thresholds, you've got to do some risk management around it. It's not that it can't go wrong; it's what you can do when it goes wrong, and how bad might that be, and what are you okay with happening. Yeah, for sure.


Justin Grammens  23:51  

For sure. You know, you mentioned going to conferences and stuff like that, which I think is phenomenal, and I'd love to attend as many as I can. For people that are entering the field, what do you suggest that they do? I mean, are there classes or books? And we will definitely put links to your careers page and all that stuff at Pandata. But yeah, if I was somebody new getting out of school, what would you suggest? How do I get into this field?


Cal Al-Dhubaib  24:14  

There's only so much classroom-style learning, but I think it helps. One of my favorite thought leaders is Andrew Ng and his online AI classes; he's got a really, really great certification that he's just relaunched on Coursera. He put one of the original classes in the field out there, and they just refreshed it. I'm forgetting the name of the website, but he has this newsletter called The Batch, and he just recently put out a post on, like, the three pillars of staying up to date as an AI professional. One of them is keeping an active portfolio, right? Find toy projects, keep developing, keep tinkering with projects. Try to see things on Kaggle, look at how other people have solved a problem, and then try to do it yourself. That's building intuition. So many students pass up on opportunities, and especially now with conferences going hybrid and looking like they'll be staying that way for some time, you can attend many conferences for free, or for a really, really subsidized rate, as either a student or an early-career professional. And that's where you're really seeing workshops, hands-on activities. Some conferences are better suited for beginners; the Open Data Science Conference is one that covers a broad range, but it's really good for those beginners. I just attended Databricks' Data + AI Summit, and their online conference was 100% free to attend virtually.


Justin Grammens  25:37  

That's awesome. That is awesome. Yeah, I mean, how would you do this without the internet, in some ways, right? It's just fabulous that there's so much content out there. And like you said, you know, how do you keep active? Maybe look for an open source project that you could just start reading over and maybe contribute to, not to say you'd go into the guts of it, but sometimes you just need documenters, right? Somebody to read it over and write documentation and write test cases and stuff like that. So that's another awesome way to kind of get your feet wet, I guess. And sometimes it just comes down to meeting the right person at the right time, too, right? Just general networking.


Cal Al-Dhubaib  26:14  

Oh, I'm a big fan of meetups as well. So we have the Cleveland Big Data meetup, and I know you have a meetup as well, the Applied AI meetup. Yep. The Cleveland Big Data meetup is physical, here in person, but from what I've learned by meeting other professionals across the country, these exist in Chicago, DC, New York, and other major metro areas. So if you haven't yet looked it up, I'd recommend that people listening look up whether there's a big data or open data community in your region and just start showing up.


Justin Grammens  26:42  

Absolutely, for sure. Awesome. Well, how do people reach out and connect with you?


Cal Al-Dhubaib  26:47  

Well, I'm really active on LinkedIn. So for anyone that wants to strike up a conversation or stay in touch, you can connect with me there.


Justin Grammens  26:55  

I will absolutely put a link to your LinkedIn profile here in the show notes for sure when we publish this. Are there any other topics, stuff like that, maybe things that I missed that you wanted to share?


Cal Al-Dhubaib  27:05  

So, you know, trustworthy AI is a really big part of what we do, and I don't know that we've spent enough time talking about that. I have to give that a plug.


Justin Grammens  27:13  

No, please, please. Yeah, let's dive a little deeper on that.


Cal Al-Dhubaib  27:17  

So one, let's talk about the need for trustworthy AI and what we mean by it. Trustworthy AI is more secure, it's more fair, it is transparent and auditable, and each of those pillars is really, really important to consider, especially what it looks like in the absence of it. So when we talk about secure, we're often building these applications and these models on large scales of data. We're talking millions, billions of records in some cases. And if you think about really, really large language models that are trained based off of everything that's been observed on the open internet, think of how much personally identifiable information is there. So how do you build these models in a way that preserves user privacy? Or how do you deal with the data, most fundamentally, in a way that is ethical and fair to the people it represents? Then you have the second pillar, which is more fair, which is this idea of, you know, does the model behave differently given different cultural contexts, and does that have any associated harms? Does it exclude potential users from benefiting from the application? There are a lot of challenges with transcription technology for users that have accents; Facebook, or Meta now, actually has a team working on solving some of these challenges. So there's this inclusivity of the actual application, but then there's also the risk of underperforming for a certain group. And we've seen no shortage of cases: Apple with their credit card that gave women in the same household as a man a lower credit limit, healthcare algorithms designed to score patients for prioritized care putting equally sick Black patients behind white patients for that triage care, and, you know, recently some of the issues with facial recognition on different minority groups. So you have lots of these different unintended consequences. And sometimes you can ask the model, why did you predict what you predicted? And because these models are so complex, it can give you an answer that's just math mumbo jumbo, and you can't really pinpoint it to a specific feature or attribute. That speaks to this last pillar, auditability: when a model makes a prediction, can you reproduce it? Can you recreate it? If we talk about ModelOps and a model being automatically retrained every day, every week, every month even, can you reproduce the original model? It's so that if something wrong does happen with a recommendation that's maybe used in patient treatment or outreach, or a customer experience, and somebody says, well, this was incorrect, this was biased, this hurt me in some way, can you reproduce the model's original results and understand why it did what it did? That's the big umbrella that's trustworthy AI. I know I've thrown a lot out there.
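
To make the auditability pillar a bit more concrete, here is a minimal sketch, my own illustration rather than Pandata's tooling, of saving a trained model together with enough metadata, a fixed seed, a fingerprint of the training data, and library versions, so that a prediction can be traced back and the model retrained identically. The data, file names, and record structure are assumptions.

```python
# A minimal auditability sketch: store the model alongside the seed, a hash of
# the training data, and library versions, so results can be reproduced later.
# The data, file names, and record structure are illustrative assumptions.
import hashlib
import json

import joblib
import numpy as np
import sklearn
from sklearn.ensemble import RandomForestClassifier

SEED = 42
rng = np.random.default_rng(SEED)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=SEED)
model.fit(X, y)

audit_record = {
    "seed": SEED,
    "sklearn_version": sklearn.__version__,
    "training_data_sha256": hashlib.sha256(X.tobytes() + y.tobytes()).hexdigest(),
    "model_params": model.get_params(),
}

joblib.dump(model, "model_v1.joblib")
with open("model_v1_audit.json", "w") as f:
    json.dump(audit_record, f, indent=2, default=str)
```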


Justin Grammens  30:00  

No, no, not at all. I mean, it's very important. In fact, you know, that whole idea that you talked about around fairness, as you were talking about that, I mean, a lot of it is because we're building on biased data. A lot of the data that we're using is old and biased, and so the models aren't going to be fair. Why does that happen, I guess, in layman's terms?


Cal Al-Dhubaib  30:22  

It's a little esoteric, but bias isn't necessarily bad. It's harmful bias we care about, and what we define as harmful bias shifts with context over time; things that we would consider totally appropriate 20 years ago are no longer appropriate today. I was recently rewatching Friends, and there are some episodes where, like, the characters are outright sexist, and you look at them like, wow, that was on TV. Right. By today's cultural norms it's just like, well, you can't joke like that anymore, you can't talk like that. So as society evolves, let's acknowledge that our definitions of what a harmful bias is also change and evolve over time. They're also very grounded in cultural references: what's considered okay and acceptable here may not be considered okay and acceptable in the Middle East or in China, and vice versa. This is a really nuanced issue, and the crux of it is subjectivity. Whenever we have a situation where multiple humans can't agree on the right answer, or multiple humans may look at the same problem and see different situations, you're going to have bias in your data; whether or not that's harmful depends on the situation. So subjectivity is a really big driver here. And I like to diagram this out along two axes. Imagine a chart where your x-axis is the complexity of what you're trying to do. The easiest thing is counting, right? This is present, not present; looking at an image, there's a cat, there's no cat. The next level up is predicting: I predict that this patient is likely to have a readmission with, you know, 80% probability in the next 30 days. What's a little bit more difficult is making a recommendation: out of a fixed set of things, what course of action might you want to take? That's a little bit more complex. And then finally you have generative, where you're creating new text from prompts with some of the language models you see out there, like GPT-3, or even DALL-E 2, which produces images. That's very, very complex. If you think about it, the definition of what is correct is a lot more clear when you're counting: is it there or is it not there? You can look at an image and say, that's a cat, that's not a cat. Right? Yeah. When you look at an output generated by GPT-3, is it good? I don't know, right? It could write this, like, 100 different ways and still satisfy the criteria, or maybe 90 of those ways are pure garbage, or maybe you have a literary expert that's like, no, I've never read something that looks like that, that's terrible. So what is objective truth? It's really hazy. And then you look at the y-axis and think of that in terms of the complexity of the data: boring spreadsheets, tables, and numbers; text being slightly more complex; images being more complex; video, a collection of a lot of different images; and then audio, where you have tonality in addition to the content of the text. This is increasing complexity, and of course that only gets compounded with the scale of the data. It's much easier for me to write a test case for, is this number zero or one, than it is to say, you know, is this phrase associated with this topic, than it is to say, is there an animal in this segment of this image. You can do it, you can write these test cases, but it gets more complex. So the more subjective the end result is, and the more complex your data is, the harder it is to measure and define objectively correct,
and the easier it is to encounter unintended consequences. I go through that long-winded explanation to say that is a big opportunity space for emerging professionals to be focused on. It used to be that the hard part was building models. Now we can build models that do all of these things, with all of these types of data. The hard part is, how do you do it safely? How do you do it the right way? How do you build these test cases for when, inevitably, these models go wrong?
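
As a small illustration of how test cases get harder as outputs get more subjective, here is a sketch, my own example rather than anything from the episode, of two pytest-style tests: one for an output with a single objectively correct answer, and one for a fuzzier text-tagging output where the test can only cover clear-cut cases. Both functions are hypothetical stand-ins.

```python
# A sketch contrasting test cases for objective versus subjective model outputs.
# Both functions are hypothetical stand-ins, not real models.

def classify_parity(n: int) -> str:
    """Stand-in for an output with exactly one correct answer."""
    return "even" if n % 2 == 0 else "odd"

def tag_feedback_topic(text: str) -> str:
    """Stand-in for a text model whose 'correct' answer is much fuzzier."""
    lowered = text.lower()
    return "pricing" if "price" in lowered or "cost" in lowered else "other"

def test_parity_is_unambiguous():
    # Easy to test: the correct answer is objective and exhaustive.
    assert classify_parity(0) == "even"
    assert classify_parity(7) == "odd"

def test_topic_tagging_covers_only_clear_cases():
    # Harder to test: reasonable labelers might disagree on edge cases, so the
    # test asserts only clear-cut examples rather than universal correctness.
    assert tag_feedback_topic("The price doubled overnight") == "pricing"
    assert tag_feedback_topic("Great support team!") == "other"
```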


Justin Grammens  34:17  

For sure. And then you need subject matter experts to come in and actually understand, like, the ethics. And that's the other thing I was gonna ask you: ethical AI and trustworthy AI, these are kind of in the same sphere. But yeah, you need somebody to come in who, I guess, knows the ethical side of this better than maybe you or I do, maybe because of their racial background or maybe because they know the language, per se. They can actually be the subject matter expert in these areas.


Cal Al-Dhubaib  34:47  

Right, so there's no easy way, there's no shortcut here. You have to have humans involved, you have to have things labeled, you have to have reviewers. You hear a lot of this talk that AI is replacing jobs. I think what it's doing is shifting jobs. All of this care and monitoring of AI systems is gonna take a lot of work and a lot of highly skilled individuals who know how to ask the right questions.


Justin Grammens  35:11  

Yeah, exactly. And having a human in the loop is going to be essential here, at least in the near term, maybe forever.


Cal Al-Dhubaib  35:19  

No, I mean, it's not gonna go anywhere. Because think of the scale: how do you handle 10,000 or 100,000 subjective creations on very, very complex datasets?


Justin Grammens  35:33  

For sure. Well, it sounds like a good space to be in, and, you know, with Pandata kind of focusing on that, it probably is a differentiator for you guys in the marketplace, right?


Cal Al-Dhubaib  35:43  

And it's all about asking the right questions, right? I don't think there are any easy answers, but I think the mistake most people make is trying to gloss over it, like this is an academic thing. It's not academic.


Justin Grammens  35:55  

Yeah, you cited some of the things, you know, with regards to Apple and others. If these models are trained poorly, they can be a huge public relations nightmare, I think, for some of these companies, if they don't do it the right way.


Cal Al-Dhubaib  36:05  

I'd recommend, you know, linking to or checking out the Allen Institute for AI. They've got some really cool interactive demos; I use them in seminars that I teach, and they kind of give you a chance to play with biased language models to see, like, if you ask it one thing, what does it give you on the other side? It's really eye-opening to see how these models can go wrong when you try it yourself.


Justin Grammens  36:29  

Yeah. Awesome. I will make sure I add that link as well. Well, Cal, thank you so much for enlightening us here today on a number of different levels around ethical, and you know, bias AI and trustworthy AI is the terminology that you guys use. But I think I think it's awesome. I think I think your story really around kind of going out and starting your own business too. I think, you know, it's one of the things that you and I talked about was entrepreneurship, right. And I have a lot of respect for people that sort of want to build something new and different and kind of want to do it their way. And it sounds like that's what you're doing it wet at your company, and you're creating jobs and you're you're you're solving problems and you know, that's that's very, very, I would say that's that's very, very satisfying, I think for many people that want to you know, work for you and companies that want to be customers of yours. And it's just it's very, very cool that you're doing that. So well. I congratulate you on that. Thanks, Jeff. So, so yeah, Cal, I look forward to having you on the program in the future. And thank you again for your time today. And be sure to keep in touch. Yeah.


Cal Al-Dhubaib  37:26  

Thanks for having me. And thanks, everyone for tuning in.


AI Announcer  37:30  

You've listened to another episode of the Conversations on Applied AI podcast. We hope you are eager to learn more about applying artificial intelligence and deep learning within your organization. You can visit us at appliedai.mn to keep up to date on our events and connect with our amazing community. Please don't hesitate to reach out to Justin at appliedai.mn if you are interested in participating in a future episode. Thank you for listening.