Conversations on Applied AI - Stories from Experts in Artificial Intelligence

Bongjun Kim - Sound Recognition and Audio Signal Processing with Machine Learning

March 24, 2021 Justin Grammens Season 1 Episode 16

If you ever wondered how Alexa, Siri, Google can be trained to recognize and process audio, this is the episode for you! In this episode, we speak with Bongjun Kim. Bongjun is an AI researcher at 3M. He completed his Ph.D. in Computer Science at Northwestern University and his research interests include Machine Learning, Audio Signal Processing (e.g. sound event recognition), interactive intelligent systems, multimedia information retrieval, and human-in-the-loop interfaces. He also enjoys working on a musical interface and interactive media art.

I love this quote from him, "You need to look at the data before you choose your model. Please review the images, audio, or video to make sure your assumptions are valid." Wise words from an expert. I also highly suggest you pick up a copy of the book he recommends called Learning From Data listed in the liner notes below. I've started reading it and have found it fascinating and insightful!

To connect with Bongjun, visit his personal website.

If you are interested in learning about how AI is being applied across multiple industries, be sure to join us at a future AppliedAI Monthly meetup and help support us so we can make future Emerging Technologies North non-profit events! 

Resources and Topics Mentioned in this Episode

Your host,
Justin Grammens

Bongjun Kim  0:00  
So human in the loop is more like an interactive machine learning concept. If you think about a machine learning model, we can think of a fully autonomous system: you build a model, and you let your model or your system run by itself to do something. But human in the loop is a model where human and machine interact with each other to solve a problem together.

AI Announcer  0:23  
Welcome to the Conversations on Applied AI podcast, where Justin Grammens and the team at Emerging Technologies North talk with experts in the fields of artificial intelligence and deep learning. In each episode, we cut through the hype and dive into how these technologies are being applied to real-world problems today. We hope that you find this episode educational and applicable to your industry, and connect with us to learn more about our organization at applied. Enjoy!

Justin Grammens  0:54  
Welcome to the Conversations on Applied AI podcast. In this episode, we speak with Bongjun Kim. Bongjun is an AI researcher at 3M. He completed his PhD in computer science at Northwestern University, and his research interests include machine learning, audio signal processing such as sound event recognition, interactive intelligent systems, multimedia information retrieval, and human-in-the-loop interfaces. He also enjoys working on musical interfaces and interactive media art. Being a musician and a bit of an audio geek myself, Bongjun, I'm very excited to have you here on the episode today. So welcome.

Bongjun Kim  1:28  
Thank you. Thank you for having me today.

Justin Grammens  1:30  
Awesome, great. Well, I gave a short background about you and where you're at today. How did you get to 3M? What has your career path been?

Bongjun Kim  1:41  
Yeah, so actually my career path is all over the place, really. Currently I'm a data scientist at 3M. I completed my PhD in computer science, as you said, at Northwestern University. I actually graduated last June, then joined 3M a couple of months ago. During my PhD I worked on a lot of interdisciplinary projects, mostly focusing on machine learning and interactive intelligent systems, specifically for audio applications. Before doing my PhD, I also worked on some musical interfaces and media art during my master's program in Korea.

Justin Grammens  2:19  
Gotcha, for sure. So it sounds like you've been really interested in audio from a young age, maybe?

Bongjun Kim  2:25  
Yeah, actually I really love music, and my biggest hobby is making music. So I wanted to somehow connect my engineering knowledge and skills to music and audio applications. I actually didn't go to the PhD program to learn ML at the beginning. I was interested in music and audio at first, and then I found that ML is really interesting, so I learned more about machine learning during my PhD.

Justin Grammens  2:57  
Interesting, cool. So very artistic, left-brain thinking, I guess. Or was that right brain, left brain? I don't know. You had a little more of the creative side. But were you always into data and AI interfaces and stuff like that earlier on, or was this something you picked up as you got into college and moved through it?

Bongjun Kim  3:19  
Yeah, actually I realized I was always interested in how to use technology for art and creativity: how to use engineering skills or technology to boost human creativity, how to express your own way of making music, or a new way of drawing or painting. So I was always interested in that area. And now there are a lot of people working on AI-driven art. That's also an interesting topic. I would say in the daytime I work on machine learning, and in the evenings and on weekends I work on some art forms.

Justin Grammens  4:04  
Very cool. Well, I know when I checked out your website, you've got a lot of papers, and you've written a lot of articles and stuff like that. As we get into the interview, I want to touch a little bit more on what your favorite is, or what some of the best presentations are that you think you've done. But I do have a question that I ask most people who are on the show: how would you define AI? How would you define it to somebody who maybe doesn't really understand what artificial intelligence means?

Bongjun Kim  4:32  
Yeah, so AI is a really, really broad concept. To be honest, I don't like to use AI to describe my role, because if I say I'm doing AI, I feel like I'm solving all the problems in the world. But if I define AI, I would say it's a kind of system or computer algorithm that makes computers perceive our world and behave like a human. Or I can say it's a system that makes a computer do things that we think require human-level intelligence. That's the kind of AI that I see. But I actually prefer using machine learning to using AI when I describe my work, because machine learning is more about extracting patterns from data, so it describes what I've done better.

Justin Grammens  5:24  
Yeah, for sure. It's a little more focused, I guess, around specific tasks, whereas in some ways artificial intelligence just seems so huge. Like you said, it could be solving anything. Well, that's great. I'm also always curious: what is a day in the life of somebody who is doing what you do? Are you in the code all the time? Are you out in the field? Are you exploring? Are you trying different things? Obviously we're in COVID now, so maybe you're at home a lot. But if somebody gets into this field, and they get a PhD, and they go to 3M as an AI researcher or machine learning researcher, what can they expect to do, I guess?

Bongjun Kim  6:02  
So I think, in terms of the work, you need to think about the pipeline of building a machine learning system. First you have to understand the problem or define the problem. If there is an existing problem, then you need to understand it first. If there is no problem, you need to find or define a problem and persuade the stakeholders that it's a business problem worth solving, or something like that. Then you probably need to collect the data, or inspect the data collected by your colleagues. So I think you're going to spend a lot of time on collecting data, inspecting the data, and understanding the problem. Then maybe you can set up a research question: what kind of information you want to extract from the data, what kind of hypotheses you want to prove. And then finally you're going to build a model, test it, and debug it. Every single step is really important, so you're going to spend some time on every single step. Every day, maybe you're going to have a different problem, but basically the pipeline is really similar. One myth is that a lot of students who want to work in the deep learning and machine learning area are really excited about building models. But actually, I want to spend a lot of time on understanding the problem, inspecting the data, and preprocessing the data. That's really important, sometimes more important than building a model.

Justin Grammens  7:29  
Sure, fair enough. I've heard that from a lot of people who have come on the show: a lot of it is data cleaning, and you've got to make sure that you've got solid stuff. Are you defining problems? Or it sounds like there's maybe a different group that's defining some of the business use cases, and then they're bringing them to you? Do you have a say in what you work on?

Bongjun Kim  7:50  
Actually, as I said, I just joined my current company, so I'm still learning. I think I'm still in the middle of orientation. 3M is really huge and there are a lot of things to learn. But if you think about my PhD work, usually every day you read new research papers and you brainstorm with your colleagues, and a lot of the time you need to define your problem, or you find some problem in existing research. So kind of both ways. Sometimes you wake up and think, oh, I want to solve this problem. Or there is a problem that no one thinks is important; then maybe you have to persuade people that this is a problem, and that if I solve it, it's going to maybe change the world or something.

Justin Grammens  8:37  
Yeah, that's what gets you up in the morning, I guess: to actually solve a problem in the world. And this certainly can be done with technology and some of these new deep learning patterns that people are able to put into practice. As I talked about at the beginning with regards to your bio and your background, there was a lot of maybe complex terminology there, and maybe we could break it down a little bit. One of them was audio signal processing, right? So sound event recognition: for people who listen to this program and maybe don't understand a lot of the details, do you have an example? I think of a glass-breaking sensor in my house, for when somebody breaks a window. Is that something that you would be working on? Did you write papers around that or take a look into that?

Bongjun Kim  9:30  
Yeah, so I've been working on a lot of sound event recognition, but here the type of sound event that I was interested in is the non-speech sound event, which is exactly what you described: glass breaking, a dog barking, a car passing, that kind of sound. So I was always interested in building a machine learning model, a system that detects such sound events as a human does, to understand the context of the world.

Justin Grammens  9:58  
Gotcha. I think the City of Minneapolis has, like, ShotSpotter, right? So gunshots or whatever, that might be something along those same lines. But how does that work? Maybe break that down a little bit. Do you literally have to find all these types of sounds, like millions of different dogs barking, I guess, and feed that into this algorithm? Or are you able to let it sort of train itself? For somebody who maybe doesn't understand how you would go about doing that, could you explain that a little bit?

Bongjun Kim  10:29  
So it depends on the kind of algorithm you're using, but let's say you are using one of the simplest approaches, supervised learning, which actually requires labeled events. Let's say I want to build a simple algorithm that detects my dog barking. What you have to do is this: you're going to need a lot of different dog barking examples, and other sound events that might happen in your environment. So non-dog-barking and dog barking. Let's say you have hundreds of different dog barking sounds and hundreds of non-dog-barking sounds. Then you let your machine learning model learn the mapping function between the text "dog" and the actual sound. So it tries to map between those two. So yeah, you're right, you have to collect a lot of different dog barking sounds to detect a dog, because it tries to extract some common patterns across all the different dog barking sounds.
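
As a rough illustration of the supervised setup described here, the sketch below learns from labeled examples and then classifies a new clip. It is a minimal nearest-centroid classifier, not the kind of model used in the research discussed; the two-number features and the labels `dog_bark` / `not_dog_bark` are hypothetical stand-ins for real audio features.

```python
import math

def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(examples):
    """examples: list of (feature_vector, label) pairs. Returns one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(vs) for label, vs in by_label.items()}

def predict(model, features):
    """Pick the label whose centroid is closest to the new clip's features."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical features per clip, e.g. (loudness, dominant pitch in kHz).
training = [
    ([0.9, 0.6], "dog_bark"),
    ([0.8, 0.7], "dog_bark"),
    ([0.2, 0.1], "not_dog_bark"),
    ([0.3, 3.0], "not_dog_bark"),
]
model = train(training)
print(predict(model, [0.85, 0.65]))
print(predict(model, [0.2, 2.0]))
```

With only four examples the "common pattern" is just an average, which is why a real detector needs the hundreds of varied recordings mentioned above.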

Justin Grammens  11:34  
Sure, sure. And you're actually looking at the waveform that's generated, I guess? Is that the level that you're looking at?

Bongjun Kim  11:42  
So you can actually start with the waveform, but a lot of research papers use an image version of the waveform, which is a spectrogram. You can convert a waveform into a two-dimensional image; we call it a time-frequency representation. So it's a different representation of the audio file. If you look at the spectrogram, you can actually see the pattern of the sound in the image, so you train a model on that image data. It becomes similar to image recognition, actually.
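
To make the waveform-to-image idea concrete, here is a small sketch that turns a list of samples into a spectrogram using only the Python standard library. The frame size, hop, Hann window, and the 1 kHz test tone are all assumptions chosen for the example; real pipelines typically use an FFT library rather than this slow direct transform.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame holds one magnitude per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window softens the frame edges to reduce spectral leakage.
        windowed = [
            x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, x in enumerate(frame)
        ]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Direct discrete Fourier transform for bin k.
            s = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            mags.append(abs(s))
        frames.append(mags)
    return frames

sample_rate = 8000
tone_hz = 1000
samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spec = spectrogram(samples)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "time frames x", len(spec[0]), "frequency bins")
print("strongest frequency:", peak_bin * sample_rate / 64, "Hz")
```

Each row is a moment in time and each column a frequency, so the result can be handed to an image model as a two-dimensional array.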

Justin Grammens  12:16  
Yeah, for sure. Cool. You know, when I edit some of these podcasts in, like, GarageBand, I get pretty good at seeing certain words that I know people say and that I want to remove, for example a filler word like an "um", or a stutter. You can kind of see that. My eyes are getting really good. So maybe that's what's going on: I'm starting to train it in my head. But you could probably very easily train a model to do that as well, an AI system, right?

Bongjun Kim  12:51  
So if you look at those time-frequency representations: for example, you have a podcast recording that you want to edit, and there's, like, an alarm sound going on, and you want to remove that component. If you look at the spectrogram, there is actually some energy in the high frequencies, above the human speech frequencies. Then you can simply delete those energies in the spectrogram. There are actually some tools to do that.
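
The idea of deleting high-frequency energy can be sketched as follows. This is not the tool mentioned in the conversation: it applies one hard cutoff to a whole clip rather than editing a spectrogram region by hand, and the 500 Hz "speech-like" tone, the 3000 Hz "alarm" tone, and the 2000 Hz cutoff are assumptions picked so the example is easy to check.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (slow, but dependency-free)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse transform; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

def remove_above(samples, sample_rate, cutoff_hz):
    """Zero every frequency bin above cutoff_hz, then go back to a waveform."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bin k and its mirror n - k describe the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        if freq > cutoff_hz:
            spectrum[k] = 0
    return idft(spectrum)

sample_rate = 8000
n = 128
speech_like = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(n)]
alarm = [0.5 * math.sin(2 * math.pi * 3000 * t / sample_rate) for t in range(n)]
mixed = [a + b for a, b in zip(speech_like, alarm)]

cleaned = remove_above(mixed, sample_rate, cutoff_hz=2000)
max_error = max(abs(c - s) for c, s in zip(cleaned, speech_like))
print("largest difference from the clean tone:", max_error)
```

Real recordings are messier, since speech and noise usually overlap in frequency, which is where learned models come in.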

Justin Grammens  13:19  
Oh, cool. I think about way back in the early days: I'd have an old stereo, and you could flip on the Dolby sound. Basically it was a noise reducer, right? At the end of the day, that's kind of what it did: it got rid of the high-frequency hissing sound that you'd hear on your audio cassettes. Obviously it didn't do that with machine learning; it did it with some sort of transistor, or whatever. I could see a much smarter system these days being hypersensitive and being able to, like you said, extract those waveforms and pull out the background noise. That's probably what's going on with some of these fancy headsets these days, right, this noise cancellation stuff? Would that be safe to say? I know Apple's AirPods and stuff like that are able to get rid of a lot of the background noise so you can just hone in on somebody's voice. And I'm kind of going on a bit of a tangent here, but it's really neat that Alexa seems to be able to understand my voice if I ask something while, at the same time, my kids are yelling or somebody else is talking. It seems to really be able to hone in on what I'm saying. Are those examples of things that maybe you've read about, or that these papers talk about?

Bongjun Kim  14:36  
Yeah. So you're talking about a kind of speaker diarization. Different individuals have different tones and intonation and different patterns of sound; we call it timbre, the different timbre of a sound. If you can train your model to learn those different timbres from different people, then you can have a model that classifies who spoke this phrase or that phrase in a podcast, for example.

Justin Grammens  15:02  
Yeah, exactly.

Bongjun Kim  15:04  
I think there are some commercial tools for podcasters that automatically transcribe human speech with different speakers.

Justin Grammens  15:12  
Yep, it does. In fact, I use a tool by a company called Otter, and I will run it through that tool. It will hear the first person who spoke, and I tag it as me. Then it will hear you speak, and I'll tag it as you, and then it'll roll through the entire transcription. And you're right, it recognizes the differences in our voices, and the transcription gets written out. It'll say Justin said this, and then it'll say Bongjun said this. So there are some pretty cool tools out there that can pull it out. I just find it interesting that Alexa can seem to do it in real time.

Bongjun Kim  15:46  

Justin Grammens  15:47  
You know, in some ways, I guess it's not real time. Obviously it must have learned my voice over a series of me talking at it. But there's a lot of background noise going on in my house, and yet it can pick out my voice. From across the room, I can tell it to do something, and it can do it, and it just ignores all the other noises going on in my house, which is pretty cool.

Bongjun Kim  16:09  
Yeah. There are a lot of interesting topics, like active noise cancellation and beamforming, all this stuff. All the smart speakers have those kinds of features now. And now Alexa, I think, can also listen to non-speech sounds like glass breaking and fire alarms. They actually released the service a couple of years ago, I think.

Justin Grammens  16:33  
Right, yeah, you're probably right. There's some really cool stuff you can do with audio inside your house that maybe we hadn't really thought about in years past. But now the speakers are becoming smarter and smarter along the way.

Bongjun Kim  16:47  
Yeah, the whole point of all this sound recognition research is this: we have a lot of surveillance cameras to understand the context visually, right? But there are also a lot of events that we can only detect by sound. Think of, for example, a self-driving car. When we drive, we also perceive some events by sound, not just by looking at the event, right? So I think a self-driving car also has to listen to audio, not just look at the road through a camera.

Justin Grammens  17:23  
For sure. You talk about AI sort of mimicking what humans can do, and humans react based on vision, but also on sound as well. So you're probably missing out on the entire picture if an AI system is only seeing but not listening, or listening and not seeing.

Bongjun Kim  17:44  
Yeah, yeah.

Justin Grammens  17:45  
Well, one of the other terms that I used at the beginning that some people may not be familiar with is this idea of human in the loop. Maybe you could talk a little bit about what that means to somebody from the outside.

Bongjun Kim  17:56  
So human in the loop is more like a kind of interactive machine learning concept. If you think about a machine learning model, we can think of a fully autonomous system, right? You build a model and you let your model or your system run by itself to do something. But human in the loop is a model where human and machine interact with each other to solve a problem together. I applied that concept to one of my projects, which is called I-SED, the Interactive Sound Event Detector. This tool is for people who want to label a certain type of sound event in a long audio file. Let's say you have a long audio file, and you want to collect all the time regions containing a certain type of sound event. One way of doing that is to just listen to the audio, and every time you hear the sound of interest, you select that region in the waveform. But it's really time consuming. So you might say, okay, then we can train a machine learning model and let it do all the labeling for me. But the problem is this: let's say you have a long audio recording, you listen to it, and you just found an interesting sound event. In order to train your model, you have to collect a lot more data, but you only have a single example. So how can we speed up this labeling process with the help of machine learning models? In my tool, let's say a human labeler has just found a single sound event in their long recording. The human submits that initial label information to the system, and the machine learning model searches through the whole long recording and suggests a couple of candidate regions that I need to evaluate.
So it's a kind of interaction loop. Then the human labeler listens to the suggestions made by the machine, labels them, and submits that label information back to the machine. You iterate through this human-in-the-loop cycle, and working together you end up finding a lot of sound events that are similar to the one you found initially. That's what I call a human-in-the-loop interface, because I built this interface.
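
The loop described here (one seed example, machine suggestions, human feedback, repeat) can be sketched roughly as below. This is an illustration of the general idea only and not how the actual tool is implemented: the feature vectors, the distance-based ranking, and the `ask_human` callback (simulated here by a lookup in a hypothetical ground-truth set) are all assumptions.

```python
import math

def suggest(features, positives, negatives, k=2):
    """Propose up to k unlabeled segments that look most like the known positives."""
    candidates = []
    for i, f in enumerate(features):
        if i in positives or i in negatives:
            continue
        d_pos = min(math.dist(f, features[p]) for p in positives)
        d_neg = min((math.dist(f, features[q]) for q in negatives), default=math.inf)
        # Only suggest segments closer to a confirmed event than to a rejected one.
        if d_pos < d_neg:
            candidates.append((d_pos, i))
    return [i for _, i in sorted(candidates)[:k]]

def label_with_human(features, seed, ask_human, k=2, max_rounds=10):
    """Grow the set of labeled events from a single seed by asking the human each round."""
    positives, negatives = {seed}, set()
    for _ in range(max_rounds):
        batch = suggest(features, positives, negatives, k)
        if not batch:
            break
        for i in batch:
            (positives if ask_human(i) else negatives).add(i)
    return sorted(positives)

# Hypothetical 2-D features for eight segments of one long recording.
features = [
    [1.0, 1.0], [0.0, 0.1], [1.1, 0.9], [0.1, 0.0],
    [3.0, 0.2], [0.9, 1.2], [0.2, 0.2], [1.2, 1.1],
]
truth = {0, 2, 5, 7}   # segments that really contain the target sound
questions = []         # record what the simulated human was asked

def ask_human(i):
    questions.append(i)
    return i in truth

found = label_with_human(features, seed=0, ask_human=ask_human)
print("events found:", found)
print("segments the human had to check:", len(questions))
```

The point of the design is that the person only listens to the segments the machine proposes, instead of the whole recording.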

Justin Grammens  20:19  
For sure. Do you think there will always be a human in the loop for interfaces? Or at some point in the future, and maybe it already exists today... It feels like human in the loop is still used a lot today to make sure that the model is right, even just as a cross-checker or a validation in some ways. But maybe not. I'm just curious what you think.

Bongjun Kim  20:40  
Oh, of course, if we have a really advanced AI system, the best scenario is that we just let the machine do whatever we want without any human intervention. But since the machine is not perfect, the idea is that we probably need to give machines some human knowledge to boost the performance or speed up the process.

Justin Grammens  21:06  
Yeah, so I guess at least in the near term it's probably good to have a human in the loop at some point. I guess what I was getting at was: do you feel right now that human-in-the-loop interfaces are better than just letting AI go crazy and do whatever it wants to do? Should there still be a check, at least in the year 2020 that we're in right now?

Bongjun Kim  21:29  
Yeah, I think it could be better in some applications. I think it depends on how difficult the task is. It's better for detecting more complicated patterns with really limited resources.

Justin Grammens  21:42  
Sure, and maybe limited data sets. I think back to the dogs: is there a public data set for dogs barking? Maybe that was fictitious when you thought of it, but how would you get that many dogs? Maybe you could get a smaller subset, but then through human in the loop maybe train it better. I was just wondering if that would be the case: if you have a small data set, you kind of need that human aspect to help train it.

Bongjun Kim  22:14  
That's true. And also, one of the reasons I built the tool is that it was for people who have no knowledge of machine learning. If you have machine learning knowledge, maybe you can code it and build the machine learning model yourself. But I wanted to give this simple interface to users who have no experience in machine learning, so that somehow, by communicating with this interface, they can train a model implicitly.

Justin Grammens  22:40  
Yeah, for sure. Have you heard of Mechanical Turk, by Amazon?

Bongjun Kim  22:46  
I'm using it, actually. I collected some datasets using Mechanical Turk before.

Justin Grammens  22:50  
Yeah, and that's been around for a decade or more. It's been around a long time. But it is kind of that idea where you feed it out to these people who are online, and you can have them help you classify what the information is. You could easily take that output and then feed it back into a model, for example.

Bongjun Kim  23:11  

Justin Grammens  23:12  
Yeah, so it's interesting. If there are a lot of people out there who can help you, if they're available, you might be able to sort of mass-produce or mass-build this model. What I was thinking about too was, like I mentioned earlier, I was looking at your website, and you've done a lot of presentations and talks on so many different things. I think recently there was a Midwest audio summit or something like that going on, and you've got your name listed on a bunch of research papers, which is very impressive. I was wondering, do you have a favorite paper or presentation from the past couple of years that you wanted to share a little bit with our listeners, why it was special to you or what you had so much fun working on?

Bongjun Kim  24:02  
I have to say my favorite presentation is my final defense for my PhD, actually, since it includes all the projects that I've ever worked on. But maybe I can pick another project, other than the interface that I just talked about. The purpose of that interface is to quickly collect a sound event in a long audio file, so the ultimate purpose is to collect the data. But what if your goal is to train a machine learning model to use for future purposes, while still reducing the human labeling effort? To build such a model, you have to collect all the data, right? So again, same scenario: you have an audio file you recorded, and you have to listen to it. In order to build an accurate model, you have to label like this. For example, you have a 10-second chunk of audio, and let's say you want a dog barking classifier. Then you need to label like this: oh, there's a dog barking between three seconds and five seconds within this audio file. But imagine this scenario: it takes a really long time to listen to it and say, oh, the dog barking starts at three seconds, okay, the endpoint is five seconds. So even labeling a single event takes a long time. So my idea was: how can we reduce that human labeling effort when building a machine learning model? I suggested a new way of training a model on less accurately labeled data. The type of labeling that I just described is a strong label, because it has all the information, the onset and offset of the sound event. There is another concept called weak labeling. That means: okay, we have 10 seconds of audio, and there's a dog barking, but I don't care when it actually started and ended. It's just within the 10 seconds, somewhere in here.

Justin Grammens  25:55  
Okay.

Bongjun Kim  25:56  
Yeah, that's weak labeling. As you can imagine, it takes much less time to collect the data as a human labeler. That's also a known type of labeling. But the problem is that if you build a model on weakly labeled data, there's a gap between a model trained on strong labels and a model trained on weak labels. A lot of researchers have been working on improving such weak models. But at the time, what I was thinking was: instead of just spending more time on improving the weak model, maybe I can suggest a new way of labeling that is stronger than the current weak labels while still taking much less time than collecting strong labels. So I presented this one; I call it point labeling. It's a new type of labeling that I invented. Point labeling is like this: imagine that you are the human labeler. You listen to the audio, again 10 seconds of audio, and instead of giving the start and end positions of the event, you just specify a single time position, like, okay, four seconds.

Justin Grammens  27:08  
Somewhere in the middle there, yeah.

Bongjun Kim  27:09  
Somewhere in the middle. So there's a dog barking at around four seconds; I don't care about the start and end. But it is much stronger than weak labeling, because it has some amount of time information, and it takes much less time than collecting a strong label. So that's the new way of labeling that I suggested. The next part is: okay, that's good, but how can you train a model on this point labeling? That's another chapter of this paper. The results were really exciting: the performance of the model trained on point labels was really similar to the performance of the model trained on strong labels.
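
The three kinds of labels discussed here differ only in how much timing information they carry, which a small data-structure sketch can make concrete. The class names and the `frame_targets` helper are hypothetical, and widening a point label into a short window (`point_radius_s`) is an assumption made for illustration; it is not the training method from the paper.

```python
from dataclasses import dataclass

@dataclass
class StrongLabel:
    event: str
    onset_s: float    # when the event starts
    offset_s: float   # when the event ends

@dataclass
class WeakLabel:
    event: str        # the event happens somewhere in the clip, time unknown

@dataclass
class PointLabel:
    event: str
    time_s: float     # one moment at which the event is audible

def frame_targets(label, clip_s=10.0, frame_s=1.0, point_radius_s=0.5):
    """Turn a label into per-frame 0/1 targets a model could be trained against."""
    targets = []
    for i in range(int(clip_s / frame_s)):
        start, end = i * frame_s, (i + 1) * frame_s
        if isinstance(label, StrongLabel):
            active = start < label.offset_s and end > label.onset_s
        elif isinstance(label, PointLabel):
            # Assumed here: the event covers a small window around the point.
            active = (start < label.time_s + point_radius_s
                      and end > label.time_s - point_radius_s)
        else:
            # A weak label gives no timing, so every frame is marked.
            active = True
        targets.append(int(active))
    return targets

strong = frame_targets(StrongLabel("dog_bark", 3.0, 5.0))
point = frame_targets(PointLabel("dog_bark", 4.0))
weak = frame_targets(WeakLabel("dog_bark"))
print("strong:", strong)
print("point: ", point)
print("weak:  ", weak)
```

For the 10-second example from the conversation, the point label at four seconds recovers the same frames as the strong label from three to five seconds, while the weak label cannot localize the bark at all.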

Justin Grammens  27:54  
Good. So you were able to get it close to what a strongly labeled model would give, with much, much less effort, I guess.

Bongjun Kim  28:02  
Yeah, yeah.

Justin Grammens  28:03  
Less than it would have taken in the past. Excellent, that's cool. And so that was a paper that you wrote, I guess, at some point?

Bongjun Kim  28:12  
2019, yeah.

Justin Grammens  28:14  
Okay. Well, I have liner notes when I broadcast this podcast, so I'll have links to your website and links to all your papers. Thanks for sharing that; that's really fascinating. The other thing I was thinking about was all this focus on audio that humans need to do today. One of the things I like to talk about with people who come on the program is how this is going to change the future of work. When people have jobs in audio signal processing, for example, do you see people losing jobs because of this? Do you see people getting put out of work because of AI? Or maybe it's not even sound, maybe it's just machine learning in general. What's your perspective on how these algorithms are going to change the way we work in the future?

Bongjun Kim  29:04  
I think we're still at an early stage of this AI era, so there is still a lot of work to do to improve. And I think in the near, at least the near, future we will need more and more researchers and engineers to work on AI models. Someone might think that there are so many recent papers coming out now, and it seems like the performance of the models is kind of saturated. But in reality, there are not that many AI systems actually working and deployed in real industry or services, or the user interface with machine learning isn't working. So that actually needs a lot of people who work on the machine learning engineering side of things. That's one thing. And also, I don't know when we can actually have a really ultimate, I don't know, very good AI system. But what I want to say is, at least in 10, 20 years, we will need more people to work on this AI, and we're not going to lose human jobs because of AI. I guess as we build better AI models, that can create more job opportunities that we don't know about yet.

Justin Grammens  30:25  
Yeah, yeah, fair enough. I mean, anytime there's an advancement, like the Industrial Revolution, or a new technology that comes out, I think a lot of people get worried about how this is going to make me and my job and my livelihood obsolete. And it feels to me like we've always been able to overcome that. It's always been like, well, that's good, the machines are doing this. But now, with these intelligent systems, it's not so much that we're building a machine; it's not like an engine anymore, right? It's not like you're getting rid of horses and replacing them with cars, because in that case the horses just went away. What's going on now, I think, feels a little more scary, because we're actually replacing knowledge. We're actually training machines to have more knowledge. And that maybe feels to some people like, oh wow, this is something that's really going to impact me now, because it's more of a thought-processing job, rather than a mechanical action, that's going to be taken over, you know what I mean? But I talk to a lot of people when I interview them on this program, and sometimes I share this story. My dad was a doctor, a physician for his entire career, I guess 50 years or whatever it was, but he did it for a long time. And you look at a lot of these AI systems now that can pick out cancer, and he was a cancer doctor, a hematologist and oncologist. So he's looking at blood, and he's looking at all these cancer cells and stuff. And I tell him, oh look, there's a new AI here that's going to replace you in your job.
And what he always says to me is, that's fine, because nobody else has the human side of it. No one has that human touch, someone who can be empathetic and sympathetic with patients. He's not really worried that doctors will become obsolete because these machines can actually predict it. So I feel, personally, that there's probably always going to be this human side to it. I think of the Myers-Briggs, I forget what the four letters are, but there is a sensing side of people that needs to deal with intuition and sensitivity. And I think computers, or AI systems, are going to have a hard time replicating that.

Bongjun Kim  32:44  
Yeah, yeah. That's a really good point. First of all, the kind of system that we call AI now is actually still really far from actual intelligence. And also, even if they have some intelligence, as you said, intelligence doesn't solve everything.

Justin Grammens  33:02  
Yeah, for sure. So still, we humans will still have a job, I think, at least in the near term. Well, good. So as you're getting into this field, you've been working on some really cool and interesting projects. Is there anybody that you look up to, I guess, where you're saying, oh geez, in this field this person is just doing amazing work, or, this is where I want to head in my career? You're still early on here, and you've done so much. But where do you think you want to be? Or who do you want to emulate? Like, where do you want to go, you think, in the next five, ten years?

Bongjun Kim  33:35  
So in terms of the audio-related AI field, I want to mention some interesting research groups that I think are doing a really good job. For example, there is an audio group at Google Research, there's audio understanding, and they're doing a really great job. And also there are a bunch of, well, not a bunch, some university labs doing audio recognition stuff. But if you ask me about my goal for the next five or ten years, my problem is my interest is all over the place. I've worked on a lot of audio stuff during my PhD, and I love doing that stuff, but, I don't know, maybe I can start working on some new area. But still, I love working on data in general, you know? Sure. Sure. Yeah. To extract some knowledge from data. It could be audio, image, or video. And also, I'd really love to work in some creative industry, even if it could not be a full-time job, but as a hobby, or as a serious hobbyist. I want to keep working in those areas, because before starting my PhD I actually worked in those areas for two years. But during my PhD I didn't have that much time to do that. You know, PhD life's pretty tough and busy. So now I'm planning to continue to do some of that creative, what we call creative coding, in my spare time, and I want to see how it will go. But yeah, probably I will keep working on AI and machine learning and data in some other, I don't know, applications.

Justin Grammens  35:37  
Sure, sure. Cool. Well, you had mentioned interactive media, interactive art, like music and stuff like that. And there's some really cool stuff you can do with sensors to make things a lot smarter around the world. I've focused a lot of my career over the past 10 years around the Internet of Things. And, whether you know it or not, there's an open source platform out there called Arduino, which allows you to do some pretty cool stuff, if you're familiar with it. It actually came out of an art school in Italy. They wanted a simple way for people to create interactive art, so they created this open source platform that lets you pretty much plug sensors in, and with 10 lines of code you can do some pretty amazing stuff. And you talk about data, right? I mean, you can get all sorts of awesome data, whether it be from a gyroscope, doing stuff with your accelerometer, or, like you said, sound, video, all sorts of really cool stuff you can do to make it interact with the world. So that's kind of a little passion of mine. And it sounds like you're interested in some of that interactive stuff as well. Yeah.

Bongjun Kim  36:47  
I'd say my ultimate goal is not losing curiosity. That's my goal.

Justin Grammens  36:54  
That is a great goal to have. Because, yeah, I mean, the moment you're not curious anymore, you're kind of stagnant. You're standing still, right? And you mentioned Google. You probably know about it, and I'll post it too, but there's a pretty cool website called Teachable Machine. Oh, yeah. Yeah. So you can train it: you can basically record a bunch of different sounds, a dog barking, for example, or all sorts of noises. Now, it's very simplistic, but I think the thing that's kind of cool is that it teaches people, okay, this is how it works. And while you have a small subset of data, at least people start to understand, oh, it's not so scary. This whole machine learning, deep learning system is kind of approachable. People get worried and scared about all the math and all the stuff that's around it. But at the end of the day it's like, oh, I can kind of see how this works if I go to this Teachable Machine website and play around with some audio and some videos. So yeah, I don't know if there are any other sites, or, I guess, in an even broader sense, are there books or classes that you would suggest people buy or learn from, or conferences to attend, if they're interested in these subjects?

Bongjun Kim  38:08  
For sound recognition, there are two conferences that I've been attending. One is called ICASSP, the International Conference on Acoustics, Speech and Signal Processing. So it focuses a lot on acoustics and speech, and it also includes some vision, because that's part of signal processing. That's a really big annual conference. And another one is a workshop that focuses more on sound event recognition and detection, called DCASE. So Detection and Classification of Acoustic Scenes and Events, or DCASE. The good thing about this workshop is they actually hold the DCASE Challenge every year. You've heard of the ImageNet challenge, right? It's kind of like a sound version of the vision challenge. So you can go to the website. And the good thing about this website is, since they're having this challenge every year, they also release data sets. You can play around with a lot of audio data there, and there are clear evaluation metrics, so you can actually build your model and compare your results to other people's results. That's a really good starting point for someone who is interested in this sound event recognition area.

Justin Grammens  39:23  
Excellent. Very cool. Do you have any favorite books, I guess, AI or even non-AI related? So just outside of the field, something that you've read recently, maybe, that you find interesting?

Bongjun Kim  39:36  
I want to recommend one machine learning book. It's called Learning From Data. It's by Professor Yaser.

It's six or seven years old, I don't know how old it is. But actually, this is not a deep learning book. It covers the real basics of machine learning models. The reason I recommend this book is, you know, people who recently started machine learning start with deep learning, right? There are a lot of people who didn't learn any basic machine learning; they jump into deep learning and play around with TensorFlow and PyTorch. But I think learning the basics is really, really important. For example, it covers a lot of things like what training and testing mean in an ML experiment, what overfitting means, what the hypothesis space means. So if you've already started deep learning and you want to understand more about the basics of machine learning, this is really good. The good thing about this book is it's really short, not like a thick book, so it's approachable. And also, it actually has a companion website with a lot of YouTube lectures. So yeah, this is really good, I think.
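The basics Bongjun lists here (training versus testing, overfitting, hypothesis space) can be shown with a tiny experiment. The sketch below is not taken from Learning From Data; the data set, the single mislabeled example, and the two toy models are assumptions chosen to keep it short. One model memorizes the training set (a very large hypothesis space), while the other can only pick a single threshold (a very small one).

```python
def true_rule(x):
    """The hidden pattern we hope a model recovers: positive when x >= 5."""
    return x >= 5


# Training set: x = 0..9, with one deliberately mislabeled example at x = 2.
train = [(x, true_rule(x)) for x in range(10)]
train[2] = (2, True)

# Test set: fresh points the models have never seen, all labeled correctly.
test = [(x + 0.25, true_rule(x + 0.25)) for x in range(10)]


def memorizer(data):
    """1-nearest-neighbor: copies the label of the closest training point."""
    def predict(x):
        _, nearest_label = min(data, key=lambda pair: abs(pair[0] - x))
        return nearest_label
    return predict


def threshold_model(data):
    """Picks the one threshold t that makes the fewest training mistakes."""
    def training_errors(t):
        return sum((x >= t) != y for x, y in data)
    best_t = min((x for x, _ in data), key=training_errors)
    return lambda x: x >= best_t


def accuracy(model, data):
    """Fraction of examples where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in data) / len(data)


mem = memorizer(train)
thr = threshold_model(train)
for name, model in [("memorizer", mem), ("threshold", thr)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The memorizer fits the training set perfectly, mislabeled point included, and then repeats that mistake on a nearby test point. The threshold model cannot fit the bad label, so it scores lower on training data but better on test data. That gap between training and test performance is what overfitting refers to.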
