Paige Bailey is one of the few Artificial Intelligence (AI) and Machine Learning (ML) experts who can convey complex technical topics in easy-to-understand, relatable language. Her Twitter feed is full of AI/ML paper recommendations, open source projects, and insights about the ever-evolving tech space.
What was her journey like to a career in a very complex domain? What does it mean to build an inclusive AI? This week, we sit down with Paige to learn from her experience.
The podcast was produced by Den Delimarsky and Courtny Cotten. Music by Wataboi from Pixabay.
You can also use the official podcast RSS feed, and add it to any podcast application or service.
Den: So we are on episode 22. We’re kicking off February, which is a rectangular month this year. I did not know this. Somebody tweeted about this totally random meme of, like, “Today kicks off the month that looks like a perfect rectangle in the calendar.” Four weeks of February that look exactly like a rectangle, so…
Courtny: Shortest month of the year too, right Den? I think. I think I saw that.
Den: Yes, and so on this special month we have a special guest, and that is Paige Bailey. Welcome Paige.
Paige: Excellent. I am very glad to be here and very excited to talk a little bit about machine learning and about all of the great stuff that you guys have been doing and that we’ve been thinking about at Google and DeepMind.
Den: I like that this conversation is so comfortable that you jump straight into, you know, “I’m excited to talk about these things”. We have this list of questions, hold on! But OK, so, Paige, for folks that don’t know what you’re doing, and I’m sure a lot of them follow you on Twitter, and I know that you’re going to be repeating these things, but tell us more about what you do.
Paige: Absolutely. So I am a newly-minted, or relatively newly-minted product manager for DeepMind’s research platform team. DeepMind - you can think of it as kind of another business unit within Alphabet. Alphabet is the parent company of Google, and of Waymo, which is a self-driving car company, and of several other efforts that you might have heard of. But DeepMind, Google Brain - they’re both research organizations within Alphabet, so as a product manager for the research platform team I help support the great work of all of the research scientists and engineers who are doing things like building AlphaFold, which helps with drug discovery, and can also do things like suggest vaccines, and AlphaGo which was the model that was able to beat a human player at Go, and AlphaStar, and all of the other really nice reinforcement learning projects. And prior to that I was at Google Brain, managing all of Google’s machine learning frameworks. So that is TensorFlow, JAX, and also Keras. And before that I was working with Den at Microsoft, and really loved my time there as an Azure Cloud Advocate focused on machine learning and AI, and then also as a senior software engineer in the Office of the Azure CTO.
Courtny: So that’s a lot of stuff, and I’m going to tell you upfront that AI is definitely not my strength, and so help me, as a new person to the field, understand how pervasive AI is in the world we’re in today. I think understanding that helps people realize that it’s deeper than you would think in our daily lives, and it’s impacting more things than you would expect. I know, just by cursory knowledge of it, that it really is kind of everywhere. It’s behind almost any product that we’re using today, right?
Paige: Awesome. That’s an excellent question. And I agree wholeheartedly. AI is, you know, something that Google cares a great deal about. We consider ourselves to be an AI-first company. Pretty much any Google service that you’ve used has a TensorFlow model embedded within it, so an example of this would be if you have a Pixel phone, or an Android device, and you use kind of the speech to text functionality. Congratulations, you’re using a TensorFlow model. Or if you’ve typed an email, or started typing an email into Gmail and it’s done auto-complete for a few additional words, you know. Congratulations, that’s deep learning.
Courtny: Or doing your Google search, right? Like everybody is using Google or everybody has used Google and knows that it recommends things as you’re typing.
Paige: Exactly. And so that is a particular library called TF-Ranking, which uses TensorFlow to kind of give the likely best candidate for what you might be searching for based on a large history of previously searched for products. And so it’s not just Google, though. If you go on Amazon, Amazon.com, and you’re searching for products and it sort of suggests products that you might also be interested in. That’s also a ranking algorithm. Same for Netflix. You know, like, the nifty Teams backgrounds and Zoom backgrounds, where, you know, the person is still in the camera, but you know it looks like you’re on a beach or climbing a mountain. That’s something called image segmentation, and that is also - congratulations, deep learning embedded within a browser to be able to facilitate some of this work. So it’s been really magical for me to see in particular how all of these… All of these products that I’ve helped sort of create have been deployed throughout the world, and the creativity of users as they’re coming up with these novel use cases, and these ways to help people. Everything from drug discovery, to detecting diabetic retinopathy in X-Rays, to suggesting what I should watch for my next Netflix binge. It’s really, really nifty.
Courtny: So in tech… Being in the tech world, you know, I’m a designer, but I don’t work really closely with… Well, the closest I get usually are search algorithms. I’m working with search interfaces and learning how they’re tuning AI models for that, or we’re monitoring analytics and that type of thing. But I really get to see AI applied outside of, hard or not, just kind of dry technical situations. And you mentioned medical discoveries. What are some of your… I guess some of your favorite or heartwarming applications of AI that you’ve seen in the world, and it might not be Google, it might be anybody using it, or Google technology or Microsoft technology - what’s going on out there, that’s just amazing with AI that you can tell us about?
Paige: Yeah, that’s another really wonderful question. And I agree wholeheartedly. The magical parts of AI are seeing it really, really help people and improve their lives. Like, you know, the Terminator use cases - certainly very far away, if at all. But some of my favorite examples… Microsoft has something called Seeing AI which can help people who are without sight do things like program or be able to navigate the world through the descriptions of what a video camera on a phone might see, right? So it’s using deep learning to be able to understand, like, you know, “this is the sidewalk”, “these are some trees”, “it looks like there are two people approaching,” helping empower people who are differently abled. I think that’s one of the most meaningful ways that deep learning can impact everyone. I’ve also been really sort of enchanted to see the way that deep learning is used for energy optimization, so being able to optimize the power consumption in datacenters. DeepMind had a collaboration with Google’s datacenters to sort of reduce the carbon footprint of how much the computer is using within our data center sources. I think Microsoft also had a similar project. But that is, you know, as a former geophysicist, that means a lot to me as well. And then also deep learning has been used to detect exoplanets, to be able to sort of understand space and to bring new life into some of the satellite imagery that had been initially analyzed by humans, or maybe by sort of less powerful techniques. But now the data can be reused and sort of spot all of these things that we had missed before. So I really love the idea of machine learning not being like a people replacer, but more like a way to sort of enable, and empower, and improve the lives of humans, and be like a helpful assistant to help answer questions.
Courtny: Yeah, and I think that a lot of people look at AI or automation as an enemy, when in fact it’s an augmentation of processes we don’t really want to be doing anyway. And can we open up our talents to, you know, more artistic endeavors or…
Courtny: …to spend more time with other humans. We don’t have to do this stuff that can be done by a computer, but I think that like I mentioned, a lot of people look at it as making you obsolete, when in fact it just creates more opportunity for us and it propels us forward into things we don’t even know about yet.
Courtny: Den, go ahead man.
Den: No, I actually had a good question that dovetails something that Paige already mentioned and that is related to your career. So you mentioned you’ve been a geophysicist, and I kind of want to zero-in… You’re doing a lot of work with AI today.
Den: What started it all? Where did it come from? The passion for AI, the passion for communities? Because you have a pretty remarkable career - you started in GIS, you worked with data science, you worked in advocacy, and now you’re essentially product managing for AI. How did it all start? Tell us more about your journey.
Paige: Yeah, so I do absolutely have a very wonky career path, and I am so grateful for the opportunities that I’ve had to grow and to kind of gradually move into this machine learning space. And it makes sense. It makes sense in some respects when you just remember that machine learning and deep learning are just, like, fancy math - it’s just, you know, like I’ve always really loved linear algebra and differential equations, and all of the rest of the things that we’re all taught in school. And my path in undergrad - I did geophysics and applied math, and my thought process was I’m going to be Lady Carl Sagan. Like I am going to be able to be a science communicator for planetary science purposes, which… I was able to get two really wonderful research internships at the Laboratory for Atmospheric and Space Physics in Boulder and at Southwest Research Institute in San Antonio. Our teams were understanding lunar ultraviolet and sort of the presence of water and ice in these permanently shaded regions on the lunar surface, so if you remember there was this project where they had a rocket and they launched it and it crashed into the moon and then they monitored the ejecta and that was one of the things that I got to be involved with. So building GIS, lunar ultraviolet maps and sort of spotting statistical anomalies in these massive sources of data. And so now if you describe this to anyone they’re like “Oh yes, data science, machine learning,” and back then it was just like “Oh yes, research,” and so kind of by virtue of doing this data analysis work and working with GIS, and, you know, starting to do Python programming both as part of my applied math courses and through these GIS courses I got a job opportunity at Chevron to build some databases for them but also to create geostatistics plugins for some of their earth sciences applications, which again, were doing a sort of more traditional machine learning.
This was back in 2013, but still definitely machine learning, though nobody really called it that. Or data science, because those terms just didn’t exist yet. And so once, you know, Harvard Business Review came out with a couple of articles that were like “Data science is the thing to do” they were like, “Hey, that’s what Paige is doing. Let’s just call her that now” and you know that’s kind of history. And I was also very fortunate in that all of this was happening in Texas. So I founded PyLadies in Houston, which is an open source Python user group for women and, you know, for pretty much anybody that wants to come, though. And also many of the data science companies, like Anaconda and Enthought Canopy, are based in Austin and were leading these really vibrant communities. So I got to get involved with open source when I worked at Chevron and that I think really spoke to some of the people at Microsoft that were building out the cloud advocacy program and, you know, like I am just so delighted that I was able to work in the cloud space, specifically on data science and machine learning, and then that kind of naturally transitioned to open source machine learning frameworks at Google.
Courtny: Some of that sounds like you were in the right area at the right time. That kind of like…
Courtny: …continue to give you a slow trickle of interest and keep you on your toes, right? And learning. As a kid, did you ever see yourself working with this?
Paige: No, I am so lucky. I am so lucky. I grew up… I am actually dialing in today from Waco, TX. The bustling metropolis of Waco, TX. Which is actually much larger than it was when I was a kid. I grew up in a small town, kind of a 30-40 minute drive away from Waco called Itasca… Itasca, Texas, and it has 1300 people and just as many cows, and my first experience with the computer was an Apple II that my mother salvaged from being thrown away from our local school library, because the town does not have a regular library, and an electrician friend helped us get it working. I started programming with BASIC when I was like 8 or 9 years old, but the Apple II was pretty much my only toy, so I learned how to use it and I had never really imagined like computer science or programming as a career choice, because it was fun, and who pays you to do the fun things.
Courtny: It’s amazing how just having a device or having access to something can totally change your life. I know that I grew up in a town very similar, basically replace the cows with corn and you have my beginnings, and I had a friend whose mom was a professor and she exposed me to a lot of literature and things like that and they happen to have a computer that had Internet and her husband was a designer and he worked from home. So some of that definitely influences you, right? For your experience having that device, you know, that one guy that was able to repair it - what a life changer that is, right?
Courtny: Kind of cool to see how that has played out for you.
Paige: Yeah, and for you as well, right? There’s something magic I think about having… There’s something magic about learning something when you’re a child and being captivated by it and it’s something that you really never forget. There’s a reason why I can still recite all of the Pokémon and have problems talking about all of the molecular structures in biochem. But it’s you know, it’s… I’m so fortunate.
Courtny: And so how is that information relevant right now? Why are you in my brain still?
Paige: Exactly. Cool. Excellent.
Den: So actually, I’m curious now that we’re talking about the various kinds of learning, and you know you’ve been fortunate as a child to have kind of the means to do that. I’m curious in terms of kind of the learning routine. What is that like for you? Because you know, when we’re in our early age, I want to say, we have a lot of time on our hands. We can just sit down and like tinker with computers. When adult life kicks in, that time just kind of vanishes because, you know, the day to day stuff comes in. You have to do a lot of these responsibilities, pay bills and whatnot. And for you personally, I know that I follow your Twitter and I’m constantly fascinated by the things that you post in terms of new research papers and… How do you structure your learning now that you’re working professionally? I think that’s a question that is interesting to not only folks early in the career, but anyone in their career because you have to create time to research the space to better understand it. How do you go about solving that problem?
Paige: And I will say the same thing to you as well, Den. I am shocked and intimidated by the amount of content that you’re able to produce about this stuff that you’re learning. Like it is great to see, but it’s also like wow, like, and to be honest, it’s often a reminder like, “Oh, I should go read that paper.” I found that as I’ve matured in my career, it really helps to be mindful and to be choiceful about how to set aside time. So making sure that if I do need to answer emails, and I know that I need to answer emails each day, I’m going to set aside, you know, 40 minutes in the morning, 40 minutes in the afternoon, and I will be hyper focused on solving those tasks during that time. And then hyper focused on whatever chunks of other time I have later in the day, whether it’s meetings or whether it’s like “OK, I’m going to go through…” There’s this great, and I’m typing it into the chat so folks can read it later, there’s this great website called arXiv Sanity which was created by this guy named Andrej Karpathy, but it takes all of the machine learning and data science papers from arXiv and aggregates them into one single location so I don’t have to do the mental guesswork of attempting to find out which are the most compelling papers for me to read today. The other things that I really love are kind of modular courses, like I used to be really into DataCamp, but also Coursera, and its approach towards 5-minute videos, 10-minute reading snippets, you know, things that are really bite sized that you don’t have to devote hours and hours of the day to, that you can just pick up as you’re walking to your next location. And I also like… Kindle has just been like some sort of crazy drug to me. I got a Kindle and realized that you can rent books from libraries for free.
I feel like nobody told me this maybe earlier in my life because they realized how hard it would be for me to not do that all the time, but having the ability to pull down any ebook from anywhere, from any library, and then just read it and document it, and highlight it however I want without having its physical presence as a reminder in my house, of how I abuse my books - this is just amazing.
Courtny: Not only do you have the Internet, but now you have access to an endless library, right?
Paige: It’s crazy. It is absolutely absurd. And again touching back on y’all’s points from earlier in the day, right? Like as an 8 year old, 9 year old kid, I could have never imagined this. Like Project Gutenberg, whenever I first saw it back, you know, like I forget what particular year, but I remember I was very young and all I could think of was just like “Oh my God. There’s a text file of Little Women on the Internet” like I am going to be reading all day these older books. The other thing is always to be mindful that there’s always going to be more to read, that you’re never going to be able to read all of it, that you’re never going to be able to sort of consume all of the content in the world, but what you do consume, be mindful about it. Be choiceful, use aggregators when you can, and then also never forget to be hands-on, as a product person in particular, especially a product person for a machine learning API. You need to be programming all the time or as much as you can in order to understand the friction and the frustrations of your users.
Den: I want to say a massive shout out to whoever manages arXiv because… I think it’s Cornell, right? It’s Cornell University. That… I use that as a hobby weekend reading. Or you know, going on the weekend and say “Twitter sentiment analysis” and then you get all these papers that just pop up where somebody has already done the work for you and it’s just a matter of implementing. It’s a wonderful, wonderful resource.
Paige: Yep, I agree, and if you haven’t had a chance, Kaggle has also… Kaggle also has a number of its kernels and notebooks available from people all over the world, doing excellent work submitting their models for public competitions, and then all of those notebooks are available as resources, fully documented for people to review afterwards. So if you ever have… And you know, that, Stack Overflow, GitHub, holy moly GitHub like all of these content sources, where if you want to understand how to use pandas to do a thing or use scikit-learn to do a thing, somebody’s probably already done it so you can just go out and find it. It’s wonderful.
Den: I feel like at some point we need to do a reunion episode with our good friend Meg Risdal, who we had a couple episodes back. She’s a PM on Kaggle and we can just all nerd out on reusable kernels.
Courtny: Yeah, the three of you, I’ll just sit that one out. It’ll be over my head, but yeah. You know, it seems like AI, just like any other specialty, you’re going to have to put in the 10,000 hours to become a master. Really dedicate yourself to wanting to learn and wanting to be curious. How important was it that you have a background in AI to be successful in the field? How often have you bumped into people that have gone a nontraditional route and you’re working with them? Or you know, what would you have to say about that, traditional versus nontraditional education? Lay it out there for us.
Paige: Awesome. So I am a huge believer in “Put your energy and your confidence in people, not in acronyms.” I don’t think that academic background has any sort of indication on how curious, or creative, or talented any person is. And, you know, having a computer science PhD does not mean that you are any better at asking questions, or you know, being empirical than any other human in the world. And actually computer science degrees are often, or at least I’ve found, less influential than coming from more of a science or even a sort of a cognitive science or an English background, to be honest. And you know, the magic often happens in the margins. It’s people who are coming with this domain knowledge and have deep insights into whatever they are specialists at, and then are just using Python or machine learning techniques as a tool to help better understand their customers or better approach problems ‘cause the entire space of applied machine learning is just amazing. We’ve only started touching on it. I’m not sure if you guys are familiar with distill.pub, which is a sort of a journal for machine learning, but it really goes into deep insights into how you would explain machine learning to someone who’s not necessarily a computer scientist or has the AI background. And it was started by someone named Chris Olah, in addition to many other colleagues. But Chris, I believe, was hired by Google Brain just out of high school, so maybe went to college for a little bit, but certainly didn’t have an undergrad degree and is one of the most delightful sort of machine learning authors that I have ever experienced in my entire career. He has this way of simply capturing these very complex ideas into really straightforward visualizations and terms. But I thought this was a very long winded way of saying - machine learning is kind of magic in that, you know, just as math is ubiquitous no matter where you go, no matter which field. Statistics is ubiquitous.
So is machine learning - it can be applied to anything. And like I truly think that if you’re interested in… If you’re interested in solving interesting problems, it doesn’t matter what your background is. There are tools that can help you do that with machine learning.
Den: Now I’m actually curious because when I started digging into machine learning, it was very intimidating because everything is so math-heavy, and I love math. Don’t get me wrong, it’s great. Linear algebra - absolutely my thing. Statistics - absolutely my thing, but it’s a lot to just drink from a firehose when you just get started. For folks that are just breaking into the field, what would you recommend they do to kind of overcome this initial stage of “Oh my gosh, this is too much.”
Paige: So I agree very… I agree wholeheartedly that the math can look intimidating. And one of my favorite anecdotes about this is, you know, mathematical notation is kind of a gatekeeper, I think. And many of the most impactful mathematicians and physicists did not even adopt mathematical notation as it exists today. They came up with their own notation because the important thing isn’t really understanding how to manipulate these variable symbols. It’s just like understanding relationships between things, and so I think a lot of the time the math is needlessly heavy, that people are really too eager to put complicated-looking equations on slides or in papers. And I disagree with that approach, and so things that are helpful to kind of combat, you know, and to show that it’s not really as complex of math as you would think is this distill.pub that I mentioned before, and there’s another book that is just delightful called Hands-on Machine Learning with Scikit-Learn and TensorFlow, and this is from a man named Aurélien Géron, who is phenomenal. He is currently living in New Zealand and making all of us jealous. But this book, the latest version and the earlier version are very great at explaining machine learning and machine learning concepts in simplistic terms and not going too heavy on the mathematical notation. ‘Cause really, especially reinforcement learning, it’s just very simple ideas. You have an agent that acts on an environment, so you have an environment that’s created with a whole bunch of rules. That agent acts on the environment. If something good happens, then the agent is rewarded so it does more of that, and if something bad happens then the agent is kind of punished and remembers “Oh, I shouldn’t touch that thing again,” and then all of that is saved in a learning step. So if you act and you learn over enough iterations, then eventually you learn how to interact with an environment, and that’s… But all of that is very simple, right?
There’s nothing fancy about it, it’s just the way to go about understanding.
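The act-and-learn loop Paige describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from DeepMind or from any library: the five-cell corridor, the reward values, the function names `step` and `train`, and the choice of tabular Q-learning as the "learning step" are all hypothetical and picked for brevity.

```python
import random

N_STATES = 5        # hypothetical corridor: cell 0 is a pit, cell 4 is the goal
ACTIONS = (-1, 1)   # step left or step right


def step(state, action):
    """Environment rules: return (next_state, reward, done)."""
    nxt = state + action
    if nxt == N_STATES - 1:
        return nxt, 1.0, True    # something good happened: reward
    if nxt == 0:
        return nxt, -1.0, True   # something bad happened: punishment
    return nxt, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns a dict mapping (state, action) to value."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 2, False   # every episode starts in the middle cell
        while not done:
            # Act: usually take the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                # Ties between equal values are broken at random.
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            nxt, reward, done = step(state, action)
            # Learn: nudge the stored value toward reward + discounted future value.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


if __name__ == "__main__":
    q = train()
    for s in (1, 2, 3):
        best = max(ACTIONS, key=lambda a: q[(s, a)])
        print(f"cell {s}: preferred move {best:+d}")
```

After enough episodes the table should favor stepping toward the goal from the middle cells, which is the "do more of what was rewarded" behavior from the conversation. Real systems such as AlphaGo replace the lookup table with neural networks and use far richer environments, so treat this purely as a toy.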
Den: I think there’s some interesting tangent to that. I was reading a book recently on PyTorch (Deep Learning with PyTorch), and I blank on the name, but I’ll put it in the show notes but it was one of those books that… Exactly like you described, where it goes in-depth about these concepts around deep learning. Here’s how to apply it to cancer imagery and how to detect all the tumors in X-Rays, and it just explains it so well without using any mathematical notation - just these stick figure drawings, and to me this is like “Wow, this is talent.” When you can explain something so complex in terms that if I read this book I have zero background and I just get it. I think it’s an underappreciated talent of people teaching stuff the right way.
Paige: Yep, and it also requires humility, right? I think that there’s… And I don’t really have data to back this up, so this is just kind of a personal opinion, but it often feels like in academia in particular, there’s this devotion to very specific words that are… Or very specific terminology, which is often a gatekeeper. And if anything, science is all about communication, and it doesn’t matter if you create the best tool in the world if you can’t explain to people why it’s useful and why they should go about considering using it, or what a particular problem is, and trying to inspire them and empower them and to make them feel like they can do meaningful research and work in this space. Anyhow, I think that the world could use a great deal more humility, especially from experts.
Den: So in that regard, and if we talk about numbers, I have a question about metrics, and I know that one of your principles that you’ve outlined on your website is “Bring data to opinion fights” which I absolutely love. I think this is the right approach. How do you get other people to care about metrics? Because it’s easy to say that everyone should care about metrics. We can define whatever you want. How do you get everyone else to care?
Paige: Awesome, that’s a great question, and it also… I think it has to be kind of, you know, hammered in from project launch, especially for products. And it’s so easy to get attached to the wrong metric, right? Focusing on revenue rather than customer retention or focusing on pip install counts or GitHub stars instead of, well, how are these people actually using it? Are the customer segments that we care about actually using our product or is it, you know, just random containers downloading a thing? And I think to get people to buy into metrics, they have to be stakeholders and it has to be a team discussion. It can’t just be a PM showing up and saying like “Well I think we should do this.” It has to be an “OK, well let’s think of what our actual objective is. What would we really enjoy? How can we have the most impact? What do we want to have happen, alright? Now let’s all talk about and discuss how might we go about sort of testing a hypothesis about this thing that we want to have happen, and once that’s done, how do we define the metrics to determine whether we’re successful?” And if you bring people along on that journey and you use it as kind of like a storytelling exercise, it’s a little bit easier to get people to buy into metrics, but usually I’ve found that the most impactful metrics are really these sort of combinations of insights that help you better understand customers as opposed to just picking money, or speed, or whatever it happens to be.
Courtny: Yeah, on that note, I wanted to share a little bit of my story working with that because I spend a lot of time in the qualitative realm. I’m talking to users or I’m going to be doing a lot of interviews, I’m going to be watching people use a product or just kind of getting sentiment and collecting anecdotes and verbatims, and things like that, that allow us to better understand the way people’s minds are working. It’s not so much about having a raw number. A raw number’s great, and there’s certainly times that we try to balance out our qualitative with that, but one thing that has really bothered me in the past, I work with some product managers, and they are really quant-based and they want to have really hard numbers that go in the opposite direction of maybe what we’ve heard from a qualitative session. And your gut and my instinct, it gets a little bit triggered, right? Like it’s… Yeah, of course. The number says that, but I don’t know that that’s entirely the right thing that we’re monitoring, we need to be monitoring or measuring, or using as a benchmark to make a decision against. It’s like we’re watching the wrong thing, and we’re not optimizing for the thing that matters the most, which is the human brain or the way in which they want to use the product. You’re missing it - the number doesn’t tell you stuff. It doesn’t show you everything.
Courtny: So yeah, how do you balance that? Have you experienced that yourself?
Paige: Oh absolutely, and I love that you said this - the marriage of qualitative and quantitative, you can’t choose one and then ignore the other. One of my favorite papers, and I will send this to both of you, was authored by a professor at the University of Washington named Amy J. Ko, and what she did was she sat with 17 Microsoft engineers, so she was physically present in a room with them for 90-minute segments, and was hand documenting whatever they were doing throughout their day. So like you know, ten minutes spent programming, five minutes spent doing bug triage, and then two minutes spent reproducing the issue, and then five minutes more on bug triage all throughout the day, and they had these beautiful graphics displaying what the fragmented nature of work looks like for software engineers, for these 17 people. And they had additional data, such as years of experience and specific kinds of teams that these folks were on. But what was fascinating to me was that they coupled this observation study with a survey, where the users were asked how do you seek information as you are going through tasks during your work day, as you’re programming, as you’re doing bug triage etc. What are the things that you use as resources the most? And people would report back with “Oh yes, well I look at the metadata for CLs that are… And I looked at, you know, the internal documentation and all the rest” when the reality was… And this was, you know, only evidenced through observations, was that they would yell across the hall to their colleagues: “Hey man, why did you name it this way?” Or like “Where is that thing located again?” Or you know, “I don’t understand this behavior. Please help me out,” and in the survey, all of the people had uniformly said “Oh, you know, like sometimes I ask my colleagues questions, but really, that’s like the fifth thing that I do maybe.” And so I really think that if you rely fully on surveys, or you rely fully on telemetry, you miss a lot of the insight that you can only capture through user experience work and through doing these sort of observational studies for users and how they’re actually approaching your products.
Courtny: Well, as you know, even the way you ask a question or the way you observe someone can dictate the result that you receive back. So yeah, you know that qualitative side is even an art in and of its own right - trying to figure out how to write questions properly. And you know, not leading people. And yeah, and people get that, what do they call that, there’s a term for it, but they perform when they know they’re being watched, so yeah, great stuff to hear.
Den: It’s interesting to also see how important it is to look at the right data, because even things like surveys or qualitative research can be flawed, just like data can be flawed, if you are not following specific parameters for what Courtny called out - asking the right questions, or asking a lot of follow-up questions, because it’s very easy to take the surface-level insight and go to your team and say “Well, the customer said X is the truth, so X must be the truth,” instead of asking “Well, why, or what else is out there?” In that regard, I have, again, a kind of tangential question to what we just talked about, but how do you go about making sure that you’re looking at the right things? Not just that you’re actually looking at data, both qualitative and quantitative, but that it’s actually the right data.
Courtny: So Den might dislike me for doing this, but I’m going to go off the rails a little bit and I want to know what your thoughts on the recent stock market craziness have been, and why can’t you tell me how to use AI to more effectively bid on the stock market?
Paige: Man, if I was truly a machine learning engineer, I would have been purchasing GameStop stock back in the day, but the…
Courtny: What about crypto, like Dogecoin? Knowing OK, what’s going to take off? How can AI modeling be applied to that type of stuff? I’m really curious.
Paige: Or COVID, right? How could any of the modeling… None of the models that I have ever seen have predicted COVID or anything similar happening so quickly. It’s just, there are some things that are just so strange and so wacky that it’s very difficult for AI to even guess, and more importantly, traditional machine learning is trained all on historic data, right? Which means that you can’t create new futures. You can only predict what you’ve seen before. And so I do think that’s one flaw that we all need to be mindful of, is that there are other worlds that exist, and other possibilities, and many of them, you know, some of them worse, but many of them much better, and machine learning can’t give us that, at least not traditional methods. But it’s just something to remember.
Courtny: So they do fantasy prediction models, you know. Like let’s generate a past, let’s train on a past that didn’t exist, if one or two things changed. Let’s train our AI on that. Is that a super common way to model? Help me understand that. How does AI model, right? And just like you mentioned that kind of goes on past events but, is there kind of, “Oh let’s cherry pick and change a couple of variables and just see how it runs.”
Paige: So that’s a great question and it is a way to test kind of the robustness of models, and also see where models could potentially have flaws. But an example, one of my favorite examples of sort of the way a traditional supervised model would make a decision is that you might have, you know, a massive database from a bank with, you know, many different variables about a given person, so you know where that person happens to live, their academic background, their name, their age, their gender maybe, and then how much of a loan they were given, and whether or not they defaulted on it. So whether or not they paid it back on time, and if you’re the manager of a bank, you care very much about whether they paid back their loan on time or not. And you might sort of be interested in using all of this historic information to say “OK, well if I meet a new person that has these criteria, what is the likelihood that they would pay me back my loan on time?” And so you can analyze sort of that back history, you can sort of test your model or test it by adding in some maybe flawed examples, and sort of understand if it… Does this model hold up, if I move outside of Texas? Does it hold up if I start considering people outside of the United States? And then also, what is the impact for specific demographics, right? There’s these models that were being used by banks for many many years without realizing that ZIP code is very much tied to socioeconomic status and race, and if you sort of look back and sort of consider some of the recommendations that the model was making, they were saying often that white men should be given loans and then women or people from other races would be less likely to pay them back on time, which is completely… That’s a whole other hour-long conversation.
But ethics and AI, again, I think that’s one of the most important reasons why we need to have people from diverse perspectives, diverse backgrounds all coming together to have these conversations about data and about accuracy, and about whether or not we’re measuring the right things. Because if you only ask one kind of person, then you’re only going to get one kind of answer, and it’s probably not going to be the right one.
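To make the ZIP code point from the loan example concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the records, the placeholder ZIP codes "ZIP-X" and "ZIP-Y", the groups "A" and "B", and the 0.5 threshold are all hand-made, and the "model" is just a per-ZIP repayment rate, not anything a real bank uses. It only shows how a model that never sees the group column can still approve the two groups at very different rates when ZIP code and group are correlated in the historic data.

```python
from collections import defaultdict

# Hypothetical, hand-made historical records: (zip_code, group, repaid).
# "group" stands in for a protected attribute; the model never reads it.
HISTORY = [
    ("ZIP-X", "A", True), ("ZIP-X", "A", True),
    ("ZIP-X", "A", True), ("ZIP-X", "B", True),
    ("ZIP-Y", "B", False), ("ZIP-Y", "B", True),
    ("ZIP-Y", "B", False), ("ZIP-Y", "A", False),
]


def fit_repayment_rate_by_zip(records):
    """Learn the historical repayment rate for each ZIP code."""
    totals = defaultdict(int)
    repaid = defaultdict(int)
    for zip_code, _group, was_repaid in records:
        totals[zip_code] += 1
        repaid[zip_code] += int(was_repaid)
    return {z: repaid[z] / totals[z] for z in totals}


def approve(model, zip_code, threshold=0.5):
    """Approve when the ZIP code's historical repayment rate beats the threshold."""
    return model.get(zip_code, 0.0) > threshold


def approval_rate_by_group(model, applicants):
    """Audit step: compare approval rates across groups after the fact."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for zip_code, group in applicants:
        totals[group] += 1
        approved[group] += int(approve(model, zip_code))
    return {g: approved[g] / totals[g] for g in totals}


model = fit_repayment_rate_by_zip(HISTORY)
# Re-score the same people; the group label is used only for the audit.
applicants = [(zip_code, group) for zip_code, group, _ in HISTORY]
rates = approval_rate_by_group(model, applicants)
print(model)
print(rates)
```

In this toy data, most of group A lives in "ZIP-X" and most of group B lives in "ZIP-Y", so a decision keyed only on ZIP code ends up splitting along group lines. The audit function is the part worth keeping: it needs the group label even though the model does not.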
Den: Biased datasets produce biased results.
Den: And it’s very easy… And I love the point that you made about these hidden biases where it’s not necessarily where there’s a flag that says what ethnicity a person is from, but just by ZIP code, right? Somebody looks at the surface and says “I didn’t even take this data into account, it’s just a ZIP code.” But the ZIP code alone can say so much.
Paige: Exactly. And it’s also… Every data collection method is biased. Every single one, and it’s just a matter of how pervasive that bias is through the data set. There’s even a data collection project within Google that sort of encourages the crowdsourcing of many different image types. So an example of this would be if you think of marriage in the United States and pictures of marriage, you’re probably thinking of white dresses and tall cakes or whatever. And I know that concept and those photos look very different if you’re in China, or if you’re in India, right? And so the motivation of this project was to encourage people from all over the world to take photos and then to tag them with specific terms so that they could be used for image recognition projects. But that is also biased in the sense that people have to have phones, and if you have a phone like an iPhone or whatever it is, you’re probably coming from a very specific socioeconomic status, which means that many other people from other socioeconomic statuses won’t have their experiences captured as part of this model. So biased data collection mechanisms like… I am just so fascinated and impressed by the people who do work in this space because I would have never thought about it. And then after reading a couple of papers, it just makes you realize “How can I be sure of anything?” going down the road, so.
Den: It’s interesting too because you called out the fact that there is the human interaction in AI, and I think this is another point that is so often missed. Is that AI - it’s not just throwing data at a problem and the machine will figure it out, it’ll just come up with the right answer. There’s humans involved.
Den: And humans are biased by default, no matter where you are, and if a human is involved in the problem, then you end up with, well, we see every once in a while these kinds of occasional results.
Paige: Absolutely, and there’s a, you know… Also working in this space and thinking about ethics in AI quite a bit really over the course of the last couple of years, it makes you start to be more mindful about human processes and systems and start questioning them a little bit more as well, because just as algorithms can make these really, really terrible assessments, you start thinking about, well, what about judicial systems? What about all of these things that when you were younger you kind of just took for granted as being, you know, correct and obvious? Like, “Obviously, they’re right,” and then you start thinking about, “Well, no, wait, there are all these other cues and there…” And what’s worse is that these sort of human algorithms for decision-making are often even less understandable than the machine learning models that are making their choices. So I don’t know, if anything, machine learning has made me more confident that we need to have people constantly questioning things that we believe to be true, or things that we believe to be right. And you know, I feel a lot more appreciation for humanity after working with computers than I would have ever expected growing up. Computers are magic, but the people parts are the hardest part.
Courtny: So one thing that you are involved in is contributing to open source projects, and for someone that wants to follow in your footsteps, I know it can be intimidating sometimes to just get started contributing - you don’t know what to do and you feel like, you know, you’re the new person there and you don’t really know what to pick out or work on or help with. What would be your advice for getting started, getting yourself out there when it comes to open source, if you want to provide your perspective and you want to get involved and add some diversity to a project?
Paige: So that’s an excellent question. Open source can use every single type of skill set, not just programming. So if you’re a designer and you want to create a logo, if you’re a tech writer and you want to create documentation, if you’re, you know, a content person, or if you’re really good at SEO or social media, those skills are used and very much needed throughout the open source community as well. So no matter who you are, your unique skill sets can be useful to an open source project, so don’t feel like you have to be making programming contributions or any of those other things, and if anything, I guarantee you, the open source maintainers will just be delighted to have somebody that isn’t necessarily asking about pull requests that can contribute in these ways that they probably don’t have bandwidth or the skills required to help themselves. Certainly get involved however you feel most comfortable and then also, if you’re interested in programming contributions, Google Summer of Code is a great way to get started and to get paid. Outreachy is also a way to have an open source mentor if you’re from an underrepresented group, sort of guide you through an open source project and making your first contributions. And then also GitHub has a tag called good-first-issue which is usually an indication that this is a relatively straightforward code contribution, and also the maintainers have the mental cue that “OK, this person is taking on this issue. That means they’re probably a beginner. I can sort of mentor and guide them a little bit more than I normally would for pull requests.”
Courtny: So it’s like the shallow end of the pool. It’s the kiddie pool, right?
Courtny: Get your feet dipped in and get a little warmed up.
Paige: And tech like documentation - tech writing. I have so much respect for technical writers. It is one of the most challenging skill sets to develop, I think, and also UX research to be honest. It’s because you have to kind of understand the technical space, but you also have to have user empathy. You have to understand a little bit about psychology, and for tech writing a little bit about effective communication and how you would structure sentences, and all of these things, and every open source project that I have ever seen can use more or corrected technical documentation. And every open source survey over the course of the last half decade - the primary complaint from every user is like “Man, technical documentation. Sure do wish that was better.” And Den has extensive experience in this space and can speak to it much better than I can, but technical documentation is a great way to get started with open source as well.
Den: Everyone wants to talk about technical documentation. Nobody wants to do it. So I’m glad that there is somebody like you that’s advocating for it too. I am all for technical docs.
Courtny: So what would be, as we kind of wrap up here, if somebody was to carry away a piece of career advice from you, what would you say?
Paige: Oh man, I am not the right person to be asking for career advice. So fun fact: when I was in the Earth Sciences, I got told that being interested in Python and being interested in computer science were wastes of time, like nobody would ever need those skills. Nobody in Earth Science would ever care about or respect them, and it was actively harmful for my career to be interested in and doing work in this space. Hadley Wickham, who is Chief Scientist for RStudio, he’s the author of many Tidyverse packages in the R stats community, I believe he had a similar situation where in the academic world building developer tools is not respected as much as publishing papers and results, and getting citations from those more traditional means. And so people would tell him, like you know, it’s great that you’re building these R tools, but that’s not going to be useful for your career and is actively harmful for it. And so my approach towards that feedback was effectively like “That’s great that you have that opinion, but no thank you. I’m going to continue doing the things that I enjoy and hope it works out in the end,” and so that is the approach that I’ve been following my entire career: I’m going to do what I enjoy and what I find to be useful and where I derive meaning. And I’m going to hope it works out in the end, and so far it has. But that is, you know, maybe not the best advice to be giving to the younger generation, but if you do really enjoy a space or a series of spaces, go after whatever your passion happens to be and you know there is a way to make it work and also there is, I think I said this before, there is magic in the margins, right? You don’t have to follow this formulaic recipe of a career path in order to be successful or to have a meaningful work place or work challenge. You can do all sorts of things and just be creative.
Den: I do have one last question, so for people that want to follow you on Twitter or any other channel. Because I know Twitter, because that’s where I hang out and I know that you hang out a lot and everyone should follow Paige on Twitter, period. But where can people find you if they want to learn more about what you do and your adventures?
Paige: Awesome, so I am @DynamicWebPaige pretty much everywhere on the Internet: Twitter, GitHub, etc. Please feel free to always ping me on Twitter or reach out via GitHub, and my email address is also available there, but I would be delighted to hear from folks, and thank you so much for having me today. This has been really, really fun and a great discussion, great conversation.
Den: It’s always extremely nice talking to you, Paige, and I come from a background where AI was not my thing that I originally even set foot on. So I feel like today was again, a firehose of just like learning and learning. And now I just want to go to arXiv and start browsing for even more papers. So thank you for the inspiration and thank you for your time.
Courtny: Yeah. Thank you so much for coming on and explaining things to me in a way that I could understand.
Paige: Excellent, thank you. Thank you for asking questions. They were great ones.
Den: Thank you Paige.