Andrew Rabinovich began his career in technology working on AI applications for cancer detection. He also spent time at Google, working on early iterations of products like Google Glass. Now at Upwork, as vice president and head of AI and machine learning, Andrew and his team are working to enhance the digital labor platform’s capabilities with AI solutions to enable more sophisticated matching of resources to projects.
On today’s episode of the Me, Myself, and AI podcast, Andrew shares his views on the ways AI could take on more complex projects while using fewer resources. Standing in the way of AI’s rapid progress, however, are slow advancements in hardware. While AI has made huge strides in cognition, he says, hardware struggles to match its capabilities, especially in wearable tech and robotics. Still, Andrew envisions a future with hyper-personalized digital assistants for everyone.
Subscribe to Me, Myself, and AI on Apple Podcasts or Spotify.
Transcript
Shervin Khodabandeh: With AI-infused products and services advancing so rapidly, what one component is holding developments back? Find out on today’s episode.
Andrew Rabinovich: I’m Andrew Rabinovich from Upwork and you’re listening to Me, Myself, and AI.
Sam Ransbotham: Welcome to Me, Myself, and AI, a podcast on artificial intelligence in business. Each episode, we introduce you to someone innovating with AI. I’m Sam Ransbotham, professor of analytics at Boston College. I’m also the AI and business strategy guest editor at MIT Sloan Management Review.
Shervin Khodabandeh: And I’m Shervin Khodabandeh, senior partner with BCG and one of the leaders of our AI business. Together, MIT SMR and BCG have been researching and publishing on AI since 2017, interviewing hundreds of practitioners and surveying thousands of companies on what it takes to build and to deploy and scale AI capabilities, and really transform the way organizations operate.
Sam Ransbotham: Today, Shervin and I are talking with Andrew Rabinovich, vice president and head of AI and machine learning at Upwork. Andrew, thanks for talking with us.
Andrew Rabinovich: Thank you. Good morning. My pleasure to be here.
Sam Ransbotham: So, Andrew, I’ve long been interested in online labor platforms and Upwork in particular, both as a user, I admit, and as a researcher. In case any listeners are looking for some light bedtime reading, I’ve recently published an academic paper, “Using Online Platforms.” And what’s fascinating is that these IT improvements are shrinking transaction costs, and that’s changing [the] structure of firms. I don’t know that the economist [Ronald Coase] from 1937 would recognize firms these days. But before I digress too much, let’s step back. Andrew, rope me in. Tell people what Upwork is and what the platform does.
Andrew Rabinovich: Sure. So Upwork is the world’s largest bidirectional marketplace — which is an important distinction that we can dive into later — where clients can look for talent across a variety of categories to help them [with] work, and freelancers from around the world can look for work opportunities online — in software, any kind of development that allows them to leverage their skills without necessarily being physically close to where the action is.
Sam Ransbotham: That makes sense. And you’re the vice president and head of AI and machine learning. Tell us about what that role means.
Andrew Rabinovich: Upwork is a company that is a result of a merger of two large marketplaces, oDesk and Elance, [which] started in [the] early 2000s and then in 2015 merged to become Upwork. And, for the longest time, the platform has offered a matching service between clients and freelancers, as I had mentioned.
However, in the evolution of technology that you referred to, we are on the path to transform Upwork. The goal is to distance ourselves from matching clients to talent. Rather than doing that, [we’ll] offer an outcome-driven business where clients come onto the platform and describe the problem that they want to solve rather than looking for folks to help them solve it. And in order to facilitate that experience, there’s a tremendous amount of AI and machine learning involved in creating an effective and pleasant experience.
Shervin Khodabandeh: So explain that more. I’m a client, I show up, and I say, “I have this technical problem: I’d like to develop a forecasting model for my —”
Sam Ransbotham: Podcast listeners.
Shervin Khodabandeh: Yeah, for my podcast listeners.
Andrew Rabinovich: That’s a great example. Traditionally, say you come onto the platform and you say, “I am looking for a financial analyst to help me improve traction [for] my podcast.” And then you would have to write super technical specifications of the type of a person you’re looking for, and then Upwork’s match engine would find you a list of highly skilled candidates that you can pick from, and then Upwork steps away, and then you transact with that individual.
Shervin Khodabandeh: We’re all familiar with that model, right? So the new one is?
Andrew Rabinovich: There are two problems with this original approach, and they’re not really problems. They’re more limitations. First of all, you have to know what kind of a person you need or a group of people you need to solve the particular task. And you have to be able to describe it in very technical terms such that the search engine on Upwork is able to provide you with the right matches.
Now, suppose you are doing a podcast on travel or mycology or design, and you really aren’t a technical person, and you’re like, “I have some traction. People listen to my podcast, but I want it to explode so I can become an influencer, make a million dollars, whatever.” And that’s your level of technical expertise; this is as far as you can explain it.
You’re super experienced in your domain, but when it comes to unit economics of podcasting you have no clue. So essentially Upwork is useless to you because the barrier to entry into this platform is so high that you can’t use it.
Shervin Khodabandeh: You need technical know-how, and you need to know what you need.
Andrew Rabinovich: Exactly. So the expanded direction that we’re in right now is we’ve launched this AI companion on Upwork’s platform called Uma, which stands for Upwork’s Mindful AI, where you come to the platform and you say exactly what I just mentioned. “I’m a podcaster in domain X, Y, Z. I want my podcast to be better, period.” And then Uma figures out that you need an analyst or you need some technical person to help you expand it, maybe market it elsewhere, maybe get some advertising going, whatever it is that’s necessary. And Uma finds the right people and/or AIs that will help you solve your problem and together [they] will deliver an outcome rather than just finding you the people that you can work with, right? So rather than finding talent that can help you solve the problem, Upwork now solves the problem.
Shervin Khodabandeh: So clearly there is a generative AI component to that. What else is there? For example, are there recommender engines and forecasting models and things like that as well?
Andrew Rabinovich: Absolutely everything, right? So whatever AI technologies exist today and there’s —
Shervin Khodabandeh: Off-the-shelf or proprietary?
Andrew Rabinovich: Off-the-shelf. Think of the freelancers [who] work in the Upwork marketplace or Upwork platform. They’re not Upwork employees, right? They’re freelancers. In the same way, all existing AI tools, apps in the App Store, or components in AWS [Amazon Web Services], they’re also “AI freelancers,” if you will, that should have the right to exist on this platform. And the role of Upwork is to figure out how to combine the human talent with machine talent to deliver the desired outcome.
Sam Ransbotham: I am a business school professor, and I have to think about the ROI here. How does that play out in the future for you?
Andrew Rabinovich: That’s a great question, and there is a nuance. And you know this a lot better than I do, just like with any step function in technological progress — electricity, steam engines, whatever — the technology’s ability is not to replace people but rather to amplify them. If in the past you were able to translate a podcast from English to Japanese for $10,000, tomorrow you’ll be able to take the podcasts and translate them to all languages in the world for the same $10,000.
It’s not that the high-paying jobs are going away; they’re just becoming much, much more interesting. If in the past for $50,000 on Upwork you could build an e-commerce website, in the future for $50,000 you’d be able to build a new Facebook, right? And in fact, this is super promising and makes me optimistic about the future. With essentially the same amount of people, you’ll be able to create much, much more value because the technology that’s becoming available in abundance is only helping you solve larger and larger problems.
Sam Ransbotham: That makes me think of two things. Sorry to our Japanese audience, but we’re not spending $10,000 to translate each podcast episode. In some sense, your competition there is not with the $10,000 job; it’s with us not doing that job at all in the first place. So there’s a big underserved market that this has the potential to tap. And that’s in addition to the other point you mentioned, which is, “OK, we’re going to expand and we can do a lot more languages.” There are a lot of things that aren’t happening right now because of that barrier, and I think that’s what seems very promising about this.
Andrew Rabinovich: Absolutely.
Sam Ransbotham: You’ve got a complicated background — people wouldn’t know from talking to you because you haven’t said much super technical, but you’ve got a crazy technical background, great technical depth. Tell us a bit about how you ended up in this position.
Andrew Rabinovich: It goes quite a while back. I started doing AI, as we call it today; in the past, we called it machine learning. In particular, computer vision is where my career began. I got really interested in doing biomedical imaging, and in 2003, I founded [my] first company that was detecting cancer in human tissue. I didn’t know anything about startups. I didn’t know really anything about business. The company got acquired for some not interesting amount of money, but I got really, really fascinated with machine learning, and I thought that was the forward path for me.
And then after I got a Ph.D. in machine learning at the University of California, San Diego, I joined Google, and together with a bunch of colleagues, we released the world’s first [augmented reality] app for both Android and iOS that was called Google Goggles. Today, that’s the lens feature in the Pixel phones. And that was so massively successful that [Google cofounder] Sergey Brin wanted to bring all of that technology to build Google Glass. We built Google Glass, which was not very successful from a product perspective.
Sam Ransbotham: Too early, maybe.
Andrew Rabinovich: It was way too early. And as I continue telling this story, you’ll see it happened again later. After the Google Glass experience, I decided that working on hardware was not a good idea and continued working in computer vision, object recognition, and things like that as part of visual search in Google Brain. As part of that, we developed the most advanced neural network at the time, called Inception, which powered a large number of products within Google and, in particular, visual search in Google Photos.
After that, I got lured into working on hardware — impossible stuff — again and joined Magic Leap to build the most advanced spatial computer in the world at the time. And I’m very proud of the impact we’ve had there.
But toward the end of my time there, I realized that the hardware, yet again, was not ready. The displays were not what they needed to be, and our ambition of having a device that would essentially replace a phone and a laptop was too big; the hardware wasn’t meeting the requirements. But the other thing I realized was about the algorithms necessary to facilitate this experience. If you remember the movie Her, where the main character interacted with the operating system, an all-day, everyday wearable device like that requires a companion that understands everything that you see and everything that you do, how you feel, how you react, and that allows you to navigate the physical world.
So I decided that while we wait for the hardware to emerge and be appropriate for the experience, it is possible to build that interactive experience with a companion. So I started a company called Headroom, where the goal was to tap into videoconferencing experiences that everybody was having, similar to the one that we are doing right now, where the AI understands everything that’s being discussed and understands you as a participant from a multimodal perspective.
It understands what you say, it understands how you say it, it knows when you smile, when you frown, when you’re paying attention, when you’re angry, and so on. And then, once these models are advanced enough and ready to be deployed, presumably the hardware for the all-day, everyday wearable devices will be ready. Then you can just take the models from laptops, where you have a webcam and a screen, and translate them into these wearable devices of the future.
And then we started looking at a bunch of different collaborative platforms for work. Those included [Lucid, the company behind] Lucidspark and Lucidchart, and a bunch of others. Upwork emerged as a very interesting use case because clients and freelancers communicate on the platform all the time, and being able to capture the essence of their interactions was a critical component in improving the quality of the matching and the experience for both freelancers and clients.
And when we started discussing a partnership, because Headroom had developed an SDK that you could plug into any collaborative platform, we thought there was an interesting strategic relationship to be had. That’s how Upwork became interested in acquiring Headroom. Since December of last year, I’ve been at Upwork, together with my team, working to transform it into this outcome-driven business.
Sam Ransbotham: That’s very different from the hardware and personal interactions you were talking about before.
Shervin Khodabandeh: I do want to talk about hardware for a second. You said “hardware” three times, and I’m interested in your views on the state of play and the future. In many ways, as you already mentioned, when you look today at AI’s cognitive ability, both in understanding multimodal input and in responding multimodally, it far exceeds its hardware counterpart.
In many ways, it is able to do all this because of all the technological innovations that have happened in hardware, but we’re talking [about] different kinds of hardware. We’re talking [about] computer chips and that kind of hardware. But in actual mechanical, physical hardware, what do you see as the biggest blockers, and how do we unblock them?
Andrew Rabinovich: I assume you’re referring to wearable devices in terms of hardware components.
Shervin Khodabandeh: Wearable devices, all kinds of bots and physical devices that can do things: not just, you know, devices that augment humans, but things that can even work autonomously. It feels like the physical limitations far exceed the cognitive ones.
Andrew Rabinovich: It does to an extent. If you think of the advances we’ve made in bits — sort of AI for cognition and recognition and translation and things like that — we’ve made great strides, but we still don’t have models that can think and reason, right? These are still pattern-matching machines that answer questions formulated by humans. Machines still can’t ask questions, right?
In terms of hardware, atoms are difficult to manipulate. In wearable devices, the displays are just not there where we need [them] to [be]. If we talk about augmented reality with see-through displays, the efficiency of the projectors that we have are just too low, so the batteries don’t last long enough. If you look at your cellphone on a sunny day, you can’t really see anything on the screen, so you have to cover it with your hand to create a shadow.
When you have this wearable device, you can’t put a shadow on the whole park that you’re in, so that doesn’t work. Now with robots, one great example of successful robots — from my perspective — is those that operate in extremely confined environments, like surgical robotics. You have these sterile rooms where the patient is always positioned the same way. Or take a packaging robot at a warehouse: It does the same repetitive thing many, many times, and it works great.
The robots that work in more open environments are things like Waymo cars. I remember when Stanford won the DARPA Grand Challenge in 2005, where a self-driving car drove, I think, from Los Angeles to Las Vegas. We’re like, “Cool, this thing is done.”
Shervin Khodabandeh: It’s funny you say that, by the way. Just this morning, I literally got cut off by one coming back from my morning coffee. It was actually two of them. This is a digression, but the two of them were more conscientious about not hitting each other.
Andrew Rabinovich: That’s true because they are very well aware of one another, and they know what the other will do. They’ve made great progress and they’re like a thing now. I actually let my kids take them. I’m a little apprehensive of my kids getting into Uber on their own, but I’m totally fine with them taking Waymos around town.
So that’s a great example, but the problem again is that we build these large language models, we build detection, recognition, and classification models, but these are all predictive systems, which don’t have a model of the world. And there’s a lot of discussion about this in academic circles. Some believe that by statistically observing a tremendous amount of the world around us, either from video or from text or from audio, that’s enough to infer a model of the world. I hold a slightly different opinion: All of the laws of nature that we’ve figured out, we have to ingest as priors. Not that I’m a believer in symbolic AI, but the fact that we understand the laws of gravity —
Sam Ransbotham: Just like basic physics. We understand it fundamentally.
Andrew Rabinovich: Exactly. They have to be inserted as initial conditions into these LLMs. So if you ask a large language model what happens if you drop an apple from the roof of your house, of course it’ll tell you it’ll fall. But it doesn’t do it because it understands what gravity is. It does it because it has read so many texts online that keep saying that if you drop it, it’ll fall, right?
So with robotics, it’s kind of the same thing. If things happen according to the fairly simple model of the environment that you’ve constructed, then it works perfectly, right? It doesn’t get tired. Hands don’t shake. It just does the same thing over and over again with this incredible precision. But if things get a little weird, then it starts to get confused in the most unpredictable way because unlike us humans, these things don’t degrade gracefully. They can just explode, right?
Shervin Khodabandeh: I like what you’re saying here because you’re making me rethink my own question. The premise was that the hardware hasn’t advanced as much as the software component of AI and generative AI. But you’re basically saying that just as there is a hallucination problem with the software, there is a physical hallucination problem with the hardware.
Andrew Rabinovich: Exactly.
Shervin Khodabandeh: So let’s not overestimate how capable the software actually is.
Andrew Rabinovich: That’s absolutely right. If I ask you, would you trust ChatGPT to answer all of your emails for a day? You would not agree to this, right?
Shervin Khodabandeh: Yeah, it’s only for the 5% that I won’t trust it with, right? For 95%, I would agree.
Andrew Rabinovich: Maybe just 1%.
Shervin Khodabandeh: Exactly.
Andrew Rabinovich: Maybe in one email someone says, “Do you want a bonus?” And the thing says, “No.” And you’re like, [this is a problem].
Shervin Khodabandeh: I have never gotten an email like that.
Sam Ransbotham: It’s a training data problem, right?
Andrew Rabinovich: Exactly. These things go hand in hand. From a sort of materials science and physics perspective, the human body can, through evolution, do tremendous things that we haven’t been able to replicate with hardware very well, and it’s still sort of on the horizon, right? And the problem is that we always underestimate how hard the problem is.
Take Magic Leap, which was by far — well, at least before xAI, I guess — the most funded startup without a product that it ever sold. I think in total, while I was there, we raised around $3 billion, and people thought this was an astronomical amount. And we thought that with $3 billion, we could solve this mixed reality problem. Cumulatively, to date, between Magic Leap and Microsoft and Meta and Apple, I think we’re getting close to $100 billion, and we haven’t gotten much further than we did with the $3 billion at Magic Leap.
So the problem is extremely difficult from a hardware perspective, and while we’re chipping away at it fairly slowly from my perspective, we are still far away [from] having something.
I remember one of the projects at Google X that we wanted to do. We wanted to create a programmable contact lens that would get power from the heat of your eye, and then it would project all kinds of interesting things for you, like in sci-fi movies. After, I think, five years of trying, we were able to turn on one pixel in the contact lens, and then the whole thing was abandoned because it was deemed too difficult.
Sam Ransbotham: I think that’s the general point you were making when you were talking about cars: how hard all this stuff is to extrapolate.
Andrew Rabinovich: It really is.
Shervin Khodabandeh: What does this mean, in your view, for letting AI out of the lab, out of the confined domain-specific areas like the operating room in a surgical setting or a very specific task? What does that look like for you in 10 years?
Andrew Rabinovich: Giving estimates over a long horizon is very difficult. I’m not as good as [futurist] Ray Kurzweil at this, but in a short amount of time, I believe that generative AI will transform the way humans interact with data. If you think about ChatGPT, it hasn’t invented any new information, right? So if you’re super savvy with Google and other technologies, you can figure out how to write a birthday greeting in a Shakespearean sonnet style, right?
It’ll take you a while and not everybody can do it, but what ChatGPT allows you to do is to interact with the same data in a very, very intuitive format. So it becomes an interface between the human and the information. I believe in the future, search engines of this form or any kind of retrieval systems will go away, and everything will be a dialogue between a human and a machine in any mode. If you have the wearable glasses, it’ll just see what you’re looking at and you can just say, “What’s that thing on the sidewalk?” and it’ll tell you what it is and you can talk about it. If you receive a document that needs to get a response, you tell the AI, “Write this up,” and “Reply to it.”
Shervin Khodabandeh: We’re doing that now.
Andrew Rabinovich: We’re doing this now, but we’re doing this in a very sort of disjointed way. The flows are extremely fragmented. In the future, you’ll just have one window on your screen where your AI companion — every person is going to have their own — is going to be hyper-personalized. It’ll understand everything that you do. Like, ChatGPT doesn’t know anything about you right now, right?
Shervin Khodabandeh: I’d like to transition us now to a short segment we call five questions. I’m going to ask you five questions, rapid-fire. Tell us the first thing that comes to your mind, preferably short answers. What do you see as the biggest opportunity for AI right now?
Andrew Rabinovich: Biology.
Shervin Khodabandeh: What do you think is the biggest misconception about AI?
Andrew Rabinovich: That it can take over the world.
Shervin Khodabandeh: What was the first career you wanted? What did you want to be when you grew up?
Andrew Rabinovich: A physicist.
Shervin Khodabandeh: What kind of physicist? Theoretical?
Andrew Rabinovich: Yes, theoretical physicist. My dad is a very famous physicist. When I was a kid I wanted to be like him, I guess.
Shervin Khodabandeh: When is there too much AI?
Andrew Rabinovich: Never.
Shervin Khodabandeh: I like that. What is the one thing you wish AI could do right now that it can’t?
Andrew Rabinovich: Be honest.
Shervin Khodabandeh: Very good.
Sam Ransbotham: Andrew, it’s been fascinating. I think we could talk for hours here. I’m hungry for more.
Andrew Rabinovich: Yeah, I’m having a great time, too.
Sam Ransbotham: All right, well, maybe we’ll have to drag you back here, but thanks for joining us, and thanks for taking the time today.
Andrew Rabinovich: My pleasure, thank you guys.
Shervin Khodabandeh: Sam, I thought that was a really interesting conversation. Obviously, Andrew has been around the block a few times, and he’s been behind some of the seminal models that are open source around computer vision and reinforcement learning and all that. What struck you in this conversation?
Sam Ransbotham: He has been behind a lot of these models, and I thought his breadth of experience was particularly interesting, because he has built these models, and we can all use these models. Literally, any listener, right now, if you want to go out there, you can download his Inception model. I mean, not just his, a lot of people worked on it. But [you can] download this Inception model off of PyTorch Vision, and use it today.
I think that’s a really interesting point that didn’t come out quite as much as we were hoping it would in the episode. These tools are becoming accessible to people. People can go out and use them. And we talk about democratization — I’m not sure democratization is the right word, because that implies everyone is going to have access. I think a better word is meritocracy, maybe. That sounds a little, sort of, like, do people deserve it? But it’s not going to be that everybody will use them. It’s only going to be the people who are willing to learn about them and take the time to invest in these models. I think that’s really important. Just having this stuff available isn’t enough. Availability alone doesn’t mean that you can use it.
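For listeners who want to try what Sam describes, here is a minimal sketch (not an official Upwork or MIT SMR example) of loading a pretrained Inception v3 model through torchvision, the PyTorch vision library he refers to, and classifying a single image. It assumes torchvision 0.13 or later, which exposes the weights API used below; the image path "photo.jpg" is a hypothetical placeholder.

import torch
from PIL import Image
from torchvision import models
from torchvision.models import Inception_V3_Weights

# Load ImageNet-pretrained Inception v3 weights and put the model in inference mode.
weights = Inception_V3_Weights.DEFAULT
model = models.inception_v3(weights=weights)
model.eval()

# Use the preprocessing pipeline that matches these weights (resize, crop, normalize).
preprocess = weights.transforms()
image = preprocess(Image.open("photo.jpg"))  # "photo.jpg" is a hypothetical placeholder path

with torch.no_grad():
    logits = model(image.unsqueeze(0))  # add a batch dimension

# Print the top predicted ImageNet category as a human-readable label.
top = logits.softmax(dim=1).argmax(dim=1).item()
print(weights.meta["categories"][top])

The weights download automatically the first time this runs, so all a listener needs is a Python environment with PyTorch and torchvision installed.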
Shervin Khodabandeh: But it also means that more will be expected of people, right? Like what used to be an edge, which is, I work hard and I know a lot, right? I’ve memorized a lot of facts. I’ve read a lot of books, and I work hard. That is still valuable, right? But knowledge alone, maybe that’s the reason you use the word democratization, right? Access to knowledge and information alone, which used to be a significant differentiator among individuals, and among teams, is slowly eroding as a differentiator.
Sam Ransbotham: I guess we have to figure out what is going to be differentiating. One of the things that he mentioned was the idea of tasks becoming much smaller and, you know, the tools doing more of them. I’m left wondering what parts of these tasks are worth investing in. When he was talking about search engines, I immediately was thinking, “You know, I’m actually pretty good at googling things.” My Google-fu is strong, and I wonder: If that has become not so important anymore, then what do I need to be working on next? I don’t think we got to a clear answer on that today. What is it we need to be working on?
Shervin Khodabandeh: Well, I think maybe part of the answer is itself figuring out the question … Instead of “What are the tasks that we need to be better at?” it’s “What are the problems, and what are the outcomes, that we’re now capable of driving?” Because, implicitly, we pick that which we think can be done. I mean, at the end of the day, we’re not going to pick things that are impossible or difficult, or that we can’t do alone, right? Or that we can’t do … without hundreds of billions of investment and big teams. I do feel like there’s a selection mechanism going on: People and organizations and teams pick things they believe are feasible within a time window. And I think what is possible is changing.
However, to your point earlier around willingness to know and learn, so that you know what can be done differently, I think there is a huge gap there. I think there’s a huge gap in knowledge amongst practitioners, and managers, and team members, about how the art of the possible is changing. And so maybe the real question here is “How big should we dream?” rather than “What tasks should we be doing?”
Sam Ransbotham: Well, Shervin, you’re right on talking about the uncertainty in our world, and the talent uncertainty that’s happening. Maybe we should write a report about that.
Shervin Khodabandeh: Maybe we should do that.
Thanks for listening. Sam and I are in fact working with our teams on a research report about how artificial intelligence and organizational learning capabilities prepare companies for uncertainty.
Look for it later this fall. Next time, Sam and I will meet Jeremy Kahn, AI editor at Fortune. We look forward to speaking with you then.
Allison Ryder: Thanks for listening to Me, Myself, and AI. We believe, like you, that the conversation about AI implementation doesn’t start and stop with this podcast. That’s why we’ve created a group on LinkedIn specifically for listeners like you. It’s called AI for Leaders, and if you join us, you can chat with show creators and hosts, ask your own questions, share your insights, and gain access to valuable resources about AI implementation from MIT SMR and BCG. You can access it by visiting mitsmr.com/AIforLeaders. We’ll put that link in the show notes, and we hope to see you there.