The Machine Learning Race Is Really a Data Race

Organizations that hope to make AI a differentiator need to draw from alternative data sets — ones they may have to create themselves.

Megan Beck and Barry Libert | December 14, 2018 | Reading time: 6 min



Machine learning (or artificial intelligence, if you prefer) is already becoming a commodity. Companies racing to simultaneously define and implement machine learning are finding, to their surprise, that implementing the algorithms that make machines intelligent about a data set or problem is the easy part. A robust set of plug-and-play solutions now handles the heavy programmatic lifting painlessly, from Google's open-source TensorFlow framework to Microsoft's Azure Machine Learning and Amazon's SageMaker.

What's not becoming commoditized, though, is data. Data is instead emerging as the key differentiator in the machine learning race, because good data is uncommon.

Useful Data: Both Valuable and Rare

Data is becoming a differentiator because many companies don't have the data they need. Companies have measured themselves systematically for decades using generally accepted accounting principles, but this measurement has long focused on physical and financial assets: things and money. A Nobel Prize in economics was even awarded in 2013 for the empirical analysis of asset prices, reinforcing these well-established priorities.

But today's most valuable companies trade in software and networks, not just physical goods and capital assets. Over the past 40 years, the asset focus has completely flipped: tangible assets made up 83% of market value in 1975, whereas intangible assets made up 84% in 2015. Instead of manufacturing coffeepots and selling washing machines, today's corporate giants offer apps and connect people. This shift has created a drastic mismatch between what we measure and what actually drives value.

The result is that useful data is problematically rare, and one visible symptom is the growing gap between market and book values. Because of this measurement gap, companies racing to apply machine learning to important business decisions, even replacing some of their expensive consultants, are realizing that the data they need doesn't exist yet. In essence, the fancy new AI systems are being asked to apply new techniques to the same old material.

Just like people, a machine learning system is not going to be smart about any topic until it has been taught. Machines need a lot more data than humans do in order to get smart — although, granted, they do read that data a lot faster.


About the Authors

Megan Beck (@themeganbeck) is cofounder and chief product officer of OpenMatters, a machine learning company. Barry Libert (@barrylibert) is CEO of OpenMatters and a senior fellow at Wharton’s SEI Center.

Comments

I fully agree with the value proposition for AI in an organization. In my many years of experience in supply chain consulting, I have noticed one common thread when it comes to AI: building business and process knowledge takes a lot of time, and the people who can contribute to AI are the ones who bring unique perspectives on business scenarios that are not typically documented anywhere. One example is building a use case for 'Make to Stock' when you have always done 'Make to Order.' We are barely scratching the surface of AI, and, as the authors rightly point out, the typical tools are 'only tools' without the unique insight of key people in an organization.

Tom, thanks for your comment. We appreciate your feedback and agree with your point about intelligent systems. Our own experience suggests that even with intelligent systems, you need your own proprietary 'lens' to create truly unique IP. Otherwise, you are simply looking at the same data everyone else is looking at, using the same tools. In short, as Marcel Proust said (and I paraphrase), 'the journey is not in seeing new lands, but in seeing through new eyes.' Given that, we recommend that every firm start by instrumenting itself, for example by documenting its own IP and point of view and then seeking data that either proves or disproves that thesis. What is your experience? Best regards, Barry

Beck and Libert clearly and repeatedly state that what is needed is new or different "data" rather than the standard metrics. Yet they keep their hands over those critical new areas and over how they can be applied differently to transcend what competitors are also doing. Intelligent AI systems can now scan text and mine it for correlations, as is regularly reported in the public arena for Google, Amazon, Apple, Microsoft, and others. Where is this "edge" that the authors push forward as critical and that they are not providing to all their current and future clients? This is a marketing article that would be better expanded in the trade press than stamped with an MIT imprimatur.
