Detecting Bias in Data Analysis
How you handle your data — from cleanup through presentation — affects the results you’ll get.
Data analysis can be determined as much by external agendas as by math and science. These agendas can come from many sources — personal, political, or technical.
At a personal level, analysts or managers may have vested interests in one outcome over another or may seek justification for prior claims based on intuition; they "know" the results the analysis should find. Politically, they may be conditioned by past decisions or be wary of the implications of an outcome. Technically, they may know of valid limitations in the source data that lead them to discredit the results of whatever analysis is brought to bear.
These outside agendas can be overtly or subtly embedded in the analysis. The result is analysis that favors one outcome over another — perhaps in ways that run counter to organizational objectives.
How can managers better detect and address agendas embedded in analysis?
Through practice.
In a course I teach, my students improve their ability to recognize embedded agendas through “data debates.” As in traditional debates, two groups take opposing positions on a question. Each group prepares by developing arguments to support its position and then presents those arguments to the class. Clarification questions and rebuttals follow. (I stop short of voting on a winner to keep competitive spirits from dominating educational spirits and to encourage experimentation.)
Unlike in traditional debates, student arguments must rely exclusively on analysis of a provided dataset. To minimize the influence of prior opinions, the questions are fictitious: Given production logs, are a company’s hoverboards defective? Given accounting information, is the acquisition price right for a startup? Was marketing for The Dillionaire effective? Each question comes with a generated dataset that is large enough to require analysis beyond desktop spreadsheets, but not unwieldy.
The classroom debate that follows offers several lessons for organizations that must deal with far murkier data and more complex agendas.
Janitorial Work Can Sway
Although the datasets provided for class are simpler than those found in real organizations, they are messy in similar ways. They contain outliers, missing values, and other ambiguities. They may require significant janitorial work before analysis begins in earnest.
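To make the point concrete, here is a minimal Python sketch of how cleanup choices alone can move a headline number. Everything in it is an assumption made for illustration: the `hours_to_failure` values are invented rather than taken from any class dataset, and the function names and the 100-to-1,000-hour trimming bounds are hypothetical.

```python
from statistics import mean

# Hypothetical production log: hours until failure for ten hoverboards.
# None marks a record with a missing measurement. All values are invented.
hours_to_failure = [410, 395, 402, None, 388, 5, 420, None, 399, 1200]


def drop_missing(values):
    """Keep only records that have a measurement."""
    return [v for v in values if v is not None]


def trim_outliers(values, low, high):
    """Keep only measurements inside the inclusive range [low, high]."""
    return [v for v in values if low <= v <= high]


def cleanup_scenarios(raw):
    """Return the mean hours to failure under three cleanup choices."""
    observed = drop_missing(raw)
    return {
        # Drop missing records, keep every observed value.
        "keep_everything": mean(observed),
        # Drop missing records and values outside an assumed plausible range.
        "trim_outliers": mean(trim_outliers(observed, 100, 1000)),
        # Treat a missing measurement as an immediate failure (zero hours).
        "missing_as_zero": mean([0 if v is None else v for v in raw]),
    }


if __name__ == "__main__":
    for choice, average in cleanup_scenarios(hours_to_failure).items():
        print(f"{choice}: {average:.1f} hours")
```

With these invented numbers, the three choices give averages of roughly 452, 402, and 362 hours. Each choice is defensible on its own, yet each supports a different answer to whether the hoverboards are defective, which is why a cleanup decision is a natural place for an agenda to hide.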