The Challenges of Presenting Pandemic Data
Understanding the presentation pitfalls of data visualization is essential for decision-making in a pandemic.
A pandemic demands that leaders make informed judgments about when to close, reopen, and, when necessary, reclose struggling economies. Managers must grapple with decisions such as whether they should bring workers back onsite, resume business travel, and welcome the return of retail shoppers. Although the success of any policy ultimately hinges upon whether individuals adhere to these protocols, the potential toll of a virus on employees, customers, and businesses means that accurate forecasting is essential.
Forecasting guides planning, and forecasts rely on data. Modern pandemic data is inherently a time series of points that represent, say, the unfolding number of cases over time. This means that when time is presented in a graph, it must always sit squarely on the x-axis, but there is leeway in deciding what variable runs up the y-axis. Could the takeaways from the data differ depending on this subtle framing choice?
We offer three reasons why it might. First, pandemic data can take different presentation formats that steer viewers toward different perspectives. Second, this data is plagued by time lags, which decouple people’s actions from their observable consequences. And third, it exhibits exponential growth, a concept that many people struggle to understand or mistake for linear growth. Understanding this perfect storm of data confusability can help guide decisions about how communicators should present pandemic data and how audiences should consume it.
Communicating Accumulation in Data Visualizations
In early 2020, the total number of U.S. COVID-19 cases kept rising at an increasing rate. Any number crunching told the same story: Things deteriorated daily.
But by May, the rate of new infections began to wane. Figures like the following, which shows the number of new U.S. COVID-19 cases per day from late January 2020 through May 2020, provided a basis for hope about the coming summer months.
At first glance, our attention fixates on the downward trend, and having fewer new cases is surely better than having more. This visualization displays the data as a flow: the number of new cases per day. Wittingly or not, this presentation style distracts from the tens of thousands of new cases being added to the grand total each day.
The same data can instead be presented as a stock: the number of cumulative cases, as tallied on any particular day. Presented as a cumulative stock, the waning new case count across the same time period is less obvious — and the severity of the total count of infections is much more obvious.
Stock formatting more clearly communicates what flow obfuscates: The total number of cases is still going up. People looking at a figure like this had reason to conclude that the situation was still quite bad heading into June and July 2020.
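The relationship between the two formats can be made concrete with a minimal sketch. The daily counts below are hypothetical, invented purely for illustration, and are not the actual U.S. figures; the point is only that a stock is the running total of a flow, so the stock keeps rising for as long as the flow stays above zero.

```python
from itertools import accumulate

# Hypothetical daily new-case counts (a flow); illustrative numbers only,
# not real COVID-19 data.
daily_new_cases = [30000, 28000, 26000, 24000, 22000, 20000]

# The stock is the running total of the flow.
cumulative_cases = list(accumulate(daily_new_cases))

# The flow falls every day while the stock rises every day.
flow_is_falling = all(
    later < earlier
    for earlier, later in zip(daily_new_cases, daily_new_cases[1:])
)
stock_is_rising = all(
    later > earlier
    for earlier, later in zip(cumulative_cases, cumulative_cases[1:])
)

print("flow: ", daily_new_cases)
print("stock:", cumulative_cases)
print("flow falling:", flow_is_falling, "| stock rising:", stock_is_rising)
```

The same six numbers support a reassuring reading and an alarming one, depending only on which list is plotted.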
Though it rarely occurs in the wild, showing these two graphs side by side makes it clear that people could interpret the same data in opposite ways based on the form it takes. We tested this exact proposition in a series of experiments. When the salient trends differed between the two graphs, judgments about the situation diverged depending on which presentation people saw. A falling flow and a rising stock each led people to expect that the given pattern would stay its course; the good would get better, they predicted, and the bad would get worse.
At the beginning of each new wave in a pandemic, flow and stock rise in lockstep, but when new cases eventually dip, flow and stock formatting again diverge. Our more recent research suggests that seeing U.S. COVID-19 data from May as a flow (as in the first figure) causes people to infer that the situation is less risky — and to report a higher willingness to dine indoors and gather with friends — than seeing the data as a stock (as in the second figure).
Communicators, accordingly, must take caution in deciding which framing to use. Whether public reaction reflects a false sense of security or a maintained vigilance hangs on how numbers are presented. Such presentations ought to reflect the outcome of a reasoned decision rather than a default data presentation.
Accounting for Time Lags in the Data
On March 25, 2020, Colorado Gov. Jared Polis issued a statewide stay-at-home order in an effort to combat the spread of the new coronavirus; it lasted just over a month, until April 26. The following two graphs illustrate what happened with case counts from early March, leading up to the stay-at-home order, through late April (as stock and as flow diagrams, with a line marker to note when the stay-at-home order went into effect).
New cases continued to climb even though Colorado residents (one of the authors included) hunkered down. What happened? Because of the incubation period for COVID-19, it can take as many as 14 days after exposure for people to test positive or show symptoms. Many of the cases documented during lockdown had likely been transmitted before it started.
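A small simulation can show why reported cases keep climbing after an intervention. This is a sketch under simplifying assumptions: the infection counts, the order day, and the fixed five-day reporting lag are all hypothetical (the real delay varies from person to person and can run as long as 14 days), and the function name reported_cases is our own.

```python
def reported_cases(infections, lag_days):
    """Shift infections forward by a fixed lag; early days have no reports yet."""
    return [0] * lag_days + infections[: len(infections) - lag_days]


# Hypothetical infections per day: rising until an order on day 4, then falling.
infections_by_day = [100, 200, 400, 800, 1600, 1200, 900, 600,
                     400, 300, 200, 150, 100, 80, 60]
order_day = 4   # hypothetical day the order takes effect
lag_days = 5    # hypothetical fixed delay between infection and positive test

reports = reported_cases(infections_by_day, lag_days)
peak_report_day = reports.index(max(reports))

print("infections peak on day", infections_by_day.index(max(infections_by_day)))
print("reports peak on day   ", peak_report_day)
```

In this toy series, infections start falling the day the order begins, yet the reported counts keep rising for another five days. Anyone judging the order by the reports during that window would wrongly conclude it had failed.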
This data from Colorado illustrates how effect doesn’t always follow cause in rapid succession. But it is very common for people to expect it to. Take MIT’s beer game. In it, players who can’t tell a lager from an ale take on different roles in a distribution channel to bring beer from brewery to drinker as efficiently as possible. Some are manufacturers, others wholesalers, and others retailers — all tethered to one another but unable to control anything beyond their own behavior.
Mistakes occur frequently because players underestimate the latency between action and consequence. It simply takes time to brew beer and move boxes. If you ramp up production only when demand spikes, a glut of goods will arrive only after it’s too late.
A pandemic swaps out overstocks for a public health catastrophe resulting from insensitivity to time lags. Any given week’s new positive tests have already been brewing for a while — meaning that dips in documented cases around major holidays have less to do with preternatural improvements and more to do with predictable lulls in testing. This demands patience in evaluating the effects of easing restrictions. The lack of a sudden spike in cases does not preclude the possibility of an eventual tsunami. A causal relationship sometimes only appears if we give it a minute — or, in the case of COVID-19, a fortnight.
Understanding Exponential Growth
In late 2020, new and cumulative cases were again increasing exponentially. Data (represented as a flow) during October and early November is shown in the figure below.
From this figure, how high might a reasonable person have expected new cases to climb in the month of November: to 175,000 daily? The eye tends to find linear relationships, projecting straight trendlines and insufficiently adjusting for exponential effects. Plotting the same data on a logarithmic scale turns the exponential curve into a straight line, thereby speaking the language of how real people read graphs.
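Why a logarithmic axis straightens the curve can be checked with a few lines of arithmetic. The sketch below assumes a hypothetical series growing 3% per day from 50,000; these are not the actual fall 2020 counts. On the raw scale each daily step is larger than the last, while on the log scale every step is the same size, which is what a straight line means.

```python
import math

# Hypothetical series: 3% daily growth from a starting level of 50,000.
daily_growth = 1.03
new_cases = [50_000 * daily_growth ** day for day in range(30)]

# Day-over-day steps on the raw (linear) scale keep getting bigger...
linear_steps = [b - a for a, b in zip(new_cases, new_cases[1:])]

# ...while steps on the log scale are constant, so the plot is a straight line.
log_steps = [math.log10(b) - math.log10(a)
             for a, b in zip(new_cases, new_cases[1:])]

print("first and last raw step:", round(linear_steps[0]), round(linear_steps[-1]))
print("first and last log step:", round(log_steps[0], 6), round(log_steps[-1], 6))
```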
Viewed on a logarithmic scale, the odds of approaching 200,000 new daily cases by the end of November seem far greater. Failure to appreciate exponential growth, in this context, can truly cost lives. Take the case of hospital beds, a fixed supply. If it takes 10 days to go from 25% to 50% full, in 10 more days the hospital is likely to be out of space.
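The hospital example can be worked through directly. The sketch below contrasts two ways of extrapolating from the same observation (25% full, then 50% full 10 days later); the function names are ours, and both models are deliberate simplifications that ignore discharges and changes in transmission.

```python
import math


def days_until_full_exponential(occupancy, doubling_days):
    """Days until occupancy reaches 100% if it keeps doubling every doubling_days."""
    return doubling_days * math.log2(1.0 / occupancy)


def days_until_full_linear(occupancy, gain_per_day):
    """Days until occupancy reaches 100% if it keeps adding a fixed share per day."""
    return (1.0 - occupancy) / gain_per_day


# Observation from the text: 25% -> 50% full over 10 days.
doubling_days = 10
linear_gain_per_day = (0.50 - 0.25) / 10  # 2.5 percentage points per day

exponential_days = days_until_full_exponential(0.50, doubling_days)
linear_days = days_until_full_linear(0.50, linear_gain_per_day)

print("exponential reading: full in", exponential_days, "days")
print("linear reading:      full in", linear_days, "days")
```

A planner who reads the trend as linear expects twice as much time as one who reads it as exponential, which is the difference between preparing and being overrun.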
What to Do With Data?
Dealing with pandemic data — and indeed any time-series data — requires careful thought. Those tasked with deciding how to present data should align the framing with the communication goal. When trying to encourage the public to take greater precautions, such as wearing masks or avoiding travel, depicting data as a declining flow might be counterproductive.
In order to help people understand delayed causal relationships, such as that between exposure and symptoms, provide the context for the audience by emphasizing the lag with a visual reference. Consider a simple but explicit annotation to show when in the future the end result of some action might plausibly be expected to materialize; only thereafter should the efficacy of interventions warrant evaluation.
As with flow and stock, when presenting visualizations one might consider whether to portray data on a linear scale (see “New US COVID-19 Cases in Fall 2020 [Linear Scale]”) or a logarithmic scale (see “New US COVID-19 Cases in Fall 2020 [Logarithmic Scale]”) to make different patterns stand out. Alternatively, as with time lags, annotations can bring the message into focus for exponential growth. Even on a linear scale, which downplays exponential patterns, a trend line can denote the projected exponential path, which proved to be a reliable forecast in the case of new U.S. cases in November (see “Presenting Exponential Data With Annotations”).
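One way such a trend line might be computed is to fit a straight line to the logarithm of the counts and extend it forward. The sketch below is an illustration of that general technique, not the method used for the figures in this article; the data are hypothetical (2% daily growth from 50,000) and the helper names are our own.

```python
import math


def fit_exponential(values):
    """Least-squares line through (day, log(value)); returns (intercept, slope)."""
    n = len(values)
    days = range(n)
    logs = [math.log(v) for v in values]
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
             / sum((t - mean_t) ** 2 for t in days))
    return mean_y - slope * mean_t, slope


def project_exponential(values, days_ahead):
    """Extend the fitted exponential trend days_ahead past the last observation."""
    intercept, slope = fit_exponential(values)
    return math.exp(intercept + slope * (len(values) - 1 + days_ahead))


# Hypothetical observations: 30 days of 2% daily growth (not real case counts).
observed = [50_000 * 1.02 ** day for day in range(30)]

exp_forecast = project_exponential(observed, days_ahead=30)

# For contrast: a straight line through the first and last observed points.
daily_change = (observed[-1] - observed[0]) / (len(observed) - 1)
linear_forecast = observed[-1] + daily_change * 30

print("exponential trend, 30 days out:", round(exp_forecast))
print("straight-line trend, 30 days out:", round(linear_forecast))
```

Drawing the exponential projection onto a linear-scale chart gives viewers the curve their eyes would not supply on their own.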
More than a year into this global crisis, we’ve learned the power that data visualizations have in shaping public opinion about the coronavirus. For those with key roles of influence and authority in the pandemic, such as policy makers, businesses, and the media, it’s important to avoid confusability when presenting data and making data-based decisions. On the audience side, people need to be aware of differences in data framing. Remember to make use of toggle buttons and features that may reconfigure the presentation of facts and figures, avoid jumping to premature conclusions from lagged data, and get ahead of exponential growth before the train leaves the station.