Ethan Zuckerman

Deceptive by Design: Data Visualization and The Ethics of Representation

Originally published by Rubin Jones
The Public Interest Technologist

The world is awash with increasing amounts of data, and we must keep afloat with our relatively constant perceptual and cognitive abilities. Visualization provides one means of combating information overload, as a well-designed visual encoding can supplant cognitive calculations with simpler perceptual inferences and improve comprehension, memory, and decision making. Moreover, visual representations may help engage more diverse audiences in the process of analytic thinking.

-- Interactive Data Visualization and Society, MIT. Spring 2024

Data visualization blends the creativity of design with the rigor of data management, conveying credibility and expertise in a succinct, two-dimensional image. Now more than ever, data visualizations structure our daily lives, as we interact with charts, dashboards, and diagrams in everyday discourse. New online features have accelerated this trend, as interactive tools enable audiovisual techniques previously impossible offline. However, despite their tremendous communicative power, data are not neutral and design has no “correct” answer. All data visualizations are biased, as they reflect the perspectives and narratives of their designers. Not even raw data are perfect; collection is often incomplete or incorrect, and sometimes dishonest.

Coupled with deep social division and political polarization, data visualizations have transformed a simple breach of trust into a weapon of authority. An acute example came at the height of the pandemic, as Covid-19 dashboards sprawled across the internet, each designer claiming a different, “real” look into the virus’s impact on public health. In attaching “truth” to their visualizations, many designers were really just projecting an individual slant. Deceptive design has been around for a long time—much longer than Covid-19—but the pandemic catalyzed an era of contortion unlike any before. Users got their hands on the data, wrangled it themselves, and circulated it through online forums across the world. The surge in tactical visualization has only gained momentum, as today’s data visualizations saturate our screens and environments, grabbing our attention with eye-catching, “data-backed evidence.” Making sense of what to read or ignore — let alone believe — has become a practice of rhetoric, theory, and politics.

Interactive Data Visualization and Society, a course co-taught by MIT faculty in the Departments of Urban Studies & Planning and Electrical Engineering & Computer Science, investigates these themes. One assignment specifically grapples with ethical design, as it asks students to create two visualizations based on the same dataset: one is meant to persuade the reader that a proposition based on the data is true, while the other is meant to persuade the reader that the proposition is not true. As seen below, the project proved that there is not always a clear distinction between the two categories.

My proposition asked whether abortion access was equal across the United States. I used state abortion data from the Guttmacher Institute.* Figure 1 claimed to prove that access was relatively equal. Figure 2 sought to prove the disparity in abortion access across the country.

It is tempting to think of data and visualization as a neutral actor, with a single “correct” set of design choices that “truthfully” report the data. However, outside of egregious errors (e.g., when dates are sorted incorrectly or the y-axis is not scaled uniformly), we see that “ground truth” in data is much more contextual and situated. Design choices we make give visualization a rhetorical power that influences what a reader concludes and remembers about the data, and blurs the line between persuasion and deception. For instance, contrast Simon Scarr’s Iraq’s Bloody Toll with a more conventional representation of the same data, and consider why Scarr’s visualization won an award while another visualization that made similar design choices—Gun deaths in Florida by Christine Chan—was widely considered to be misleading.

-- Interactive Data Visualization and Society, MIT. Spring 2024

Figure 1. Care Within Reach: Mapping Abortions in the United States and the Availability of Care in 2020.

Figure 2. Left in the Dark: Mapping Abortions in the United States and the Unavailability of Care in 2020.

Deciding which data to visualize was a subjective judgment of what mattered, what didn’t, and whether an issue even existed to begin with. Once I had established my propositions and decided what the data needed to prove, the path forward was clear — especially for Figure 2. Here, few transformations were necessary to advance my proposition, since the data provided abundant examples to reference (i.e., that access to an abortion is unevenly distributed across the United States). I was more concerned with which design decisions would reflect the gravity of the data most persuasively. For example, I designed Figure 2 with a black-and-white color palette to emphasize feelings of absence and abundance, presence and nothingness. The stark visual binary forces readers to confront the harsh reality that many segments of the country are isolated from abortion care. I also manipulated the color ramp so that states with little to no access to an abortion clinic would be blacked out, while states with higher percentages of access would shade toward white — something like a visual beacon of hope in an otherwise bleak landscape. The title plays on this visual cue, invoking the visceral fear of being “Left in the Dark.” These two strategies imbued the data with anxiety and fear, reflecting the lived reality of those who seek an abortion in the United States.
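The color-ramp manipulation described above can be sketched in a few lines. The `access_to_gray` helper and its percentage inputs are hypothetical illustrations of the mapping, not the actual code behind Figure 2:

```python
def access_to_gray(pct_with_access: float) -> str:
    """Map a state's share of residents with clinic access (0-100)
    to a grayscale hex color: low access is blacked out, while high
    access shades toward white."""
    level = round(255 * pct_with_access / 100)
    return f"#{level:02x}{level:02x}{level:02x}"

access_to_gray(0)    # "#000000": a state left entirely in the dark
access_to_gray(100)  # "#ffffff": the beacon of hope
```

The rhetorical work happens in the endpoints: anchoring the low end of the ramp at pure black guarantees that under-served states read as voids rather than merely darker shades.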

By contrast, Figure 1 was very difficult to construct, since it refuted both law and fact. Transforming, binning, grouping, filtering, and cleaning the data to support a point of view with no bearing in reality proved that some data are simply unequivocal—that some visualizations must resort to unethical omissions or severe distortions to satisfy their dishonest conclusions. Grouping states into regions was the most deceptive strategy used for this visualization. The “ground truth” showed that some states, such as Missouri, have extremely low numbers of abortion clinics, while other states, like Michigan, have an above-average number. Clustering these states into regions blurred those differences into a general average; however, it was not easy to construct these combinations. In fact, most of the groupings I created supported the counterargument. I ultimately had to shuffle outlier states between regions until I established a reasonable average across the four regions. But this raised yet another issue: how many regions should I create to reach a consistent average while still maintaining credibility?
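The flattening effect that regional grouping produces can be seen in a toy calculation. The clinic counts below are invented stand-ins for illustration, not the Guttmacher figures:

```python
from statistics import mean

# Hypothetical clinic counts for four states placed in one contrived region
clinics = {"Missouri": 1, "Michigan": 27, "Ohio": 9, "Louisiana": 3}

# The state-level view preserves the disparity between states...
state_spread = max(clinics.values()) - min(clinics.values())  # 26

# ...while a single regional mean collapses it into one unremarkable figure
regional_mean = mean(clinics.values())  # 10
```

Repeated across every region, this one aggregation step is enough to dissolve state-level outliers into the kind of “reasonable average” described above.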

One subtle, albeit effective, design strategy I used to support my argument was the erasure of state borders. By removing these boundaries, I managed to obscure otherwise questionable regions. For example, Louisiana is not typically considered a “Midwestern” state, and Ohio is not really part of the Northeast. I tested other groupings—such as census boundaries, time zones, or alternative geographies—but the groups shown above were most effective at advancing my proposition. The final visualization suggests little to no difference in the proportion of U.S. abortions by region of occurrence, arguing that access to an abortion is relatively equal across the country.

Striking a balance between earnest and deceptive techniques was an engaging exercise in data gymnastics. Binning, dividing, grouping, and filtering certain variables altered the “ground truth” of what I had inherited from the Guttmacher Institute, exposing the flexibility of data to bend to one’s will. However, some data transformations backfired. In some instances, a calculation contradicted my argument, producing patterns inconsistent with my proposition; in others, the necessary transformations were so blatantly deceptive that they were no longer useful. Figuring out how to work within the boundaries of (dis)honest data visualization quickly became an exercise in trial and error.

Working with (and against) abortion data underscored the importance of ethical design and the need for transparency about data transformations. At the heart of these principles is a moral imperative to honor the lived realities measured by the data, to acknowledge organic patterns without repackaging them as different realities, and to recognize data visualization as an active contributor to real-world outcomes. Designing data with little to no bearing in reality dishonors the lived experiences captured within the dataset, eroding the trust, dialogue, and potential for repair within data visualization projects.

*Abortion Data by U.S. State was provided by the Guttmacher Institute, a research and policy organization committed to advancing sexual and reproductive health and rights. The Institute maintains a variety of data related to global reproductive rights. The dataset used for this project includes information about abortion rates, abortion providers, and abortion seekers, aggregated to the state level.


Rubin Jones (he/him) is a Master in City Planning candidate with a concentration in City Design and Development. His work examines the role of history, technology, politics, ecology, and culture in shaping the built environment. Before coming to DUSP, he worked as a city planner in Bella Vista, Arkansas. Rubin received his B.A. in Environmental Studies and Government and Legal Studies from Bowdoin College.