Your independent source for Harvard news since 1898

Right Now | Privacy Predicament

Racial Bias and Redistricting

March-April 2022

A colored illustration of diverse voters in a ballot box from which the color is being drained

Illustration by Robert Neubecker



Since late last year, legislators and independent commissions across the United States have been busy redrawing the contours of American democracy. The redistricting process occurs once every decade and creates new voting-district boundaries meant to ensure fair elections by accounting for demographic changes recorded in the national census. But new research from a team of Harvard political scientists and statisticians found that a privacy tool used for the first time in the 2020 census has likely introduced a bias into the data that undercounts people in racially diverse areas and dilutes the voting power of minority residents.

When the U.S. Census Bureau conducts its decennial population count, it must balance the accuracy of the data against a mandate to protect the privacy of respondents. This is especially important for sensitive personal information such as race and ethnicity, which politicians have used to weaken minority voting blocs through the redistricting process. For the 2020 census, the bureau implemented a new statistical technique called “differential privacy” that was intended to strike the right balance between accuracy and privacy. But as the Harvard researchers found, this new effort fell far short of its goal.

“We went in with a very open mind,” says Christopher Kenny, a third-year doctoral student in government. “We had no idea what it was going to show and just wanted to do an analysis. What came out was this kind of shocking bias that would negatively affect more diverse areas.”

Differential privacy, co-invented by McKay professor of computer science Cynthia Dwork, is a mathematical framework for preventing the identification of individuals in a data set, and the Census Bureau applied it in two steps. First, census experts introduce “noise” into the raw census data. This randomizes the figures in a way that makes it extremely difficult to identify any individual, while still ensuring that the data in aggregate accurately represent key figures such as the total population of each state. But the added noise can produce obviously impossible values: some small geographic areas, for example, may appear to have negative populations. To avoid such anomalies, a second, post-processing step is applied to clean up the data.
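To make the two steps concrete, here is a deliberately simplified Python sketch. It is not the Census Bureau's production system (the bureau's TopDown Algorithm is far more elaborate); it assumes Laplace noise for step one and a naive clean-up for step two that clamps negative counts to zero and rescales so the areas sum to a fixed, exactly published total. The function names, privacy setting, and example counts are all hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution with that scale and a mean of zero.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def add_noise(counts, epsilon, rng):
    # Step 1: each person lives in exactly one area, so adding or removing one
    # person changes a single count by 1. Laplace noise with scale 1/epsilon
    # on every count makes the noisy counts epsilon-differentially private.
    return [c + laplace_noise(1 / epsilon, rng) for c in counts]

def post_process(noisy, total):
    # Step 2 (toy version): remove impossible negative counts, then rescale so
    # the areas add up to a total held exact (loosely mirroring how the bureau
    # keeps state population totals unchanged).
    clamped = [max(0.0, x) for x in noisy]
    s = sum(clamped)
    if s == 0:
        return [total / len(noisy)] * len(noisy)
    return [x * total / s for x in clamped]

def average_results(counts, epsilon=0.5, trials=2000, seed=1):
    # Repeat both steps many times and average each area's published value.
    rng = random.Random(seed)
    total = sum(counts)
    sums = [0.0] * len(counts)
    for _ in range(trials):
        published = post_process(add_noise(counts, epsilon, rng), total)
        for i, value in enumerate(published):
            sums[i] += value
    return [s / trials for s in sums]

# Hypothetical state: 50 tiny areas of 2 people each and one area of 1,000.
true_counts = [2] * 50 + [1000]
avg = average_results(true_counts)
print(f"small areas: true 2, average published {sum(avg[:50]) / 50:.2f}")
print(f"large area: true 1000, average published {avg[50]:.1f}")
```

Although the noise itself averages to zero, the clean-up step pushes the tiny areas upward (to roughly 2.3 on average here) and the large one downward (to roughly 983). The toy only shows how post-processing can create a systematic bias; it does not reproduce the specific racial pattern the Harvard team measured.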

What the researchers found is that this second, post-processing step systematically undercounts the population of highly diverse areas compared with more homogeneous ones. If diverse areas seem to have fewer residents than they actually do, mapmakers must draw larger districts there to comply with strict rules requiring every district in a state to have as close to equal population as possible. But each district still sends the same representation to the legislature, so adding people to it necessarily dilutes the voting power of everyone who lives there. Given that this bias is especially prevalent in more diverse areas, it disproportionately affects the voting power of minority voters.
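The dilution argument comes down to simple arithmetic. This short Python sketch assumes a hypothetical state of 1,000,000 people divided into 10 districts, and a district drawn entirely from areas whose population is undercounted by the same fraction; the function name and figures are illustrative, not taken from the Harvard study.

```python
def district_with_undercount(state_population, n_districts, undercount):
    # Mapmakers must hit the same *counted* population in every district.
    target = state_population / n_districts
    # If the census records only (1 - undercount) of the people actually
    # living in an area, reaching the target requires more real residents.
    actual = target / (1 - undercount)
    # The district's representation is fixed, so each resident's share of it
    # shrinks as the real population grows (1.0 means an undiluted vote).
    relative_weight = target / actual
    return target, actual, relative_weight

target, actual, weight = district_with_undercount(1_000_000, 10, 0.02)
print(f"counted {target:,.0f}, actually {actual:,.0f}, vote weight {weight:.3f}")
```

Under these assumptions, a 2 percent undercount means a district counted at 100,000 actually holds about 102,041 people, so each resident's vote carries roughly 98 percent of the weight it would in a correctly counted district.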

“Exactly how this undercounting of certain populations happens is hard to understand because the post-processing method that’s being applied is very complicated and it’s hard to unpack what is going on,” says professor of government and statistics Kosuke Imai. Still, the results of the team’s analysis showed differential privacy was clearly affecting minority voters, and the effects of this bias were particularly pronounced in smaller districts.

This is a new blow to minority voters, whose voting power has already been significantly diminished during the past decade. In the 2013 Supreme Court case Shelby County v. Holder, the justices struck down the coverage formula in the 1965 Voting Rights Act, which determined which states and localities, chosen for their history of discriminatory voting practices, had to obtain federal preclearance, typically from the Department of Justice, before changing their voting rules, including their redistricting plans. By effectively removing this requirement, the court gave mapmakers more leeway to configure districts in ways that reduced the power of minority voters, even as the number of those voters has grown.

Consider Texas, a state that was required to submit its redistricting plans for preclearance before 2013. A 2019 estimate from the U.S. Census Bureau found that people of color accounted for around 80 percent of the growth in Texas’ voting-age citizen population since 2010, and that residents who identified solely as white now make up 46 percent of the state’s voting-age population. Nevertheless, the state’s new maps reduce the number of districts where minority voters can determine the outcome of an election. The discrepancy didn’t escape the notice of the Department of Justice, which sued the state in December for violating the Voting Rights Act with its new redistricting plans.

When redistricting is led by a state’s elected representatives rather than a neutral citizen commission, it typically becomes an exercise in gerrymandering: drawing district lines to secure an electoral advantage for a particular party or group. Partisan fights then erupt over who gets that advantage and how it is secured. Although their tactics may differ, both Democrats and Republicans are guilty of gerrymandering techniques that make it difficult for blocs of minority voters to elect politicians who represent their interests. For example, Republicans tend to engage in voter “packing,” or concentrating minority voters into as few districts as possible, whereas Democrats are more likely to “crack” minority voters by spreading them over as many districts as possible; either way, the aim is to tip the balance of political power in the mapmakers’ favor.
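A toy tally shows why both tactics weaken a minority bloc. This Python sketch assumes five hypothetical districts of 100 voters each, with 240 voters (48 percent) belonging to a cohesive bloc that wins any district where it holds a strict majority; the plans and the helper name are invented for illustration.

```python
def seats_won(minority_by_district, district_size=100):
    # Assumption: the bloc votes together and wins only with a strict majority.
    return sum(1 for m in minority_by_district if m > district_size / 2)

plans = {
    # Packing: bloc voters concentrated into as few districts as possible.
    "packed": [100, 100, 40, 0, 0],
    # Cracking: bloc voters spread thinly across every district.
    "cracked": [48, 48, 48, 48, 48],
    # Hypothetical comparison: the same voters arranged to carry more seats.
    "alternative": [60, 60, 60, 60, 0],
}

for name, plan in plans.items():
    assert sum(plan) == 240  # every plan places the same 240 bloc voters
    print(f"{name}: {seats_won(plan)} of {len(plan)} seats")
```

Packing wastes the bloc's surplus votes in two lopsided districts, and cracking leaves it short of a majority everywhere, so the same 240 voters elect two representatives or none, versus four in the comparison plan.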

Race may not be a scientific category, but it does have a social and political reality, defined by shared ethnicity, heritage, or cultural values, that often translates into identifiable voting patterns. The Voting Rights Act was designed to ensure that minority communities had the opportunity to elect representatives who shared their interests by protecting the integrity of “majority-minority” districts, in which most residents belong to a racial minority. The act had a massive effect on the character of American politics, increasing the number of minority representatives by a factor of 10 between 1965 and the new millennium. But now this result is being challenged by a combination of partisan legal battles and statistical biases of the sort discovered by the Harvard researchers.

Still unknown is how much the racial bias resulting from differential privacy has affected the results of the 2020 census. The Harvard team’s analysis is based on 2010 census data, released by the bureau, that allowed researchers to compare the effects of differential privacy against the technique previously used to protect respondents’ identities. As a result, the researchers can’t be certain that the bias they found in the 2010 data recurred to the same degree in the 2020 data. But because the United States has become more diverse since the prior census, they suspect that the bias will be even more pronounced.

The team submitted its findings to the Census Bureau last spring; a few weeks later, the bureau released a statement describing changes to the post-processing technique intended to mitigate the undercounting of minority voters. Yet as the team detailed in a subsequent research paper published last autumn, those changes may have reduced the magnitude of the systematic undercounting but did not eliminate it. Because the data are already being used to draw electoral district boundaries across the United States, it’s likely too late to correct any resulting unintentional bias against minority voters. Even a slight undercount of minority voters could have a profound effect on the outcome of the 2022 midterms, especially in closely divided states like Virginia, where a narrow margin recently decided the gubernatorial election.

“There’s not really much we can do to fix it; we just need to know that it happened,” says Kenny. “Differential privacy is a very powerful tool, but I’m not sure this was the right tool for the Census Bureau.”
