race

On Using “Racial” Color Categories in Data Visualization

ethnicity americas.png

 

Today I came across this map depicting the ethnic composition in the Americas (h/t Randy Olson).

It rubbed me the wrong way immediately, and I’m not talking about its use of multiple pie charts. It brought to mind these examples from Stephanie Evergreen (via Vidhya Shanker).

The first issue is that the categories themselves seem arbitrarily defined, and there is a conflation of ethnicity and race. For instance, “Mestizo,” “Mulatto,” “Garifuna,” and “Zambo” are all multiracial or multiethnic groups, yet, there is also an “Other, Multiracial, Mixed” category.

ethnic categories.png

Complicating matters further, as a result of legal classifications regulating those of African descent, in some places (e.g. the United States), a number of those categorized as “Black” have been of multiracial descent.  Moreover, the title refers specifically to “Ethnic Composition,” but “Black” and “White” are technically not ethnicities.

This gets me to the terms themselves. While the Spanish term “mulato” may still be acceptable, the English “mulatto” is definitely no longer considered appropriate (and is often considered a racial slur), yet there it is representing people in The U.S. and Canada.

Then, there is the most glaring problem, in my view: the use of symbolic racial category colors to represent the different groups.  I’m sure the thought behind it was to use immediately recognizable colors to limit confusion, but in data visualization, a good practice is asking if the benefits of familiarity outweigh the costs.

What are some of those costs? Specifically, using one stylized, but supposedly realistic “racial” color to represent each group brings us back to the earlier point about conflating race and ethnicity.  It also takes groups that contain people with a wide range of skin tones and represents each one using only one shade. This is not a problem when the colors are abstract (see this racial dot map featured below–no African Americans actually have green skin, for instance). But when it’s supposed to represent real people, it feels both reductive and exclusive.

racial dot map.png

And what are we supposed to make of the fact that most of the racial colors are “realistic,” except for the bright Red of “Native American” and the yellow of “East Asian, East Indian, Javanese” category?

There has also been some pushback against this kind of “familiar” color categorization with respect to sex and gender.

Bottom line: data visualization is all about deliberate choices and tradeoffs. When confronted with “sensitive” data, it’s a good idea to ask yourself, “Could the  choices I’ve made offend people?”

Let’s say you have an aversion to this kind of framing: to “offense” as a legitimate constraint.  That’s fine. In that case, I’d suggest you modify the question to “Could the presentation & classification choices I’ve made distract from the content?”

Thoughts?