New Data Chefs Data Viz Revision Organizer

This post is part of an ongoing series about the Data Chefs Viz Organizer, a planning document designed to help people create visualizations from conception to end product.

 

Because the 1st Cohort of Data Viz Ambassadors (DVAs) had some success using the original Visualization Organizer, I was a little surprised when I realized that the Organizer wasn’t as helpful to the 2nd Cohort.

After some investigation, I figured out the issue: while the 1st Cohort mostly wanted to create visualizations from scratch, those in the 2nd Cohort were more interested in revising existing visualizations that weren’t up to par. For them, the Question/Problem section of the Visualization Organizer just wasn’t necessary.

Compare the kinds of problems and questions from the 1st Cohort…

[Image: example problems and questions from the 1st Cohort]

…to the common problem of those in Cohort 2 who revised visualizations:

[Image: the common revision problem from Cohort 2]

This is not the kind of in-depth, substantive problem/question that typically drives data inquiry. But it made me think about the important functional task of revising data visualizations, and it suggested the need to tweak the template for those who are revising visualizations rather than designing them from scratch.

I came up with this (click on the image for a PDF version):

[Image: blank Data Chefs Viz Revision Organizer]

Compare the original from-scratch Organizer (left) to this Revision Organizer (right):

[Image: the original Organizer and the Revision Organizer side by side]

Again, because the Problem/Question section was not necessary for revising, I scrapped it altogether, along with the Assumptions section. I also beefed up the Chart/Visualization section, adding space to describe the old visualization and its drawbacks as well as the proposed revision and the rationale for the change.

Our hope is that people use this organizer to document their revision process.

Using one of the revisions we did here a while back, we’ve completed a Revision Organizer below (click image for PDF version):

[Image: completed Data Chefs Viz Revision Organizer for the NIMSP scorecard]

What do you think? We’d love to hear your thoughts.

On Using “Racial” Color Categories in Data Visualization

[Image: map of ethnic composition in the Americas]

Today I came across this map depicting the ethnic composition in the Americas (h/t Randy Olson).

It rubbed me the wrong way immediately, and I’m not talking about its use of multiple pie charts. It brought to mind these examples from Stephanie Evergreen (via Vidhya Shanker).

The first issue is that the categories themselves seem arbitrarily defined, and there is a conflation of ethnicity and race. For instance, “Mestizo,” “Mulatto,” “Garifuna,” and “Zambo” are all multiracial or multiethnic groups, yet there is also an “Other, Multiracial, Mixed” category.

[Image: the map’s ethnic category legend]

Complicating matters further, in some places (e.g., the United States), legal classifications regulating those of African descent meant that many of the people categorized as “Black” were actually of multiracial descent. Moreover, the title refers specifically to “Ethnic Composition,” but “Black” and “White” are technically not ethnicities.

This brings me to the terms themselves. While the Spanish term “mulato” may still be acceptable, the English “mulatto” is definitely no longer considered appropriate (and is often considered a racial slur), yet there it is representing people in the U.S. and Canada.

Then there is the most glaring problem, in my view: the use of symbolic racial category colors to represent the different groups. I’m sure the thought behind it was to use immediately recognizable colors to limit confusion, but in data visualization, a good practice is asking whether the benefits of familiarity outweigh the costs.

What are some of those costs? Using one stylized but supposedly realistic “racial” color to represent each group brings us back to the earlier point about conflating race and ethnicity. It also takes groups that contain people with a wide range of skin tones and represents each one with a single shade. This is not a problem when the colors are abstract (see the racial dot map below; no African Americans actually have green skin, for instance), but when a color is supposed to represent real people, it feels both reductive and exclusionary.

[Image: racial dot map]

And what are we supposed to make of the fact that most of the racial colors are “realistic,” except for the bright red of the “Native American” category and the yellow of the “East Asian, East Indian, Javanese” category?

There has also been some pushback against this kind of “familiar” color categorization with respect to sex and gender.

Bottom line: data visualization is all about deliberate choices and tradeoffs. When confronted with “sensitive” data, it’s a good idea to ask yourself, “Could the choices I’ve made offend people?”

Let’s say you have an aversion to this kind of framing: to “offense” as a legitimate constraint.  That’s fine. In that case, I’d suggest you modify the question to “Could the presentation & classification choices I’ve made distract from the content?”

Thoughts?

Data Chefs Data Viz Organizer: Complete Example

This post is part of an ongoing series about the Data Chefs Viz Organizer, a planning document designed to help people create visualizations from conception to end product.

 

In an earlier post, I explored the Problem and Question section of the Data Chefs Viz Organizer and offered some examples. In this post, I’d like to provide an example of a fully completed organizer template. I think this will demonstrate how the organizer can help guide the work of would-be visualizers.

Click on the image below for the PDF version with working links.

 

[Image: completed from-scratch organizer (committee members example)]

As I detailed in prior posts, the 3 major sections of the organizer (the ones shaded purple) are based on the Junk Charts Trifecta Checkup (JCTC). I don’t think it makes sense to limit “Assumptions” to any one of the 3 sections (Question, Data, and Chart), so I placed it in its own section (IV).

Below are the final versions of both hexmap graphs.

This one for the raw number of Committee Members in each state:

 

…and this one for the difference between Actual and Expected Committee Members in each state:

 

Thoughts about the organizer and the example? How could it be improved? Could you use this organizer to help guide and document your process for creating a visualization from scratch?

Data Viz Revision: NIMSP’s Campaign Contributions Disclosure Scorecard

 

Earlier this year, the National Institute on Money in State Politics (NIMSP) created a brilliant campaign contributions disclosure scorecard depicting each US state’s grade.

They used Tableau to visualize the data as a geographic state choropleth and labeled each state with its scorecard grade (A through F):

[Image: scorecard choropleth map]

There are a few problems with visualizing this data in this way. First, there’s no way to see Washington, DC on this map. Second, the grade labels are redundant because the legend already shows which color represents each grade. But the biggest problem is that using a geographic map creates scale issues: you have to scroll and/or zoom to see the small states in the Northeast (e.g., Delaware, Rhode Island), not to mention Alaska and Hawaii.

The scorecard map shows which states got which overall grade, so it doesn’t matter how big each state is. For the purposes of this data, the states have equal weight. To solve this scale issue, I decided to revise the map as an abstract state map, with every state depicted by a figure of the same size and shape (I was inspired by this post by Danny DeBelius from NPR’s Visuals Team).

I decided to make a square tile map in Excel, applying conditional formatting to the state squares to get the different colors. I downloaded the data from NIMSP’s Tableau file, then used this awesome tutorial by Caitlyn Dempsey at GIS Lounge. Here’s what my Excel square tile map looks like:

[Image: Excel square tile map of the scorecard]

I think the legend is a bit too big (it’s not that easy to make the cells smaller; you’d probably have to make all of the cells tiny and merge them to get the bigger state squares).

Overall, the Excel tile map is fine, but I think it lacks a bit of visual pop.
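
If you’d rather script a tile map than wrangle spreadsheet cells, the same idea is easy to sketch in code. Here’s a minimal, illustrative version in Python with matplotlib; the grid coordinates, grades, and color palette below are made-up placeholders for a few states, not NIMSP’s actual data or layout:

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Placeholder grid positions (column, row) and grades for a handful of
# states; a real map would list all 50 states plus DC.
states = {
    "ME": (11, 0, "F"), "VT": (10, 1, "F"), "NH": (11, 1, "D"),
    "NY": (9, 2, "A"), "MA": (10, 2, "B"), "RI": (11, 2, "F"),
    "CT": (10, 3, "B"), "NJ": (9, 3, "C"),
}
# Hypothetical grade-to-color mapping; any five-step palette would do.
grade_colors = {"A": "#1a9641", "B": "#a6d96a", "C": "#ffffbf",
                "D": "#fdae61", "F": "#d7191c"}

fig, ax = plt.subplots(figsize=(6, 4))
for abbr, (col, row, grade) in states.items():
    # One equal-sized square per state, colored by its grade.
    ax.add_patch(Rectangle((col, -row), 0.95, 0.95,
                           facecolor=grade_colors[grade]))
    ax.text(col + 0.475, -row + 0.475, abbr,
            ha="center", va="center", fontsize=9)
ax.autoscale()
ax.set_aspect("equal")
ax.axis("off")
plt.show()
```

Because every state is drawn at the same size, the scale problem of the geographic map disappears by construction.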

Next, I decided to try to make the same tile map in Tableau, only with hexagons instead of squares. I followed Matt Chambers’ user-friendly tutorial at Sir-Viz-a-Lot and made the following map:

[Image: Tableau hex tile map of the scorecard]
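
The hexagon version is only a small variation on the square sketch above: shift every other row by half a column and space the rows so the tiles nest. Again, the positions and colors below are placeholder values, not the real scorecard data:

```python
import matplotlib.pyplot as plt
from matplotlib.patches import RegularPolygon

R = 0.55  # hexagon circumradius; tiles end up ~0.95 wide for a column step of 1

# Placeholder positions and colors, not the full dataset.
states = {"ME": (11, 0, "#d7191c"), "NH": (11, 1, "#fdae61"),
          "VT": (10, 1, "#d7191c"), "MA": (10, 2, "#a6d96a"),
          "NY": (9, 2, "#1a9641"), "RI": (11, 2, "#d7191c")}

fig, ax = plt.subplots(figsize=(6, 4))
for abbr, (col, row, color) in states.items():
    x = col + (0.5 if row % 2 else 0)  # shift odd rows so the hexes nest
    y = -row * 1.5 * R                 # pointy-top rows sit 1.5 * R apart
    ax.add_patch(RegularPolygon((x, y), numVertices=6, radius=R,
                                facecolor=color, edgecolor="white"))
    ax.text(x, y, abbr, ha="center", va="center", fontsize=8)
ax.autoscale()
ax.set_aspect("equal")
ax.axis("off")
plt.show()
```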

Just for kicks, I switched the tiles back to squares instead of hexagons, and got what looks like a keyboard.  With this tile configuration, I think the hexagons look much better.

[Image: Tableau square tile map of the scorecard]

Finally, I incorporated a little from Keir Clark of Maps Mania, Andrew W Hill from the CartoDB Team, and the Danny DeBelius NPR post I mentioned earlier.  Here’s the result:

[Image: CartoDB hex map of the scorecard]

So, what do you think?  Which revision do you think is most effective?

 

Data Chefs Data Viz Organizer: Problems & Questions

 

This post is part of an ongoing series about the Data Chefs Viz Organizer, a planning document designed to help people create visualizations from conception to end product.

 

As I said in the last post, I created the Data Chefs Visualization Organizer to provide a little more guidance than the Junk Charts Trifecta Checkup (JCTC), especially for crafting the question driving the data visualization.  This organizer proved to be really useful for the first cohort, who built their visualizations from scratch.

Here are a couple of (modified) examples of the question section of the organizer when completed:

[Image: completed question section examples]

Again, the benefit of expanding on the JCTC is that students now have some context they can use when creating their questions: bracketing each question with a formulated problem on one side, and the possible decisions to be made as a result of answering the question on the other, made for much more effective questions.

 

Data Chefs Data Viz Organizer: Building on the Junk Charts Trifecta

This post is the first in an ongoing series about the Data Chefs Viz Organizer, a planning document designed to help people create visualizations from conception to end product.

 

I work at a nonprofit, and I’ve recently started working to help the organization get better at data viz. One thing I’m doing is training small workgroups of Data Viz Ambassadors (DVAs—pronounced “divas”), to help spread best data viz practices from the ground up. In addition to learning about best practices, DVAs must produce a practical visualization using their own data.

My syllabus for the workgroup is strongly influenced by Kaiser Fung’s Junk Charts Trifecta Checkup (JCTC):

[Image: Junk Charts Trifecta Checkup diagram]

I’ve had the workgroup use the JCTC as an organizing tool because it puts the question—which is very important, but often overlooked—on equal footing with the data and the visualization.  The JCTC is ideal for analyzing visualizations, but most of the DVAs have struggled to figure out how to use the JCTC when producing their own visualizations.

When I realized that this was a common problem, I created the data viz template below to help with the process of planning and documentation (note how the Data Chefs organizer parallels the JCTC):

[Image: Data Chefs Viz Organizer template]

The goal is to keep the question front and center, like the JCTC does, while providing a bit more structure to the process of creating and documenting data visualizations.  Hopefully, someone else in the organization can look at a completed organizer and understand (and even replicate) the process and the final data visualization.

I will be returning to this organizer in later posts, going into more detail on each component, and soliciting feedback on how to improve the document.

Initial thoughts on the organizer as a planning guide?

Where We Are Headed: Data Visualization, Data Manipulation

When we first started Data Chefs, we thought we were going to focus on making it a lot easier to clean up and slice & dice data. Many folks in “data science” will tell you that they spend the vast majority of their time prepping their data before they can analyze it, so if we could make this easier for folks in the community, it would be a real win.

After banging our heads against various doors, we realized we were trapped in a chicken-and-egg problem: without something concrete to show folks, the idea that they could have any say in how easy or hard the tools for wrangling data were seemed overwhelming; and without community folks to figure out what “easier” looked like, we wouldn’t be able to convince data geeks to build easier-to-use tools.

So over the past six months, we’ve gradually been shifting our focus. You’re going to start seeing a lot more about data visualization on this blog. That’s because even if people are scared of or overwhelmed by the idea that they could shape the tools they use, everybody likes shiny objects. So going forward, we’re going to explore what it means to make an organization data visualization-literate, and we’re going to do some D3 experiments to see if we can smooth its learning curve.

That said, we aren’t entirely giving up on data manipulation. Although we weren’t able to put together a community of folks, we did learn a lot about the problems that a Data Chefs approach would have to solve; I’ll blog about it in the next few months.

Data Viz Revision: KPMG’s Global Auto Exec Survey

KPMG conducts an annual survey of automotive executives from around the world, and this year, it visualized the results using Tableau.  The report design is definitely slick, but I had problems with the way the author(s) chose to display some of the data.

I found the graph for the results of the business model disruption question especially puzzling:

[Image: KPMG survey back-to-back bar chart]

First, this is clearly Likert-type data, so why is the “Neutral” category placed at the end, instead of in the middle, between the “likely” categories and the “unlikely” categories?

Also, while the color choices of lighter and darker blue for 2015 and 2016 are fine, setting these colors against a white background makes the labels of the small values hard to see (see the 3% values for “Extremely likely” and “Not very likely”).

Finally, and most important, a series of back-to-back bar charts (often used for population pyramids) is an odd choice for displaying data from different years. I knew I was looking at survey data with a Likert-type scale, so when I saw the back-to-back x-axes, I expected to see diverging stacked bar charts. When I didn’t, I was confused, and it took me a while to figure the graph out.

I decided to try my hand at revising this visualization into one more appropriate for the data. First, I tried to see what diverging stacked 100% bar charts would look like (h/t Stephanie Evergreen):

[Image: diverging stacked 100% bar chart revision]
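
If you want to experiment with this technique yourself, here’s a rough matplotlib sketch of a diverging stacked bar chart. The percentages are placeholders I made up to mimic the shape of the data, not KPMG’s published figures:

```python
import matplotlib.pyplot as plt

# Hypothetical Likert percentages, ordered most negative to most positive;
# placeholders only, not KPMG's actual results.
cats = ["Not at all likely", "Not very likely", "Neutral",
        "Very likely", "Extremely likely"]
colors = ["#d73027", "#fdae61", "#cccccc", "#6baed6", "#08519c"]
data = {"2016": [5, 36, 0, 50, 9], "2015": [3, 27, 29, 38, 3]}

fig, ax = plt.subplots(figsize=(8, 2.5))
for year, vals in data.items():
    # Start each bar so the "unlikely" half ends left of 0
    # and the Neutral category straddles the axis.
    left = -(vals[0] + vals[1] + vals[2] / 2)
    for val, color in zip(vals, colors):
        ax.barh(year, val, left=left, color=color)
        left += val
ax.axvline(0, color="black", linewidth=0.8)  # the diverging midpoint
ax.set_xlabel("% of respondents")
plt.show()
```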

There are a couple of problems with this alternative:

1. The 0 value for the “Neutral” category in 2016 throws off the balance, and with it almost the entire reason for choosing this graph: seeing like categories grouped together for better comparison.

2. Ideally, the category names would be written above the bars themselves, making the key unnecessary; however, because of the imbalance of the bars, and because a couple of the extreme categories have very small values, writing the category names right above the bars was not an option.

Next, I tried a standard stacked 100% bar chart:

[Image: stacked 100% bar chart revision]

This is slightly better than the first revision because at least the like groups can be compared from the same starting points (the ends, in this case). Still, it doesn’t seem ideal for the data.

So, I decided to try a slope chart:

[Image: slope chart revision]

The appeal of this chart is that it emphasizes the most important feature of this data: the change in each category from 2015 to 2016. The proportional totals are lost with this graph, but I don’t know how important they are.
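
For anyone who wants to try this at home, here’s a minimal slope chart sketch in matplotlib, again with placeholder values rather than KPMG’s actual numbers. Labeling the line endpoints directly removes the need for a legend:

```python
import matplotlib.pyplot as plt

# Placeholder values per category for the two survey years
# (illustrative only, not KPMG's published numbers).
series = {"Extremely likely": (3, 9), "Very likely": (38, 50),
          "Neutral": (29, 0), "Not very likely": (27, 36),
          "Not at all likely": (3, 5)}

fig, ax = plt.subplots(figsize=(5, 6))
for label, (v2015, v2016) in series.items():
    ax.plot([0, 1], [v2015, v2016], marker="o")
    # Label each line at its endpoints instead of using a legend.
    # Note: identical values (e.g., the two 3%'s in 2015) make labels
    # overlap and would need a manual nudge.
    ax.text(-0.05, v2015, f"{label}  {v2015}%", ha="right", va="center")
    ax.text(1.05, v2016, f"{v2016}%", ha="left", va="center")
ax.set_xticks([0, 1])
ax.set_xticklabels(["2015", "2016"])
ax.set_xlim(-1.0, 1.3)
ax.set_yticks([])
for spine in ax.spines.values():
    spine.set_visible(False)
plt.show()
```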

What do you think? Which of these charts do you think best displays the KPMG survey data?

“Fluff” vs. Context: In the Right Environment, Girls Out-Program Boys

At the beginning of January, Motherboard’s Michael Byrne argued that if you had a New Year’s resolution to start learning how to code, you should learn to do it “the hard way.”

I learned to program in C at a community college and I wouldn’t have done it any other way….. I was hooked on problem solving and algorithms. This is a thing unique to programming: the immediate, nearly continuous rush of problem solving. To me, programming feels like solving puzzles (or, rather, it is solving puzzles), or a great boss fight in a video game….

The point is to learn programming as it is nakedly, minus as much gunk and fluff that can possibly be removed from the experience.

Let’s put aside for a moment the “hard way” vs “gunk and fluff” pseudo-macho, my-head-was-shoved-in-a-locker-in-high-school-and-I-haven’t-got-over-it framing used by guys like Byrne. There are an awful lot of people who’d agree this is the best way to teach anyone who wants to learn. But that’s because it appeals to them, not because empirical evidence backs them up.

Take a recent study by Kate Howland and Judith Good.

Researchers in the University’s Informatics department asked pupils at a secondary school to design and program their own computer game using a new visual programming language that shows pupils the computer programs they have written in plain English.

Dr Kate Howland and Dr Judith Good found that the girls in the classroom wrote more complex programs in their games than the boys and also learnt more about coding compared to the boys.

Why did girls do so much better? Here’s what Good thinks is happening:

Given that girls’ attainment in literacy is higher than boys across all stages of the primary and secondary school curriculum, it may be that explicitly tying programming to an activity that they tend to do well in leads to a commensurate gain in their programming skills.

In other words, if girls’ stories are typically more complex and well developed, then when creating stories in games, their stories will also require more sophisticated programs in order for their games to work.

And it isn’t just that these girls are more skilled at telling complex stories; they also enjoy doing it.

It’s an important lesson, not only for teaching children but also adults. If you want to make Data Science more accessible, the first thing we need to ask is where the audiences we’re trying to reach are coming from. If we understand what gets someone fired up and what skills they bring to the table, it can go a long way in unlocking their ability — and just as importantly, their desire — to excel in this new field.


UPDATE: I’d also like to point out that there are a decent number of guys like me who, unlike Byrne, think solving abstract puzzles is boring as hell. In my three decades of coding and managing complex software projects, this lack of enthusiasm for abstract puzzles hasn’t been a problem so far.