Data Viz Revision: NIMSP’s Campaign Contributions Disclosure Scorecard


Earlier this year, the National Institute on Money in State Politics (NIMSP) created a brilliant campaign contributions disclosure scorecard depicting each US state’s grade.

They used Tableau to visualize the data as a geographic state choropleth and labeled each state with its scorecard grade (A through F):

[Image: Sunlight Scorecard]


There are a few problems with visualizing the data this way. First, there’s no way to see Washington, DC on this map. Second, the grade labels are redundant, because the legend already shows which color represents each grade. But the biggest problem is that using a geographic map creates scale issues: you have to scroll and/or zoom to see the small states in the Northeast (e.g., Delaware, Rhode Island), not to mention Alaska and Hawaii.

The scorecard map is showing which states got which overall grade, so it doesn’t matter how big each state is.  For the purposes of this data, the states have equal weight. To try to solve this scale issue, I decided to revise the map as an abstract state map, with every state depicted using a figure of the same size and shape (I was inspired by this post by Danny DeBelius from NPR’s Visuals Team).

I decided to make a square tile map in Excel, applying conditional formatting to the state squares to get the different colors. I downloaded the data from NIMSP’s Tableau file, then followed this awesome tutorial by Caitlyn Dempsey at GIS Lounge. Here’s what my Excel square tile map looks like:

[Image: NIMSP Excel Scorecard]


I think the legend is a bit too big (it’s not easy to make those cells smaller; you’d probably have to make all of the cells tiny and then merge them to get the bigger state squares).

Overall, the Excel tile map is fine, but I think it lacks a bit of visual pop.
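
If you’d rather sketch this kind of tile grid in code than in a spreadsheet, here’s a rough Python/matplotlib version of the same square-tile idea. To be clear, this isn’t the workflow I used above, and the grid positions, grades, and colors below are made-up placeholders rather than NIMSP’s actual data:

```python
# Rough square tile map sketch (Python/matplotlib), not the Excel workflow
# described above. Grid positions, grades, and colors are placeholders.
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# (column, row) positions on an abstract grid for a handful of states
tiles = {
    "ME": (11, 0), "VT": (10, 1), "NH": (11, 1),
    "WA": (1, 2),  "MT": (2, 2),  "NY": (9, 2), "MA": (10, 2), "RI": (11, 2),
    "DC": (9, 5),  # easy to include on a tile map, unlike on a choropleth
}
grades = {"ME": "B", "VT": "A", "NH": "D", "WA": "A", "MT": "C",
          "NY": "B", "MA": "B", "RI": "F", "DC": "C"}   # placeholder grades
palette = {"A": "#1a9641", "B": "#a6d96a", "C": "#ffffbf",
           "D": "#fdae61", "F": "#d7191c"}

fig, ax = plt.subplots(figsize=(8, 5))
for state, (col, row) in tiles.items():
    # One equally sized square per state, colored by grade and labeled
    ax.add_patch(Rectangle((col, -row), 0.95, 0.95,
                           facecolor=palette[grades[state]],
                           edgecolor="white"))
    ax.text(col + 0.475, -row + 0.475, state, ha="center", va="center")
ax.set_xlim(0, 13)
ax.set_ylim(-6, 2)
ax.set_aspect("equal")
ax.axis("off")
plt.show()
```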

Next, I decided to try making the same tile map in Tableau, only with hexagons instead of squares. I followed Matt Chambers’ user-friendly tutorial at Sir-Viz-a-Lot and made the following map:

[Image: NIMSP Scorecard Hex Tableau]

Just for kicks, I switched the tiles from hexagons back to squares and got what looks like a keyboard. With this tile configuration, I think the hexagons look much better.

[Image: NIMSP Scorecard Square Tableau]

Finally, I incorporated a few ideas from Keir Clark of Maps Mania, Andrew W Hill from the CartoDB Team, and the Danny DeBelius NPR post I mentioned earlier. Here’s the result:

[Image: NIMSP Scorecard Hex CartoDB]

So, what do you think?  Which revision do you think is most effective?


Data Chefs Data Viz Organizer: Problems & Questions


This post is part of an ongoing series about the Data Chefs Viz Organizer, a planning document designed to help people create visualizations from conception to end product.


As I said in the last post, I created the Data Chefs Visualization Organizer to provide a little more guidance than the Junk Charts Trifecta Checkup (JCTC), especially for crafting the question driving the data visualization.  This organizer proved to be really useful for the first cohort, who built their visualizations from scratch.

Here are a couple of (modified) examples of completed question sections from the organizer:

[Image: completed question sections from the organizer]

Again, the benefit of expanding on the JCTC is that students now have some context they can use when crafting their questions: bracketing each question with a clearly formulated problem on one side, and the decisions that might be made as a result of answering it on the other, made for much more effective questions.


Data Chefs Data Viz Organizer: Building on the Junk Charts Trifecta

This post is the first in an ongoing series about the Data Chefs Viz Organizer, a planning document designed to help people create visualizations from conception to end product.


I work at a nonprofit, and I’ve recently started working to help the organization get better at data viz. One thing I’m doing is training small workgroups of Data Viz Ambassadors (DVAs—pronounced “divas”), to help spread best data viz practices from the ground up. In addition to learning about best practices, DVAs must produce a practical visualization using their own data.

My syllabus for the workgroup is strongly influenced by Kaiser Fung’s Junk Charts Trifecta Checkup (JCTC):

[Image: Junk Charts Trifecta Checkup diagram]

I’ve had the workgroup use the JCTC as an organizing tool because it puts the question—which is very important, but often overlooked—on equal footing with the data and the visualization.  The JCTC is ideal for analyzing visualizations, but most of the DVAs have struggled to figure out how to use the JCTC when producing their own visualizations.

When I realized that this was a common problem, I created the data viz template below to help with the process of planning and documentation (note how the Data Chefs organizer parallels the JCTC):

[Image: Data Chefs Viz Organizer template]

The goal is to keep the question front and center, like the JCTC does, while providing a bit more structure to the process of creating and documenting data visualizations.  Hopefully, someone else in the organization can look at a completed organizer and understand (and even replicate) the process and the final data visualization.

I will be returning to this organizer in later posts, going into more detail on each component, and soliciting feedback on how to improve the document.

Initial thoughts on the organizer as a planning guide?

Where We Are Headed: Data Visualization, Data Manipulation

When we first started Data Chefs, we thought we were going to focus on making it a lot easier to clean up and slice & dice data. Many folks in “data science” will tell you that they spend the vast majority of their time prepping their data before they can analyze it, so if we could make this easier for folks in the community, it would be a real win.

After banging our heads against various doors, we realized we were trapped in a chicken-and-egg problem: without something concrete to show folks, the idea that they could have any say over how easy or hard the tools for wrangling data were seemed overwhelming, and without community folks to figure out what “easier” looked like, we wouldn’t be able to convince data geeks to build easier-to-use tools.

So over the past six months, we’ve gradually been shifting our focus. You’re going to start seeing a lot more about data visualization on this blog. That’s because even if people are scared of or overwhelmed by the idea that they could shape the tools they use, everybody likes shiny objects. So going forward, we’re going to explore what it means to make an organization data visualization-literate, and we’re going to do some D3 experiments to see if we can smooth its learning curve.

That said, we aren’t entirely giving up on data manipulation. Although we weren’t able to put together a community of folks, we did learn a lot about the problems that a Data Chefs approach would have to solve; I’ll blog about it in the next few months.

Data Viz Revision: KPMG’s Global Auto Exec Survey

KPMG conducts an annual survey of automotive executives from around the world, and this year, it visualized the results using Tableau.  The report design is definitely slick, but I had problems with the way the author(s) chose to display some of the data.

 I found the graph for the results of the business model disruption question especially puzzling:

[Image: KPMG survey chart]

First, this is clearly Likert-type data, so why is the “Neutral” category placed at the end, instead of in the middle, between the “likely” categories and the “unlikely” categories?

Also, while the lighter and darker blues for 2015 and 2016 are fine color choices, setting them against a white background makes the labels for the small values hard to see (see the 3% values for “Extremely likely” and “Not very likely”).

Finally, and most importantly, a series of back-to-back bar charts (often used for population pyramids) is an odd choice for displaying data from different years. I knew I was looking at survey data with a Likert-type scale, so when I saw the back-to-back x-axes, I expected to see diverging stacked bar charts. When I didn’t, I was confused, and it took me a while to figure out what I was looking at.

I decided to try my hand at revising this visualization into one more appropriate for the data. First, I tried to see what a diverging stacked 100% bar chart would look like (h/t Stephanie Evergreen):

[Image: KPMG survey revision – diverging stacked 100% bar chart]

There are a couple of problems with this alternative:

1. The fact that 2016 has a 0 value for the “Neutral” category throws off the balance and undermines almost the entire reason for choosing this graph, which is to see like categories grouped together for easier comparison.

2. Ideally, the category names would be written above the bars themselves, making the key unnecessary; however, because the bars are unbalanced and, as mentioned above, a couple of the extreme categories have very small values, labeling the bars directly wasn’t an option.
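
For anyone who wants to experiment with this layout, here’s a rough Python/matplotlib sketch of a diverging stacked bar chart. The category names and percentages below are made-up placeholders rather than the actual KPMG results, and the sketch follows the common convention of splitting the “Neutral” slice across the zero line:

```python
# Rough diverging stacked bar chart sketch (Python/matplotlib).
# Category names and percentages are placeholders, not the KPMG results.
import matplotlib.pyplot as plt

categories = ["Not at all likely", "Not very likely", "Neutral",
              "Somewhat likely", "Extremely likely"]
palette = ["#ca0020", "#f4a582", "#bababa", "#92c5de", "#0571b0"]
data = {                       # one row of percentages per survey year
    "2015": [5, 20, 10, 45, 20],
    "2016": [3, 25, 0, 50, 22],
}

fig, ax = plt.subplots(figsize=(8, 3))
for i, (year, values) in enumerate(data.items()):
    # Start each bar so the "unlikely" categories (plus half of "Neutral")
    # sit left of zero and the "likely" categories sit to the right.
    left = -(values[0] + values[1] + values[2] / 2)
    for value, color, name in zip(values, palette, categories):
        ax.barh(i, value, left=left, color=color,
                label=name if i == 0 else None)
        left += value
ax.axvline(0, color="black", linewidth=0.8)
ax.set_yticks(range(len(data)))
ax.set_yticklabels(list(data.keys()))
ax.set_xlabel("% of respondents")
ax.legend(ncol=3, fontsize=8)
plt.tight_layout()
plt.show()
```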

Next, I tried a standard stacked 100% bar chart:

[Image: KPMG survey revision – stacked 100% bar chart]

This is slightly better than the first revision because at least the like groups can be compared from the same starting point (the ends, in this case). Still, it doesn’t seem ideal for the data.

So, I decided to try a slope chart:

[Image: KPMG survey revision – slope chart]

The appeal of this chart is that it emphasizes the most important feature of this data: the change in each category from 2015 to 2016. The proportional totals are lost with this graph, but I’m not sure how important they are.
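
If you’d like to try a slope chart yourself, here’s a rough Python/matplotlib sketch. Again, the percentages below are made-up placeholders rather than the actual KPMG numbers:

```python
# Rough slope chart sketch (Python/matplotlib), emphasizing the change in
# each category from 2015 to 2016. Percentages are placeholders, not the
# actual KPMG numbers.
import matplotlib.pyplot as plt

# placeholder values: {category: (2015 %, 2016 %)}
changes = {
    "Extremely likely": (3, 8),
    "Somewhat likely": (45, 50),
    "Not very likely": (30, 25),
    "Not at all likely": (12, 17),
    "Neutral": (10, 0),
}

fig, ax = plt.subplots(figsize=(5, 6))
for name, (y2015, y2016) in changes.items():
    # One line per category, labeled directly so no legend is needed
    ax.plot([0, 1], [y2015, y2016], marker="o", color="#4a7bb7")
    ax.text(-0.05, y2015, f"{name}  {y2015}%", ha="right", va="center")
    ax.text(1.05, y2016, f"{y2016}%", ha="left", va="center")
ax.set_xticks([0, 1])
ax.set_xticklabels(["2015", "2016"])
ax.set_xlim(-0.8, 1.3)
ax.set_yticks([])              # the direct labels carry the values
for spine in ax.spines.values():
    spine.set_visible(False)
plt.show()
```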

What do you think? Which of these charts do you think best displays the KPMG survey data?

“Fluff” Vs Context: In the Right Environment, Girls Out-Program Boys

At the beginning of January, Motherboard’s Michael Byrne argued that if you had a New Year’s resolution to start learning how to code, you should learn to do it “the hard way.”

I learned to program in C at a community college and I wouldn’t have done it any other way….. I was hooked on problem solving and algorithms. This is a thing unique to programming: the immediate, nearly continuous rush of problem solving. To me, programming feels like solving puzzles (or, rather, it is solving puzzles), or a great boss fight in a video game….

The point is to learn programming as it is nakedly, minus as much gunk and fluff that can possibly be removed from the experience.

Let’s put aside for a moment the “hard way” vs “gunk and fluff” pseudo-macho, my-head-was-shoved-in-a-locker-in-high-school-and-I-haven’t-got-over-it framing used by guys like Byrne. There are an awful lot of people who’d agree this is the best way to teach anyone who wants to learn. But that’s because it appeals to them, not because empirical evidence backs them up.

Take a recent study by Kate Howland and Judith Good.

Researchers in the University’s Informatics department asked pupils at a secondary school to design and program their own computer game using a new visual programming language that shows pupils the computer programs they have written in plain English.

Dr Kate Howland and Dr Judith Good found that the girls in the classroom wrote more complex programs in their games than the boys and also learnt more about coding compared to the boys.

Why did girls do so much better? Here’s what Good thinks is happening:

Given that girls’ attainment in literacy is higher than boys across all stages of the primary and secondary school curriculum, it may be that explicitly tying programming to an activity that they tend to do well in leads to a commensurate gain in their programming skills.

In other words, if girls’ stories are typically more complex and well developed, then when creating stories in games, their stories will also require more sophisticated programs in order for their games to work.

And it isn’t just that these girls are more skilled at telling complex stories; they also enjoy doing it.

It’s an important lesson, not only for teaching children but also adults. If we want to make Data Science more accessible, the first thing we need to ask is where the audiences we’re trying to reach are coming from. Understanding what gets someone fired up and what skills they bring to the table can go a long way toward unlocking their ability, and just as importantly their desire, to excel in this new field.


UPDATE: I’d also like to point out that there are a decent number of guys like me who, unlike Byrne, think solving abstract puzzles is boring as hell. In my three decades of coding and managing complex software projects, this lack of enthusiasm for abstract puzzles hasn’t been a problem so far.

Can I Cook?: Impostor Syndrome and “Data Science”

Lately, I’ve been thinking a lot about the difference between amateurs and experts. In particular, I’ve been wrestling with our use of the term “data science” on this site.  To me, the term denotes a level of expertise that I don’t feel comfortable claiming yet. Through a series of conversations with my Data Chefs colleagues, who have experienced similar discomfort at times, I’ve learned a great deal.

My colleagues and I have discussed the need to actively combat impostor syndrome, an accomplished person’s fear that s/he is a fraud who will eventually be exposed.  Left unchecked, impostor syndrome can stifle creativity and momentum, especially among women and people of color. Still, I’ve learned that I feel much better if I don’t seek to claim an identity as a “data scientist,” but instead think of myself as someone who’s “doing data science” (albeit at a beginner’s pace).

As mentioned in the last post, most of us know some incredible home cooks who didn’t go to fancy culinary schools or study under distinguished Michelin-rated chefs.  These amateur chefs have made dishes hundreds of times, perfecting them, first through trial and error, and eventually through skill and intuition. For that reason, I’d put their knowledge up against that of most formally recognized experts.

If you’re like me, an amateur struggling to take stock of your data science abilities and accomplishments, the relevant question to ask yourself isn’t the equivalent of “Do I consider myself a chef?”  It’s “Am I learning how to cook?” If the answer to that is “Yes,” then, eventually, you’ll be able to ask yourself the only question that matters: “Can I cook?”

You Don’t Have To Be a Data Chemist to Bake Data Cookies

One of the reactions I’ve gotten to the argument behind my last post is that it’s unrealistic to think we can smooth data science’s learning curve. When you get beyond very simple point and click, you’ve got to immerse yourself in the dirty details of how statistics, machine learning, etc. work. In other words, we can’t really make data science accessible because the body of knowledge you need to go beyond baby steps is just too large.

When I first ran into this argument, I would reply with stories about the skilled practitioners in the field I’ve worked with who’ve forgotten a lot of what they learned in, say, intro stats – couldn’t perform a chi-square test by hand if their life depended on it – but still produce very powerful, highly influential work. These days my answer is a lot simpler.

Let’s have a show of hands of everyone who has relatives or friends who are amazing cooks. Now keep your hand up if most of those amazing cooks know the chemistry and physics behind what they do. Not a whole lot of hands left up.

It’s not that these amazing cooks don’t have any of the knowledge that’s embodied in chemistry and physics. They know a lot about how to work with boiling water, how you know when something they’ve been frying is done, etc. But the model they have in their head – or “in their fingers” – isn’t the one you get in chemistry class.

I think Data Chefs is going to end up demonstrating that’s also true for data science: you don’t need to be a Data Chemist to bake great data cookies. I don’t have any concrete empirical data to back me up. But neither do the people who say it can’t be done. All we know for sure is that that’s not how it’s been taught in the past. And if the data-driven revolution has taught us anything, it’s that you shouldn’t build the foundation of data science training on “but that’s the way it’s always been done.”

Beyond Data “Hot Pockets”: Creating a Continuum of Tools

How do you make Data Science more accessible? A lot of folks say the answer is to create easy to use drag-and-drop tools that hide all the complex, icky stuff. At Data Chefs, we totally get why smart, dedicated people think Data Science should head in this direction. But we don’t think it’s sustainable.

If you know exactly, precisely the kind of data analysis people are going to want to do, then easy to use drag-and-drop tools are great. But in our experience, it often doesn’t play out that way.

Say you’re part of a coalition that’s trying to reduce kids’ asthma caused by polluted air. Someone finds this great little tool that lets you easily import data and map it to your heart’s content. Problem solved!

And then someone points out that for two-thirds of the data visualizations you want to do, the maps are perfect, but for the other third you need a different kind of map – and the easy-to-use tool doesn’t make that kind of map.

Even more likely, you run into a snag with the data. The maps would be a lot more useful if you could merge in some census data, only the tool can’t do that. Or it can only merge in really small data sets, or census data that’s in another format, or…

Okay, you say, we can come up with a workaround. Maybe there’s another tool you can use to merge or reformat the data before you import it into your great little mapping tool. The catch: this new tool isn’t as easy to use. Someone needs to learn how to run it from the command line, decipher the obscure documentation, and make sense of the cryptic, passive-aggressive error messages the tool spits out whenever you make a tiny mistake. So maybe instead you run some of your data through Excel, then use this other little utility, then clean some of the data by hand, then apply some other ugly little duct-tape hack…

In short, using the “easy” tool has morphed into a complicated, painful process that nobody will remember how to do three months from now, when you need to create another set of maps.

Or worse yet, there is no obvious workaround – now you need a programmer.

We don’t think it makes sense to keep building a data science landscape where easy-to-use tools exist, but where you fall off a cliff the moment you need to go a little beyond what they can do. Instead, we think it’s time to take a page from the world of cooking.

The world of cooking isn’t divided into people who can only microwave a hot pocket and Master Chefs. Instead there’s a continuum of experience.

A lot of people start off their cooking journey by just being able to microwave a TV dinner or add milk to cereal. Then maybe they learn how to make mac & cheese from a box, scramble eggs in a pan, or throw together a simple salad. Then maybe they learn how to grill burgers or bake chocolate chip cookies from the recipe on the chocolate chip package. Then they build up a repertoire of a handful of their go-to recipes. Or maybe early on one of their family members or relatives teaches them how to make some simple versions of the food that are part of their heritage. Or maybe they decide they’re going to take a class.

There are lots of paths you can take to learn to cook. And not everyone gets to or needs to get to the same level of skill; plenty of people can take care of their culinary needs without becoming an expert cook.

But what most of these paths have in common is that there’s a continuum. If you want to do more than microwave hot pockets, you don’t have to enroll in the Culinary Institute of America. You can take baby steps towards getting more skill based on what kind of food interests you the most and how much time & effort you want to put into it.

We think the world of data science would be a lot more accessible – and a lot more diverse – if it were built around a continuum of tools that, like cooking, let folks take baby steps toward building more skill as they have the need and time for it.

What exactly would this continuum look like? We’re not sure. That’s what Data Chefs aims to find out.