O’Reilly Media

Data Viz Revision: O’Reilly 2016 Data Science Salary Survey (Part 3)

This post is part of a series based on the data displayed in O’Reilly’s 2016 Data Science Salary Survey. Using the Data Chefs Revision Organizer as a guide, we will rethink and revise some of the visualizations featured in the report.

In this visualization, the authors show the proportion of survey respondents located in each region of the world:

oreilly-world-region-original

Unlike in the visualizations from the first two posts in this series, the blue circles in this map do not depict the underlying data. Instead, the blue bubbles here are merely a stylistic choice: they serve as pixels representing the world’s land mass. The numeric values are then laid on top of their corresponding regions.

It’s important to note that while all the categories are regional, the units vary. Sometimes they refer to countries (e.g., the United States, Canada), sometimes to entire continents (e.g., Africa, Asia), and sometimes to vague regional groupings (e.g., Latin America). Given the inconsistency in the data categories, it’s not surprising that the visualization is a little unclear too.

One of the problems with this visualization is that the values are represented only as numbers, so the reader does not immediately notice how much the values differ in size. If you move back a little bit or squint your eyes until you can’t quite read the exact values, there’s nothing that immediately distinguishes the highest value (United States) from the lowest (Africa). Both appear to be white text that takes up roughly the same amount of space on a blue grid.

As I considered how to revise this map, my first thought was to try to salvage the blue bubble theme by using blue bubbles sized based on the values and placed over a geographic map.  Here’s a mockup I did using carto:

oreilly-world-region-excel-map

And here’s one I did using PowerBI:

oreilly-world-region-powerbi-map

While you can immediately see the size difference in values on these revisions, this type of map still has the same issue as the original, namely, confusion caused by inconsistent geographic categories. What countries constitute “Latin America,” for instance? If we assume that a number of the Caribbean island nations are part of Latin America, then it seems a little weird that the value is placed in the middle of South America. To take another example, respondents from Iceland probably fall under Europe/non-UK, but there’s a disconnect (literally), because the value bubble is all the way in mainland Europe.

There’s also a secondary problem that arises from the limitations of the tools I used: PowerBI and carto. If you look at my examples, the bubbles are not sized consistently. In both tools, it’s difficult to make bubble maps in which the size of each circle accurately reflects area, not diameter. For these reasons, I ruled out the bubble map.
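To make the area-versus-diameter point concrete, here is a minimal Python sketch. The function names and the sample numbers are my own illustrations, not survey data and not part of either tool. Because a circle’s area grows with the square of its radius, the radius has to scale with the square root of the value if the area is supposed to carry the comparison.

```python
import math

def radius_for_value(value, max_value, max_radius=40.0):
    """Radius chosen so that circle AREA is proportional to value."""
    if max_value <= 0 or value < 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)

def naive_radius(value, max_value, max_radius=40.0):
    """Linear scaling: the DIAMETER tracks the value, so area is exaggerated."""
    if max_value <= 0 or value < 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * value / max_value

def area(radius):
    return math.pi * radius ** 2

# Hypothetical values: one category is four times the size of another.
big, small = 40, 10

# Area-based sizing keeps the visual ratio equal to the data ratio (4x).
correct_ratio = area(radius_for_value(big, big)) / area(radius_for_value(small, big))

# Diameter-based sizing squares the ratio, so the big bubble looks 16x larger.
naive_ratio = area(naive_radius(big, big)) / area(naive_radius(small, big))
```

If you draw bubbles in a library instead, check whether its size parameter means area or radius. In matplotlib, for instance, the `s` argument of `scatter` is an area in points squared, so values can be passed to it without taking a square root first.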

Next, I considered a part/whole visualization, like the ones in part 2, but because there are eight distinct categories, and some of the values are relatively small, I knew that there would be issues seeing the smaller values and their labels.

So, ultimately, I settled on this revision:

oreilly-world-region

data-chefs-viz-revision-organizer-oreilly-world-region

It’s just a simple bar chart, with values ranked from highest to lowest. The benefit of using this simple graph, rather than the map, is that it eliminates the confusion caused by the inconsistent units of the regional categories. Because the chart doesn’t show individual countries, the reader no longer has to wonder where each one belongs.

This may not be as visually appealing as the original, but, sometimes, the simplest solution is the best solution.

Data Viz Revision: O’Reilly 2016 Data Science Salary Survey (Part 2)

This post is part of a series based on the data displayed in O’Reilly’s 2016 Data Science Salary Survey. Using the Data Chefs Revision Organizer as a guide, we will rethink and revise some of the visualizations featured in the report.

In this post, I want to focus on the visualization for the share of survey respondents by self-reported age category:

oreilly-age

Again, the authors used the arcing blue circle theme to depict the breakdown by age category. On the plus side, the data labels are consistently placed, all falling along the bottom-right of each value circle (or the inside of the arc), and the order is intuitive: youngest to oldest. Also, the circles appear to be sized properly by area (as opposed to diameter).

Using circles is not necessarily a bad way to depict categorical data, but doing so has some limitations. The main drawback is that by using distinct circles, you lose the relation of each part to the whole.

data-chefs-viz-revision-organizer-oreilly-age

For this data, I propose using a form of visualization in which the part/whole relationship is central: a pie chart, donut chart, waffle chart, or 100% stacked bar chart, shown below:

oreilly age revisions.png

The biggest downside to using these part/whole visualizations is that there isn’t a lot of room to label smaller values.  For that reason, I created a legend for all the values in each graph.

And, although this isn’t a problem with the visualization itself, if you pay attention to the values in the original, you’ll see that they add up to more than 100%: 101%, to be exact. What probably happened is that more than one value was rounded up, giving the total an extra full percent. In my revisions, I changed the value for the 41-50 category from 16% to 15% so that the values would sum to 100%. This was a completely arbitrary choice because I had no access to the raw data to know exactly how they were rounded.
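Here is a small Python sketch of how a 101% total can come about. The raw shares are made up purely for illustration (I don’t have the survey’s unrounded numbers), and the function names are my own. It shows that rounding each category on its own can overshoot 100, and that a largest-remainder rule, one common alternative to an arbitrary adjustment, brings the total back to exactly 100 when the unrounded values are available.

```python
import math

def round_independently(shares):
    """Round every share on its own; the total may drift away from 100."""
    return [round(s) for s in shares]

def round_largest_remainder(shares, total=100):
    """Round down, then hand the leftover points to the largest fractions."""
    floors = [math.floor(s) for s in shares]
    leftover = total - sum(floors)
    # Indices ordered from the largest fractional part to the smallest.
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - floors[i],
                   reverse=True)
    result = floors[:]
    for i in order[:leftover]:
        result[i] += 1
    return result

# Hypothetical unrounded shares that sum to 100 (NOT the survey's data).
raw = [22.7, 38.6, 15.55, 16.35, 6.8]

independent = round_independently(raw)    # [23, 39, 16, 16, 7], sums to 101
adjusted = round_largest_remainder(raw)   # [23, 39, 15, 16, 7], sums to 100
```

In this made-up case, the category with the weakest claim to being rounded up is the one that loses a point, which is a more defensible rule than picking a category by hand. It only works with the unrounded values, though, so it’s something the report’s authors could apply, not something I could do from the published chart.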

I think any one of these would work in place of the original. Thoughts?

Data Viz Revision: O’Reilly 2016 Data Science Salary Survey (Part 1)

We will be posting a series based on the data displayed in O’Reilly’s 2016 Data Science Salary Survey. Using the Data Chefs Revision Organizer as a guide, we will rethink and revise some of the visualizations featured in the report.

I recently read O’Reilly’s 2016 Data Science Salary Survey (by John King & Roger Magoulas). People who work in the field of data science answered questions about their job titles, age, salaries, tools, tasks, etc., and this report summarized the results. I thought the report offered a pretty fascinating overview of the data science industry, and it is definitely worth a read.

However, I was a little thrown off by the choices the authors made in visualizing the data.  Here is a selection of representative pages:

oreilly-circle-theme

As you can see, King & Magoulas opted to use a series of blue circles to represent the data throughout the report.  While the circles provide a common visual theme, I don’t think they best represent this particular data.

One example is the visualization for tasks: work activities in which the data science survey respondents reported major engagement:

oreilly-tasks

The values are displayed as circle areas, sorted from highest to lowest, starting from bottom-left and curving clockwise around to the bottom-middle. The relative sizes of the circle areas seem to be accurate, but notice the positioning of the labels on the circles. From 69% down through 36%, the data and category labels are consistently positioned to the right of each circle. From 32% on down, the data label placement starts to get inconsistent: left sometimes, right other times, based on space constraints.

This space constraint also forces the authors to alter the positioning of the value circles.  In order to fit the long text of the categories, the bottom right side of the arc had to be squashed. This gives the visualization an odd, bean-like shape.

data-viz-revision-organizer-oreilly-tasks

The revision I’ve proposed, a horizontal bar chart, is a lot cleaner. The data labels are consistent: categories to the left of the bars, values to the right.  Also, the relative sizes of the bars are pretty clear. That’s not really the case with the circle values.

oreilly-tasks-revision
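For anyone who wants to build a similar chart in code, here is a rough Python sketch using matplotlib. It is not how I made the revision above. The task names are placeholders and the numbers are only sample values, so treat everything in `sample` as hypothetical. The idea is simply to sort the categories from highest to lowest, put the category names on the left, and print each value at the end of its bar.

```python
def rank_descending(data):
    """Return (label, value) pairs sorted from highest to lowest value."""
    return sorted(data.items(), key=lambda pair: pair[1], reverse=True)

def plot_ranked_bars(data, path="tasks-revision.png"):
    """Draw a horizontal bar chart, largest value on top, and save it."""
    # Imported here so the sorting helper above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display required
    import matplotlib.pyplot as plt

    ranked = rank_descending(data)
    labels = [label for label, _ in ranked]
    values = [value for _, value in ranked]

    fig, ax = plt.subplots(figsize=(6, 3))
    bars = ax.barh(labels, values, color="steelblue")
    ax.invert_yaxis()  # barh draws the first item at the bottom by default
    # Value labels to the right of each bar; category names stay on the left.
    ax.bar_label(bars, labels=[f"{v}%" for v in values], padding=3)
    ax.set_xlim(0, 100)
    ax.xaxis.set_visible(False)  # the printed values make the axis redundant
    for side in ("top", "right", "bottom"):
        ax.spines[side].set_visible(False)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path

# Placeholder categories with sample percentages (hypothetical).
sample = {"Task C": 32, "Task A": 69, "Task B": 36}
ranked = rank_descending(sample)
```

Calling `plot_ranked_bars(sample)` writes the chart to a PNG file. Because the labels always sit in the same place, nothing has to be squeezed or moved to make long category names fit.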

This bar chart may lack the novelty or the visual pop of the original, but I think it’s more appropriate for the data, and far easier to understand.

What do you think?