Reshaping the Tools to Fit Our Communities

When I first started learning pandas, I spent way too much time feeling like an idiot.

Some parts of pandas were a snap to pick up, and they let you easily slice and dice data with a few simple commands. But other parts of pandas could be maddeningly difficult or just plain bizarre. And my frustration was compounded by the fact that, like a lot of people who need a tool like pandas, I only had an hour here and there in the middle of my daily grind to learn it.

The documentation had similar problems. Some of it was really well written. But sometimes figuring out the basics I needed to do my work felt like a game of Marco Polo.

After about six months of playing with pandas, I figured out what was going on: pandas was great, but it wasn’t designed with folks like me in mind.

Take how pandas handles time. pandas has elegant, powerful commands for handling what’s called “time series” data – data like stock market data where you get share prices at regular intervals of time. But when it came to the dates I usually worked with — sales dates, membership dates — pandas was like the DMV on acid. Want to get the year of the sales purchase? Here’s what you need to do:

sales['year'] = sales.purchase_date.apply(lambda x: x.year)

Ick!

(A quick nerdy aside. Complain about this to a pandanista and odds are they’ll say, “If you don’t like this approach, just convert your sales data into time series data by making sales.purchase_date into sales’ index.” To which I say: I rest my case.)
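For anyone who wants to poke at this themselves, here is a rough sketch of the options side by side. The sales table and its values are made up for illustration; only the column name purchase_date comes from the example above. The last option, the .dt accessor, exists in more recent pandas releases, so treat it as an assumption about the version you have installed.

```python
import pandas as pd

# Hypothetical sales table: the values are invented for illustration.
sales = pd.DataFrame({
    "purchase_date": pd.to_datetime(["2013-11-02", "2014-01-15", "2014-03-30"]),
    "amount": [25.0, 40.0, 15.5],
})

# The approach from the post: pull .year off each date, one row at a time.
sales["year"] = sales["purchase_date"].apply(lambda x: x.year)

# The workaround from the aside: make the dates the index,
# then read .year off the resulting DatetimeIndex.
by_date = sales.set_index("purchase_date")
year_from_index = by_date.index.year

# More recent pandas releases also expose a .dt accessor on datetime columns.
year_from_dt = sales["purchase_date"].dt.year
```

All three give the same years. The difference is in how much you have to know about pandas internals before you can find the one you need.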

Was this some bizarre form of coding masochism (call it Fifty Shades of Pandas)? No. The pandas community was incredibly nice and tried to be as helpful as possible to folks who were flailing. The problem is that pandas and the tools it was built on, such as numpy, were designed around the needs of the people who were building them, who were mostly financial quants and scientists. Quants and scientists mostly use time series. Folks like me who work for nonprofits? Not so much.

And this brings me to one of the main points of Data Chefs. A lot of really good people are trying to figure out how to make it easier for folks in the community to learn how to use the tools we have. We think that’s a good thing. But we also think it’s time to work in the other direction. We need to start making the tools easier for more folks to use — not by dumbing them down but by redesigning them with a different audience in mind.

In the long run, what Data Chefs wants to accomplish is crazy ambitious. I used to work at SEIU, where I worked with union locals on data issues, and eventually I want janitors who are active in their local to be able to use tools like pandas to understand and act on the world around them. That’s not going to happen overnight.

But in the meantime, we’ve got plenty of low-hanging fruit to pick. There’s no reason why working with dates in pandas can’t be easy for folks in the nonprofit community, and fixing it isn’t rocket science. But that won’t happen unless those of us who aren’t currently part of the communities building tools like pandas start making our voices heard.
