Looking for State Level Trends
Have you ever wondered who clicks on digital ads? Is it people who live in the most tech-savvy, modern cities? Is it someone with too much time on their hands in a remote corner of North Dakota? Do idle hands lead to more ad interactions? Team Garlic wanted to know, and their Hack Day goal was to find out more about who clicks on Flite ads by combining our own data with open data from census.gov.
The plan was to gather the smallest, but most relevant, data set as quickly as possible, see what it told us, and show it off. Ike and Angel mined our own data in Hive for the top impressions, interactions, and browser by city and state. Meanwhile, Andy and Matt decided to see what kind of census data we could get from census.gov regarding population, income, gender, and employment rate. Once we had all of that info, Ike was going to merge the data in MySQL, Angel would then analyze it for trends, and Andy would build a page to display the data.
Before we went out and started gathering data, we decided to form a hypothesis about correlations between census data and Flite usage data. The best idea we could come up with based on state-level data was that a higher unemployment rate may correlate with a higher time on unit, engagement rate, and/or interaction rate. Read on to find out if our hypothesis was valid.
During the data gathering process we discovered a few interesting things and ran into a few hiccups. For example, the team discovered that West Virginia has the highest usage of Internet Explorer on Flite ads. The lowest: Utah. (Utah, you’re awesome. Thank you.)
We also discovered that the data on census.gov wasn’t always easy to procure the way we wanted it. On one hand, making an API call to the 2010 census data was straightforward and fast. Matt was able to get a unique key to make the calls within minutes and was running our first queries in less than 15 minutes. On the other hand, the API calls to the American Community Survey (ACS) 5 Year Data Set were much more difficult, and we never made a successful custom query. For the most part this was due to the lack of clear documentation on the census.gov site.
Since we needed some additional information and were unable to get it from ACS, we went looking elsewhere. When in doubt, look it up on “The Google,” right? Andy found a couple of pre-populated pages on bls.gov and census.gov for unemployment numbers and median state income. We weren’t able to easily export the data, so Matt just hacked together his own CSV files and sent them to Angel.
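For readers who want to try the straightforward half of that experience, here is a minimal sketch of how a state-level population request might be put together. The endpoint path and the `P001001` total-population variable code are assumptions based on the public census.gov API, not details from our hack day, and `build_state_population_url` is a hypothetical helper name. Check the current census.gov documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Assumed endpoint and variable code; verify both against census.gov.
BASE_URL = "https://api.census.gov/data/2010/dec/sf1"
TOTAL_POPULATION = "P001001"


def build_state_population_url(api_key):
    """Build a URL asking for the total population of every state."""
    params = {
        "get": f"{TOTAL_POPULATION},NAME",
        "for": "state:*",
        "key": api_key,
    }
    # Keep ':', '*' and ',' unescaped so the query stays readable.
    return f"{BASE_URL}?{urlencode(params, safe=':*,')}"


print(build_state_population_url("YOUR_KEY"))
```

The sketch only builds the URL. Fetching it and parsing the response is left out on purpose, since the response shape is not something this post covers.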
We had a little more time and thought that gathering population information by city in the US would be fun to look at as well. Fortunately, Ike found this population data fairly quickly from biggestuscities.com, which linked directly to the relevant data set on the census site. Unfortunately, the city/town names in that data set did not match up well with our city-level Flite usage data, and we didn’t have enough time to clean the data set before demos.
So what did we find? First, although it was humorous, we tossed out the browser data because we didn’t have time to integrate it. Matt really dislikes IE, so it was best not to go down that rat hole.
Looking at the maps comparing Flite usage data to population data, the first thing we calculated was daily Flite ad impressions per 10,000 residents. That map didn’t look very interesting because we didn’t see much variation between the states. When we looked more closely at the data set we found that Washington, DC (which we did not highlight on the map) had by far the highest per capita impression rate, about 300% of the second place state. Ike’s theory is that more people work in DC than actually live there. We found a blog post that supported this theory about DC’s high employment/residence ratio, higher than any other US city. And unlike other US cities, where the effect is diluted over an entire state’s population, Washington, DC is both a city and a state for our purposes.
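Had we found the time, the cleanup might have started with something like the sketch below. It assumes the mismatch came mostly from casing, stray whitespace, and a lowercase place-type suffix on the census side (for example "Oklahoma City city"). That is a guess about the data, not something we verified, and the function names and suffix list are hypothetical.

```python
import re

# Assumed census-style suffixes; matched case-sensitively so that a
# capitalized "City" that is part of the real name is left alone.
CENSUS_SUFFIX = re.compile(r"\s+(city|town|village|borough|CDP)$")


def normalize_place(name):
    """Lowercase a place name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def normalize_census_place(name):
    """Drop an assumed trailing place-type suffix, then normalize."""
    return normalize_place(CENSUS_SUFFIX.sub("", name.strip()))


census_key = normalize_census_place("Oklahoma City city")
usage_key = normalize_place("  Oklahoma  City ")
print(census_key == usage_key)
```

Names that differ in other ways (abbreviations, punctuation, alternate spellings) would still need manual review.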
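The per-10,000 figure is just a scaled ratio. Here is a minimal sketch of the calculation. The state names and numbers are made up for illustration and are not Flite or census data, and `impressions_per_10k` is a hypothetical helper name.

```python
def impressions_per_10k(daily_impressions, population):
    """Return daily ad impressions per 10,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return daily_impressions * 10_000 / population


# Made-up (daily impressions, population) pairs, for illustration only.
sample = {
    "State A": (50_000, 5_000_000),
    "State B": (1_800, 600_000),
}

rates = {
    name: impressions_per_10k(impressions, population)
    for name, (impressions, population) in sample.items()
}
print(rates)
```

Normalizing by population is what lets a small place like DC stand out. On raw impression counts it would be buried by the large states.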
Keeping it political, Ohio was a true battleground state in the 2012 election, but it’s the clear winner when it comes to Clickthrough Rate. We reached out to our in-office Ohio expert, John Skinner, but received no comment.
Engagement Rate and Interaction Rate by state also had some interesting variations, but we couldn’t come up with any useful theories about the cause:
Finally, the most interesting part of this project: Angel discovered that the higher a state’s median income, the lower its ad interaction rate tended to be. We compared all of the different factors and found two strong correlations.
The first was that interaction rate and median income were inversely related:
Second, following up on our initial hypothesis, we found a positive correlation between engagement rate and unemployment rate:
This data supported our initial hypothesis to some extent. Of course, there are many caveats to working with state-level data. For example, many ad campaigns are regional, so some of the data trends may be based more on the ads themselves than on the people interacting with them. Also, the “unemployment rate” we used was not the true unemployment rate. Instead, we divided the census figure for unemployed persons by the census population figure. By the time we realized that error it was too late to re-run the data with official unemployment rate figures. Nonetheless, it was nice to find some relationship within our data set!
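For anyone who wants to run a similar comparison, here is a minimal sketch of one common way to measure this kind of relationship, the Pearson correlation coefficient. We are not claiming this is the exact statistic Angel used. The numbers below are made up for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-state values, for illustration only.
median_income = [40_000, 50_000, 60_000, 70_000]
interaction_rate = [0.9, 0.8, 0.6, 0.5]
print(pearson_r(median_income, interaction_rate))
```

A value near -1 indicates a strong inverse relationship, a value near +1 a strong positive one, and a value near 0 little linear relationship at all.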
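To make that last caveat concrete, here is a small sketch contrasting the ratio we computed with the conventional definition, which divides by the labor force rather than the whole population. The figures are made up for illustration, and both function names are hypothetical.

```python
def unemployed_share_of_population(unemployed, population):
    """What we actually computed: unemployed persons / total population."""
    return 100 * unemployed / population


def unemployment_rate(unemployed, labor_force):
    """Conventional definition: unemployed persons / labor force."""
    return 100 * unemployed / labor_force


# Made-up figures, for illustration only.
unemployed, population, labor_force = 50_000, 1_000_000, 500_000
print(unemployed_share_of_population(unemployed, population))
print(unemployment_rate(unemployed, labor_force))
```

Because the labor force is always smaller than the total population, our ratio understates the conventional rate. How much it understates varies from state to state, so the two measures need not rank states in the same order.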