Archives

Tag: Data Viz

  • Our Inflated Thanksgiving, 2016 Edition

    It’s not just the balloons at the Macy’s Thanksgiving Day Parade that are inflated.

    Each year the American Farm Bureau Federation conducts an informal price survey of a typical Thanksgiving dinner. The shopping list has remained the same for the past 31 years: turkey, bread stuffing, sweet potatoes, rolls with butter, peas, cranberries, a veggie tray, pumpkin pie with whipped cream, and coffee and milk, all to feed a family of 10.

    At first glance, it seems like the price of food has trended upward since 1986. Once you adjust for inflation, however, you get a different story: the inflation-adjusted price has stayed relatively flat, with minor fluctuations. What has changed is the purchasing power of the dollar, which has declined over time and pushed up the nominal price of goods.
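    Adjusting for inflation comes down to a ratio of price index values. To express a past price in another year's dollars, multiply it by CPI(target year) / CPI(original year). The sketch below shows only that arithmetic. The function name is hypothetical, and the index values in the example are round placeholders, not the BLS figures used for the chart.

```javascript
// Convert a nominal price from one year into another year's dollars using
// the ratio of the two years' price index values (e.g. CPI annual averages).
function toConstantDollars(nominalPrice, cpiThen, cpiNow) {
  if (cpiThen <= 0 || cpiNow <= 0) {
    throw new RangeError("CPI values must be positive");
  }
  return nominalPrice * (cpiNow / cpiThen);
}

// Placeholder index values: if the index doubled, a $30 dinner "then"
// corresponds to $60 in today's dollars.
const adjusted = toConstantDollars(30, 100, 200);
console.log(adjusted.toFixed(2)); // prints 60.00
```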

    As Mark Perry points out, this is no reason for despair. The average hourly wage of the American worker has increased steadily, making Thanksgiving more affordable than ever. That is something to be thankful for this year.

    The data for this chart was collected by the American Farm Bureau Federation. I adjusted this price data for inflation using the BLS’s CPI Inflation Calculator. The chart was built using Chart.js.

    I retooled my original 2014 version. I paid closer attention to colors and made this one responsive. I think it is a big improvement!
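    For anyone building a similar chart, here is a rough sketch of a Chart.js line-chart configuration that puts the nominal and inflation-adjusted series side by side. It only builds the configuration object, so it can run without a browser. The function name, colors, and canvas id in the usage comment are hypothetical, and the option names (`responsive`, `maintainAspectRatio`, dataset `fill`) are worth checking against the Chart.js version you use.

```javascript
// Build a Chart.js line chart config comparing nominal and
// inflation-adjusted prices. Pure data, so it is testable without a DOM.
function buildPriceChartConfig(years, nominal, adjusted) {
  if (years.length !== nominal.length || years.length !== adjusted.length) {
    throw new Error("years, nominal, and adjusted must have the same length");
  }
  return {
    type: "line",
    data: {
      labels: years,
      datasets: [
        { label: "Nominal price ($)", data: nominal, borderColor: "#F56C4E", fill: false },
        { label: "Inflation-adjusted price ($)", data: adjusted, borderColor: "#8CD3DD", fill: false },
      ],
    },
    options: {
      responsive: true,           // resize along with the parent container
      maintainAspectRatio: false, // let the container's CSS height set the height
    },
  };
}

// In the browser (hypothetical canvas id):
// new Chart(document.getElementById("thanksgiving-chart"), buildPriceChartConfig(years, nominal, adjusted));
```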

    This was also published over at FEE’s Anything Peaceful blog.

  • Praxis Data Workshop – Telling a Stronger Story with Data Visualization


    On October 13 I ran a data-related workshop for the Praxis community.

    I split the talk into two parts:

    1. How you can apply the basic concepts of data analysis to your career, whether it be sales, marketing, shipping, coaching, cooking, management, etc. These techniques will help you tell a better story, be more convincing, and become more effective at your job.
    2. An introduction to visualizing data and information, including some tools, resources, and inspiration to get you started. If you are interested in data visualization, think of this as a jumping-off point.

    The Slides


    Download my slides without my talking notes

    Download my slides with my talking notes

    The Video

    Thanks to TK Coleman for asking me to do the workshop and for recording the video!

    Examples mentioned

    Resources mentioned

    Books mentioned

    Tools

    • Excel – Basic, but a great first stop for exploring and sorting your data set.
    • Vega, Lyra, and Voyager – Open source data tools for exploration and visualization
    • Tableau Public – Free version of Tableau, one of the top easy-to-use visualization tools on the market
    • RAW – Makes complex D3 chart types easy to use.
    • Carto – Free maps without needing coding knowledge!
    • Chart.js – Easy, pre-packaged interactive chart types. Requires some JavaScript.
    • D3.js – One of the top DOM manipulation/data visualization libraries
    • Paper.js – An open source vector graphics scripting framework
    • R – Popular data science language
    • Python’s Pandas – Popular data science library for Python
    • Matplotlib – Popular graphing library for Python
    • Reporter app – Personal data collection app

    PDP ideas

    Here are a few ideas on how Praxis participants can incorporate data analysis and visualization into their PDPs (personal development plans):

    1. Read How To Measure Anything, Lean Analytics, or How to Lie with Statistics and write a review on both Amazon and your blog.
    2. Pick 1-2 metrics in your personal life and collect data on them for two weeks. Then write a blog post describing the trends you see, explaining the outliers, and visualizing the data in at least two ways. A few ideas: sleep time and quality, caffeine intake and focus, time spent on social media, counts of something you produce at work, pages of books read, how many times you say “thank you” to someone, how many blog posts you write, etc. Check Dear Data for ideas. The Reporter app is useful for collecting the data.
    3. Pick 1-2 metrics that your business partner is currently tracking. Analyze them and write up the results. Make a presentation for your managers.
    4. Find a data set of any size and write a blog post framing it in three different ways.
    5. Get sales or marketing data (or something that includes location data related to your business) from your business partner and map it with Carto. Keep the map private so you don’t expose confidential data. Write a blog post about what you found.
    6. Research 3 different chart types and write explanatory blog posts on how each one is used, with examples of what to visualize with it.
    7. Identify something you wanted to track but didn’t and find a proxy for it.
    8. Pick something that interests you and build a small visualization with Chart.js or D3.js. Write a blog post explaining what is interesting. Guide us through it.
    9. Dig into an open data set, find some outliers, and frame some comparisons. Write it up in a blog post.
    10. Dig into an open data set and make a visualization for a small, interesting facet of it. Write it up in a blog post.
    11. For sales people: Does one region do spectacularly well or spectacularly poorly? Dig into why and write it up.
    12. For marketers: Are certain channels performing better than others? Dig into why and write it up.
    13. Look into your business partner’s revenue over time. Do you find any interesting trends? Do some research and write them up.
    14. Come up with an idea on what your team can track over the next few months to improve your efficiency/deliverables. Pitch the team on what to do, why they need to do it, and how you will use the data at the end of the quarter.

  • Using Word Frequency Charts for Better Word Clouds


    Word clouds

    Data scientists notoriously hate word clouds. Apart from showing the top 2-3 words (because they are the biggest), they make it difficult to see how much one word is used relative to another. Unfortunately, clients and non-data people love word clouds and sometimes insist on them. What is a self-respecting data nerd to do?

    Pair it with a word frequency chart!

    The easiest way to count the words is with Counter from Python’s collections module:

    from collections import Counter
    Counter(words).most_common()  # words: a list of tokens, e.g. text.lower().split()

    Then you can use your favorite charting tool to make a bar chart of the results. I prefer D3.js.
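    If the chart is drawn in D3.js anyway, the counting step can also happen in JavaScript. Here is a minimal sketch in the spirit of `Counter(words).most_common()`. It returns an array of `{word, count}` objects, a shape a D3 bar chart can bind to directly. The tokenizer is a simplifying assumption: it lowercases the text and splits on anything that isn't an ASCII letter, digit, or apostrophe. Ties are broken alphabetically, which is an arbitrary choice; Python's `most_common()` keeps first-seen order instead.

```javascript
// Count how often each word appears and return [{word, count}] sorted from
// most to least frequent (ties broken alphabetically).
function wordFrequencies(text) {
  // Naive tokenizer (an assumption): lowercase, split on runs of characters
  // that are not ASCII letters, digits, or apostrophes, drop empty strings.
  const words = text.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);
  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return Array.from(counts, ([word, count]) => ({ word, count }))
    .sort((a, b) => b.count - a.count || a.word.localeCompare(b.word));
}

const freqs = wordFrequencies("the cat and the hat and the bat");
// freqs[0] → { word: "the", count: 3 }, freqs[1] → { word: "and", count: 2 }
```

    A real corpus usually also needs stop-word removal before the chart is informative.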

    Results

    Word Frequency Chart

    Word Cloud

    Seeing both together gives you a better understanding of the words being used. Of course, a single word doesn’t always capture sentiment. Single-word counts can be helpful in smaller data sets, but in larger data sets common phrases are often more revealing. To find common phrases, use n-gram analysis.
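    An n-gram is a run of n consecutive words, so counting phrases works like counting words with a sliding window. A minimal sketch, assuming the same naive lowercase tokenizer as a simple word count (the function name is hypothetical):

```javascript
// Count n-grams (runs of n consecutive words) and return [{phrase, count}]
// sorted by descending count, ties broken alphabetically.
function ngramFrequencies(text, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  // Naive tokenizer (an assumption): lowercase ASCII words and apostrophes.
  const words = text.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);
  const counts = new Map();
  // Slide a window of n words across the text.
  for (let i = 0; i + n <= words.length; i++) {
    const phrase = words.slice(i, i + n).join(" ");
    counts.set(phrase, (counts.get(phrase) || 0) + 1);
  }
  return Array.from(counts, ([phrase, count]) => ({ phrase, count }))
    .sort((a, b) => b.count - a.count || a.phrase.localeCompare(b.phrase));
}

// Bigrams (n = 2) of a short example sentence.
const bigrams = ngramFrequencies("the cat sat on the mat and the cat sat", 2);
// bigrams[0] → { phrase: "cat sat", count: 2 }
```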

    For more on visualizing text, check out episode 62 of the Data Stories podcast and the Text Visualization Browser.

  • Photo Metadata Analysis Project

    I’m working my way through some data science and visualization books right now. I found that I learn better by doing small projects than by copying examples from books, so I designed a little project to apply some of what I learned and to pick up some new skills along the way.

    My goal was to make a project where I do everything from start to finish: Create my own data set, format it, analyze it, then visualize it. I also wanted it to use data that lots of people already have and to be as automated as possible, so others could repeat it.

    Here is what I came up with: Extracting metadata from my iPhone photos, analyzing it in different ways (days, months, hours, seasons), and visualizing it. I used free tools to do the extraction, formatting, and visualization, then scripted everything with AppleScript and Python to automate it.
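    The original analysis was done in Python with matplotlib. To illustrate the bucketing step only, here is a JavaScript sketch that tallies photo timestamps by weekday, month, hour, season, and AM/PM, and converts counts to percentages. It assumes the timestamps are already parsed into `Date` objects in local time. It also uses meteorological Northern Hemisphere seasons (December through February is winter). Both are my assumptions, not necessarily what the original scripts do, and the function names are hypothetical.

```javascript
// Assumed season rule: meteorological, Northern Hemisphere.
const WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];
const MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];
// Indexed by month number (0 = January).
const SEASONS = ["Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
  "Summer", "Summer", "Fall", "Fall", "Fall", "Winter"];

function increment(counts, key) {
  counts[key] = (counts[key] || 0) + 1;
}

// Tally Date objects (local time) into several breakdowns at once.
function bucketPhotoDates(dates) {
  const result = { weekday: {}, month: {}, hour: {}, season: {}, amPm: {} };
  for (const d of dates) {
    increment(result.weekday, WEEKDAYS[d.getDay()]);
    increment(result.month, MONTHS[d.getMonth()]);
    increment(result.hour, d.getHours());
    increment(result.season, SEASONS[d.getMonth()]);
    increment(result.amPm, d.getHours() < 12 ? "AM" : "PM");
  }
  return result;
}

// Turn a table of counts into percentages of the total.
function toPercentages(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const percentages = {};
  for (const [key, n] of Object.entries(counts)) {
    percentages[key] = (100 * n) / total;
  }
  return percentages;
}

const example = bucketPhotoDates([new Date(2016, 3, 26, 14, 5)]);
// example.weekday → { Tuesday: 1 }, example.amPm → { PM: 1 }
```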

    Technical Details

    You can find the full repository on GitHub. If you have a recent Mac with Photos.app and TextWrangler, you can run the scripts and produce your own charts!

    • I used AppleScript to loop through photo metadata in Photos.app and write it out to CSV files that it creates in the same directory as the scripts.
    • I used TextWrangler’s grep functionality via AppleScript to break apart the date strings into days of the week, dates, and times, and to remove bad or null location strings (lat,long). I know I could have written this in Python, but I didn’t want to reinvent the wheel. TextWrangler’s AppleScript library is very powerful and easy to use.
    • Python was my tool of choice for analyzing the CSV files in various ways and visualizing the results with one of its plotting libraries, matplotlib.
    • The map of where photos were taken in the US was generated with D3.js.
    • Once everything has been generated and saved, AppleScript opens the images in Preview and launches a simple Python webserver to show the map.
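    The location clean-up was done with TextWrangler’s grep, and the exact patterns aren’t shown here. As an illustration of the same idea, this JavaScript sketch treats a location as bad in three cases: it is missing, it isn’t exactly two comma-separated numbers, or it falls outside valid latitude/longitude ranges. That definition of "bad" and the function names are assumptions. Surviving points come back as `[longitude, latitude]` pairs, the order D3’s geographic projections take.

```javascript
// Parse a "lat,long" string into [longitude, latitude], or return null if it
// is unusable (missing, "missing value", non-numeric, or out of range).
function parseLocation(raw) {
  if (typeof raw !== "string") return null;
  const parts = raw.split(",").map((s) => s.trim());
  // Reject anything that isn't exactly two non-empty fields.
  if (parts.length !== 2 || parts.some((p) => p === "")) return null;
  const lat = Number(parts[0]);
  const lon = Number(parts[1]);
  if (!Number.isFinite(lat) || !Number.isFinite(lon)) return null;
  if (Math.abs(lat) > 90 || Math.abs(lon) > 180) return null;
  return [lon, lat]; // D3 projections expect [longitude, latitude]
}

// Keep only the usable locations.
function cleanLocations(rawValues) {
  return rawValues.map(parseLocation).filter((point) => point !== null);
}

const points = cleanLocations(["40.7128,-74.0060", "", "missing value"]);
// points → [[-74.006, 40.7128]]
```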

    The Results

    Photos by hour block

    AM vs PM percentage

    Months count breakdown

    Seasons count breakdown

    Seasons percentages

    Weekday count breakdown

    Weekday percentages

    Photo Map

    Lessons

    • I really beefed up my understanding of basic Python (with help from Eric Davis!)
    • I dusted off my AppleScript knowledge and gave it a workout. I learned that AppleScript has a concept of lists that you can pass into and out of programs. This was the key to opening all of the charts in a single Preview window.
    • This was a great exercise in UX. How can this be both easy to use and easy to interpret?
    • This was an exercise in thinking programmatically. How can this be built in a way that makes it reusable?
    • I learned how to project location information onto a map with D3.js. I’ve used D3.js for charts before, so this was a good way to expand my skills.
    • This was a good way to practice my git skills and think through how to structure a project and make executable code.
    • There is a lot more I can add to this (more mapping options, more ways to count the photos, outputting the photos in a calendar heatmap), but I feel comfortable stopping and moving on because I learned what I wanted to from it and I’m ready to start a new project. I might come back to this in the future and I might not, but either way I’m happy with this.

    Caveats

    • This analysis is not scientific; it was for fun. Since my photos were taken with different cell phones across uncontrolled time periods, I can’t use this analysis to say things like, “I’m more likely to take photos in the spring than the fall.” The truth is that there are photos in here from three springs but only two falls.
    • I can’t guarantee that my code will work for everyone. It is still a little buggy and hasn’t been tested for all scenarios. I know this and know how I would test it, but this project isn’t big enough to warrant it.
    • The color palettes I used aren’t bulletproof. If you use F.lux or Night Shift, the yellows will blend into the background, and if you have visual impairments you might not be able to distinguish between the greens and blues.

    Try it for yourself

    You can download the repository from GitHub and run it against your Photos.app library. The requirements and instructions are in the README. Let me know if you have any issues and I’ll do my best to help you out.

  • Responsive D3.js bar chart with labels


    Hey, so this post is broken. I moved platforms and some of my old tutorials don’t play nicely with WordPress. I’m working on fixing them, but in the meantime you can view the old version here: https://cagrimmett-jekyll.s3.amazonaws.com/til/2016/04/26/responsive-d3-bar-chart.html

    Today I learned some cool stuff with D3.js!

    Here is a minimalist responsive bar chart with quantity labels at the top of each bar and text wrapping of the food labels. It is truly responsive: rather than merely scaling the SVG proportionally, it keeps a fixed height and dynamically changes the width.

    For simplicity I took the left axis off. All bars are proportional and are labeled anyway.

    Go ahead and resize your window! This has a minimum width of about 530px because of the text labels. Any smaller than that and they are very difficult to read.

    The basic HTML

    <div id="chartID"></div>

    The Styles

    You’ll see that the axis line is actually there, but it is white. I found it useful to learn how to draw it, but I didn’t want it, so I am keeping it hidden.

    .axis path,
    .axis line {
      fill: none;
      stroke: #fff;
    }

    .axis text {
      font-size: 13px;
    }

    .bar {
      fill: #8CD3DD;
    }

    .bar:hover {
      fill: #F56C4E;
    }

    svg text.label {
      fill: white;
      font-size: 15px; /* "font: 15px" is invalid shorthand: it needs a font family */
      font-weight: 400;
      text-anchor: middle;
    }

    #chartID {
      min-width: 531px;
    }

    The Data

    var data = [
      {"food": "Hotdogs", "quantity": 24},
      {"food": "Tacos", "quantity": 15},
      {"food": "Pizza", "quantity": 3},
      {"food": "Double Quarter Pounders with Cheese", "quantity": 2},
      {"food": "Omelets", "quantity": 30},
      {"food": "Falafel and Hummus", "quantity": 21},
      {"food": "Soylent", "quantity": 13}
    ];

    The JavaScript Heavy Lifting

    This is where D3 really comes in.

    1. Setting the margins and sizes, and figuring out the basic scales
    2. Setting the axes
    3. Drawing the basic SVG container with the proper size and margins
    4. Setting the scale domains from the data
    5. Drawing the bars themselves

    var margin = {top: 10, right: 10, bottom: 90, left: 10};
    var width = 960 - margin.left - margin.right;
    var height = 500 - margin.top - margin.bottom;

    // Ordinal band scale for the food names, linear scale for the quantities (D3 v3 API)
    var xScale = d3.scale.ordinal().rangeRoundBands([0, width], .03);
    var yScale = d3.scale.linear()
        .range([height, 0]);

    var xAxis = d3.svg.axis()
        .scale(xScale)
        .orient("bottom");

    var yAxis = d3.svg.axis()
        .scale(yScale)
        .orient("left");

    var svgContainer = d3.select("#chartID").append("svg")
        .attr("width", width + margin.left + margin.right)
        .attr("height", height + margin.top + margin.bottom)
      .append("g")
        .attr("class", "container")
        .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

    xScale.domain(data.map(function(d) { return d.food; }));
    yScale.domain([0, d3.max(data, function(d) { return d.quantity; })]);

    // xAxis. To put it on top, swap "(height)" with "-5" in the translate() statement.
    // Then you'll have to change the margins above and the x,y attributes in the
    // svgContainer.select('.x.axis') statement inside resize() below.
    var xAxis_g = svgContainer.append("g")
        .attr("class", "x axis")
        .attr("transform", "translate(0," + (height) + ")")
        .call(xAxis)
        .selectAll("text");

    // Uncomment this block if you want the y axis
    /*var yAxis_g = svgContainer.append("g")
        .attr("class", "y axis")
        .call(yAxis)
        .append("text")
        .attr("transform", "rotate(-90)")
        .attr("y", 6).attr("dy", ".71em")
        //.style("text-anchor", "end").text("Number of Applications");
    */

    svgContainer.selectAll(".bar")
        .data(data)
        .enter()
        .append("rect")
        .attr("class", "bar")
        .attr("x", function(d) { return xScale(d.food); })
        .attr("width", xScale.rangeBand())
        .attr("y", function(d) { return yScale(d.quantity); })
        .attr("height", function(d) { return height - yScale(d.quantity); });

    Adding the quantity labels to the top of each bar

    This took me a while to figure out because I was originally appending the labels to the rect element. According to the SVG spec that isn’t allowed (a rect can’t contain text), so I moved on to appending them after everything else so they’d show on top. The positioning is tricky, too. I eventually found the right values to center each label: the bar’s x position plus half of xScale.rangeBand(). Then text-anchor: middle; sealed the deal.

    // Controls the text labels at the top of each bar. Partially repeated in the
    // resize() function below for responsiveness.
    svgContainer.selectAll(".label")
        .data(data)
        .enter()
        .append("text")
        .attr("class", "label")
        // Center of the bar: its left edge plus half the band width
        .attr("x", function(d) { return xScale(d.food) + xScale.rangeBand() / 2; })
        // Start just below the top of the bar; dy pushes the text down inside it
        .attr("y", function(d) { return yScale(d.quantity) + 1; })
        .attr("dy", ".75em")
        .text(function(d) { return d.quantity; });
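    To make the label arithmetic concrete without a browser, here is a small sketch that lays out equal-width bands and centers a label over each one. It follows what I understand D3 v3’s unrounded `rangeBands` to do when the outer padding equals the inner padding. The chart above uses `rangeRoundBands`, which additionally rounds to whole pixels. The function name is hypothetical, and the numbers in the example are chosen only to keep the arithmetic easy.

```javascript
// Lay out n equal bands across [0, width]. Each slot is `step` wide, the bar
// fills (1 - padding) of it, and a gap of padding * step sits at each end.
function bandLayout(keys, width, padding) {
  const n = keys.length;
  const step = width / (n + padding);
  const bandWidth = step * (1 - padding);
  return keys.map((key, i) => {
    const x = padding * step + i * step; // left edge of the bar
    // Label x = left edge + half the band width, then text-anchor: middle
    return { key, x, bandWidth, labelX: x + bandWidth / 2 };
  });
}

// 4 bars in 425px with padding 0.25: step = 100, bandWidth = 75,
// bars start at x = 25, 125, 225, 325, and the first label sits at 62.5.
const layout = bandLayout(["Hotdogs", "Tacos", "Pizza", "Omelets"], 425, 0.25);
```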

    Responsiveness

    The general method for making D3 charts responsive is to scale the SVG down proportionally as the window gets smaller by manipulating the viewBox and preserveAspectRatio attributes. But after digging around on GitHub for a while, I found a fancier solution that preserves the height and redraws the SVG as the width shrinks.

    document.addEventListener("DOMContentLoaded", resize);
    d3.select(window).on('resize', resize);

    function resize() {
      // Update the width and height from the container's current computed size
      width = parseInt(d3.select('#chartID').style('width'), 10);
      width = width - margin.left - margin.right;

      height = parseInt(d3.select("#chartID").style("height"), 10);
      height = height - margin.top - margin.bottom;

      // Resize the scales (rangeRoundBands sets the x range itself)
      xScale.rangeRoundBands([0, width], .03);
      yScale.range([height, 0]);

      yAxis.ticks(Math.max(height / 50, 2));
      xAxis.ticks(Math.max(width / 50, 2));

      // Resize the <svg> element, the parent of the translated <g> container
      d3.select(svgContainer.node().parentNode)
          .style('width', (width + margin.left + margin.right) + 'px');

      svgContainer.selectAll('.bar')
          .attr("x", function(d) { return xScale(d.food); })
          .attr("width", xScale.rangeBand());

      // Reposition only the quantity labels. Selecting every "text" element would
      // also hit the axis tick labels, whose bound data are plain strings.
      svgContainer.selectAll(".label")
          .attr("x", function(d) { return xScale(d.food) + xScale.rangeBand() / 2; })
          .attr("y", function(d) { return yScale(d.quantity) + 1; })
          .attr("dy", ".75em");

      svgContainer.select('.x.axis')
          .call(xAxis.orient('bottom'))
          .selectAll("text")
          .attr("y", 10)
          .call(wrap, xScale.rangeBand());
      // Swap the version below for the one above to disable rotating the titles
      // svgContainer.select('.x.axis').call(xAxis.orient('top')).selectAll("text").attr("x",55).attr("y",-25);
    }
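    To see why redrawing keeps the labels readable, compare what each strategy does to the chart’s dimensions. This sketch is plain arithmetic, not D3 code. With viewBox scaling, the whole 960 by 500 drawing shrinks proportionally, text included. With the resize() approach, the height stays the same and only the bars get narrower. It reuses the chart’s margins and .03 padding and approximates the band width without D3’s pixel rounding. The function names are hypothetical.

```javascript
const BASE_WIDTH = 960;
const BASE_HEIGHT = 500;
const MARGIN = { top: 10, right: 10, bottom: 90, left: 10 };

// viewBox scaling: everything, bars and text alike, shrinks by one factor.
function viewBoxScaling(containerWidth) {
  const factor = containerWidth / BASE_WIDTH;
  return { height: BASE_HEIGHT * factor, textScale: factor };
}

// Redraw on resize: height stays fixed and only the band width changes.
// Approximates rangeRoundBands([0, innerWidth], padding) without rounding.
function redrawLayout(containerWidth, barCount, padding) {
  const innerWidth = containerWidth - MARGIN.left - MARGIN.right;
  const step = innerWidth / (barCount + padding);
  return { height: BASE_HEIGHT, barWidth: step * (1 - padding), textScale: 1 };
}

// At the chart's 531px minimum width with its 7 bars:
const scaled = viewBoxScaling(531);         // height ≈ 277px, text at ≈ 55% size
const redrawn = redrawLayout(531, 7, 0.03); // height 500px, bars ≈ 70.5px wide
```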

    Wrapping text labels

    Wrapping text labels is tricky. The best solution I found is the one Mike Bostock (D3’s creator) describes. I modified it slightly to work with my chart, but the overall solution is the same.

    function wrap(text, width) {
      text.each(function() {
        var text = d3.select(this),
            words = text.text().split(/\s+/).reverse(),
            word,
            line = [],
            lineNumber = 0,
            lineHeight = 1.1, // ems
            y = text.attr("y"),
            dy = parseFloat(text.attr("dy")),
            tspan = text.text(null).append("tspan").attr("x", 0).attr("y", y).attr("dy", dy + "em");
        while (word = words.pop()) {
          // Add the next word; if the line is now too wide, move the word to a new tspan
          line.push(word);
          tspan.text(line.join(" "));
          if (tspan.node().getComputedTextLength() > width) {
            line.pop();
            tspan.text(line.join(" "));
            line = [word];
            tspan = text.append("tspan").attr("x", 0).attr("y", y).attr("dy", ++lineNumber * lineHeight + dy + "em").text(word);
          }
        }
      });
    }
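    At its core, the wrap function is a greedy line-breaking loop. Here that loop is separated from the DOM so it can run anywhere. A `measure` callback stands in for `getComputedTextLength()`, and the 7px-per-character measure in the example is a rough assumption rather than real font metrics. There is one small, deliberate difference. If the very first word is already wider than the limit, this version keeps it on the first line. The DOM version would leave an empty first tspan in that case.

```javascript
// Greedy word wrap: keep adding words to the current line until the next
// word would make it wider than maxWidth, then start a new line.
// `measure` stands in for the browser's getComputedTextLength().
function wrapWords(label, maxWidth, measure) {
  const words = label.split(/\s+/).filter(Boolean);
  const lines = [];
  let line = [];
  for (const word of words) {
    const candidate = line.concat(word);
    if (line.length > 0 && measure(candidate.join(" ")) > maxWidth) {
      lines.push(line.join(" ")); // current line is full
      line = [word];              // the word starts the next line
    } else {
      line = candidate;
    }
  }
  if (line.length > 0) lines.push(line.join(" "));
  return lines;
}

// Assumed measure: every character is 7px wide.
const approxWidth = (s) => s.length * 7;
const lines = wrapWords("Double Quarter Pounders with Cheese", 100, approxWidth);
// lines → ["Double Quarter", "Pounders with", "Cheese"]
```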

  • Steph Curry’s Advantage (Or How to Become a Leader in the NBA)

    I don’t really follow basketball. But since I’ve been hearing a lot about this Steph Curry guy in the news, on Facebook, and on podcasts, I decided to look into his stats. And since I’m trying to teach myself about data science and visualization, I thought I’d visualize some of his stats to see what I could learn.

    From what I could tell, Steph Curry had a decent start to his career, but while his play was above average, he didn’t seem to be a star. Then, after he got injured during the 2011-12 season, he must have had a revelation, because he came back the next season and made a name for himself.

    That season he started taking radically more three-point shots, and it paid off for him: his overall points went up as his three-point attempts increased.

    Here is the number of three-point attempts vs. three-pointers made by season, with the circles scaled by the total points he scored that season:

    Chart: Curry’s three-point attempts vs. three-pointers made by season, 2009-2015, with circles scaled by total points

    Even though he took more shots, his overall shooting percentage and his three-point percentage have increased recently, though not dramatically. Both have fluctuated over his career.

    Many current players beat him on overall shooting percentage, and a few rival him on three-point percentage (Jason Kapono, Steve Novak, Kyle Korver).

    So if you can’t stand out by being more accurate than everyone else, what can you do? In a game that is driven by the final overall score instead of percentages, being about as accurate as everyone else but willing to shoot more pays off. Curry is willing to take more long shots than anyone else in the league, by far.
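    The arithmetic behind this is simple. Points from threes are 3 times the number of makes, and makes are attempts times percentage, so volume can outweigh a small accuracy edge. Here is a quick sketch with two hypothetical shooters. The numbers are made up for illustration and are not real player stats.

```javascript
// Three-point percentage and points scored from made three-pointers.
function threePointSummary(attempts, makes) {
  return { percentage: makes / attempts, points: 3 * makes };
}

// Hypothetical shooters: A is slightly less accurate but shoots far more often.
const shooterA = threePointSummary(600, 240); // 40% on 600 attempts
const shooterB = threePointSummary(400, 168); // 42% on 400 attempts
// shooterA.points → 720, shooterB.points → 504
```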

    Here are Curry’s three-point attempts vs. three-pointers made compared with the other top shooters in the league over the past four years:

    Chart: Three-point attempts vs. three-pointers made, 2012-2015, for Damian Lillard (PG), Gerald Green (SF), James Harden (SG), Klay Thompson (SG), Paul George (SF), Ryan Anderson (PF), Stephen Curry (PG), Trevor Ariza (SF), and Wesley Matthews (SG)

    Here is the same data visualized as a bump chart. You can see the other top shooters jostling for position, but Curry has been king for the past four years:

    Chart: Bump chart of three-point rankings among the top shooters, Winter 2012 through Winter 2015

    I realize that there is more to the game than just taking more shots. You need incredible talent and skill to keep up a percentage like Curry’s from anywhere on the court. Given his success, I’m willing to bet that we’ll see more players trying to emulate his approach next season.


    Side note: James Harden seems to be using the same strategy, except with two-point shots and free throws. It is paying off for him, too. He is currently number two in the NBA, right behind Steph Curry.

    Sources:

    Tools:

  • Our Inflated Thanksgiving


    For the past 29 years, the American Farm Bureau Federation has conducted an informal survey of the price of a classic Thanksgiving dinner for 10 people. At first glance, it looks like the price of food has been steadily rising. But when you adjust the numbers for inflation, you get a different story. It isn’t the cost of our food that has been rising, but the amount of US currency in circulation.

    This year, we’re thankful for technologies like bitcoin breaking the Federal Reserve’s grip on money.

    Happy Thanksgiving!

    The individual prices of a traditional Thanksgiving dinner this year: