It’s not just the balloons at the Macy’s Thanksgiving Day Parade that are inflated.
Each year the American Farm Bureau Federation collects an informal price survey for a typical Thanksgiving dinner. The shopping list has remained the same for the past 31 years: turkey, bread stuffing, sweet potatoes, rolls with butter, peas, cranberries, a veggie tray, pumpkin pie with whipped cream, and coffee and milk, all to feed a family of 10.
At first glance, it seems like the price of food has trended upward since 1986. Once you adjust for inflation, however, you get a different story. The inflation-adjusted price has stayed relatively flat over time with minor fluctuations. The purchasing power of our currency in circulation has gone down over time, driving up the price of goods.
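The adjustment itself is simple arithmetic: multiply each year's nominal price by the ratio of a base year's price index to that year's index. Here is a minimal sketch in Python. The prices and index values are made-up placeholders, not the Farm Bureau's survey results or real CPI figures, and the function name is just for illustration.

```python
def adjust_for_inflation(nominal_price, index_that_year, index_base_year):
    """Express a past price in base-year dollars using a price index."""
    return nominal_price * (index_base_year / index_that_year)


# Placeholder values for illustration only (not real survey or CPI data).
dinners = [
    {"year": "A", "nominal": 30.00, "index": 100.0},
    {"year": "B", "nominal": 45.00, "index": 150.0},
    {"year": "C", "nominal": 60.00, "index": 200.0},
]

# Use the most recent year as the base, so every price is in "today's" dollars.
base_index = dinners[-1]["index"]
for d in dinners:
    real = adjust_for_inflation(d["nominal"], d["index"], base_index)
    print(f'{d["year"]}: nominal ${d["nominal"]:.2f}, adjusted ${real:.2f}')
```

In this toy data the nominal price doubles, but every adjusted price comes out the same, which is the "flat" pattern described above.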
As Mark Perry points out, this is no reason for despair. The average hourly wage of the American worker has increased steadily, making Thanksgiving more affordable than ever. That is something to be thankful for this year.
On October 13 I ran a data-related workshop for the Praxis community.
I split the talk up into two parts:
How you can apply the basic concepts of data analysis to your career, whether it be sales, marketing, shipping, coaching, cooking, management, etc. These techniques will help you tell a better story, be more convincing, and become more effective at your job.
An introduction to visualizing data and information, including some tools, resources, and inspiration to get you started. If you are interested in data visualization, think of this as a jumping-off point.
Pick 1-2 metrics in your personal life and collect data on them for two weeks. After two weeks, write a blog post explaining some trends you see, explaining the outliers, and visualizing the data in at least two ways. A few ideas: Sleep time and quality, caffeine intake and focus, time spent on social media, counts of something you produce at work, pages of books read, how many times you say “thank you” to someone, how many blog posts you write, etc. Check Dear Data for ideas. Reporter app is useful for collection.
Pick 1-2 metrics that your business partner is currently tracking. Analyze them and write up the results. Make a presentation for your managers.
Find a data set of any size and write a blog post framing it in three different ways.
Get sales or marketing data (or something that includes location data related to your business) from your business partner and map it with Carto. Make it private so you aren’t sharing private data. Write a blog post about what you found.
Research 3 different chart types and write explanatory blog posts on how they are used and give some examples of what to visualize with them.
Identify something you wanted to track but didn’t and find a proxy for it.
Pick something that interests you and build a small visualization with Chart.js or D3.js. Write a blog post explaining what is interesting. Guide us through it.
Dig in to an open data set, find some outliers, and frame some comparisons. Write it up in a blog post.
Dig in to an open data set and make a visualization for a small, interesting facet of it. Write it up in a blog post.
For sales people: Does one region do spectacularly well or spectacularly poorly? Dig in to why and write it up.
For marketers: Are certain channels performing better than others? Dig in to why and write it up.
Look into your business partner’s revenue over time. Do you find any interesting trends? Do some research and write them up.
Come up with an idea on what your team can track over the next few months to improve your efficiency/deliverables. Pitch the team on what to do, why they need to do it, and how you will use the data at the end of the quarter.
Data scientists notoriously hate word clouds. Beyond showing what the top 2-3 words are (because they are the biggest), a word cloud makes it difficult to see how much one word is used relative to another. Unfortunately, clients and non-data people love word clouds and sometimes insist on them. What is a self-respecting data nerd to do?
Pair it with a word frequency chart!
The easiest way to do this is by using Python’s Counter class from the collections module:
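A minimal sketch of that approach follows. The sample text, the tokenizing regex, and the function name are assumptions for illustration; swap in your own text source.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=10):
    """Return the top_n most common words as (word, count) pairs."""
    # Lowercase the text and keep only runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


sample = "the quick brown fox jumps over the lazy dog and the fox"
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('fox', 2)]
```

In practice you would probably also filter out stop words like "the" before counting, since they tend to crowd out the interesting ones.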
Then you can use your favorite charting tool to make a bar chart of the results. I prefer D3.js.
Results
If you see both together, you get a better understanding of the words being used. Of course, a single word doesn’t always capture sentiment. They can be helpful in smaller data sets, but sometimes common phrases are more helpful in larger data sets. For common phrases, use n-gram analysis.
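An n-gram count is a small extension of the same idea: slide a window of n words across the text and count each window. A rough sketch, again with an assumed sample string and helper name:

```python
import re
from collections import Counter


def ngram_frequencies(text, n=2, top_n=10):
    """Return the top_n most common n-word phrases as (phrase, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    # Zipping n offset copies of the word list yields every run of n adjacent words.
    ngrams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams).most_common(top_n)


sample = "thank you so much, thank you again, and thank you all"
print(ngram_frequencies(sample, n=2, top_n=1))  # [('thank you', 3)]
```

Note that this simple version ignores sentence boundaries, so a phrase can span a comma or period.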
I’m working my way through some data science and visualization books right now. I found that I learn better by doing small projects than I do by copying examples in books, so I designed a little project to apply some of what I learned and to learn some new skills along the way.
My goal was to make a project where I do everything from start to finish: Create my own data set, format it, analyze it, then visualize it. I also wanted to make it fairly common and as automated as possible so it could be repeated by others.
Here is what I came up with: Extracting metadata from my iPhone photos, analyzing it in different ways (days, months, hours, seasons), and visualizing it. I used free tools to do the extraction, formatting, and visualization, then scripted everything with AppleScript and Python to automate it.
Technical Details
You can find the full repository on GitHub. If you have a recent Mac with Photos.app and TextWrangler, you can run the scripts and produce your own charts!
I used AppleScript to loop through photo metadata in Photos.app and write it out to CSV files that it creates in the same directory as the scripts.
I used TextWrangler’s grep functionality via AppleScript to break apart the date strings into days of the week, dates, and times, and to remove bad or null location strings (lat,long). I know I could have written this in Python, but I didn’t want to reinvent the wheel. TextWrangler’s AppleScript library is very powerful and easy to use.
Python was my tool of choice for analyzing the CSV files in various ways and visualizing the results with one of its plotting libraries, matplotlib.
The map of where photos were taken in the US was generated with D3.js.
Once everything has been generated and saved, AppleScript opens the images in Preview and launches a simple Python webserver to show the map.
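For anyone without TextWrangler, the cleanup and counting steps could also be sketched in Python alone. This is not the code in the repository. It assumes the exported date strings look like "Saturday, March 5, 2016 at 2:41:07 PM" and that locations are plain "lat,long" strings; the sample rows and function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed format of an exported date string, e.g.
# "Saturday, March 5, 2016 at 2:41:07 PM"
DATE_PATTERN = re.compile(
    r"^(?P<weekday>\w+), (?P<date>\w+ \d{1,2}, \d{4}) "
    r"at (?P<time>\d{1,2}:\d{2}:\d{2} [AP]M)$"
)


def split_date_string(raw):
    """Break a date string into (weekday, date, time), or None if it doesn't match."""
    match = DATE_PATTERN.match(raw.strip())
    if match is None:
        return None
    return match.group("weekday", "date", "time")


def is_valid_location(raw):
    """True if raw looks like 'lat,long' with both numbers in a sensible range."""
    try:
        lat, lon = (float(part) for part in raw.split(","))
    except ValueError:  # non-numeric text, or not exactly two parts
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def count_by_weekday(date_strings):
    """Tally photos per weekday, skipping strings that don't parse."""
    parsed = (split_date_string(s) for s in date_strings)
    return Counter(p[0] for p in parsed if p is not None)


# Made-up sample rows, not real photo metadata.
rows = [
    "Saturday, March 5, 2016 at 2:41:07 PM",
    "Sunday, March 6, 2016 at 9:15:30 AM",
    "Saturday, March 12, 2016 at 11:02:45 AM",
    "not a date",
]
print(count_by_weekday(rows))
print(is_valid_location("40.7128,-74.0060"), is_valid_location("missing value"))
```

The resulting counts are exactly the kind of input a matplotlib bar chart needs.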
The Results
Lessons
I really beefed up my understanding of basic Python (with help from Eric Davis!)
I dusted off my AppleScript knowledge and gave it a workout. I learned that AppleScript has a concept of lists that you can pass into and out of programs. This was the key to launching all of the charts in a single Preview window.
This was a great exercise in UX. How can this be both easy to use and easy to interpret?
This was an exercise in thinking programmatically. How can this be built in a way that makes it reusable?
I learned how to project location information onto a map with D3.js. I’ve used D3.js for charts before, so this was a good way to expand my skills.
This was a good way to practice my git skills and think through how to structure a project and make executable code.
There is a lot more I can add to this (more mapping options, more ways to count the photos, outputting the photos in a calendar heatmap), but I feel comfortable stopping and moving on because I learned what I wanted to from it and I’m ready to start a new project. I might come back to this in the future and I might not, but either way I’m happy with this.
Caveats
This analysis is not scientific; it was for fun. Since my photos were taken with different cell phones across non-controlled time periods, I can’t use this analysis to say things like, “I’m more likely to take photos in the spring than the fall.” The truth is that there are photos in here from three springs but only two falls.
I can’t guarantee that my code will work for everyone. It is still a little buggy and hasn’t been tested for all scenarios. I know this and know how I would test it, but this project isn’t big enough to warrant it.
The color palettes I used aren’t bulletproof. If you use F.lux or Night Shift the yellows will blend in to the screen, and if you have visual impairments you might not be able to distinguish between the greens and blues.
Try it for yourself
You can download the repository from GitHub and run it against your Photos.app library. The requirements and instructions are in the README. Let me know if you have any issues and I’ll do my best to help you out.
Here is a minimalist responsive bar chart with quantity labels at the top of each bar and text wrapping of the food labels. It is actually responsive: rather than merely scaling the SVG proportionally, it keeps a fixed height and dynamically changes the width.
For simplicity I took the left scale off. All bars are proportional and are labeled anyway.
Go ahead and resize your window! This has a minimum width of about 530px because of the text labels. Any smaller than that and they are very difficult to read.
The basic HTML
The Styles
You’ll see that the axis is actually there but it is white. I found it useful to learn to draw it, but I didn’t want it so I am keeping it hidden.
The Data
The JavaScript Heavy Lifting
This is where D3 really comes in.
Setting the margins, sizes, and figuring out the basic scale.
Setting the axes
Drawing the basic SVG container with the proper size and margins
Scaling the axes
Drawing the bars themselves
Adding the quantity labels to the top of each bar
This took me a while to figure out because I was originally appending the labels to the rect element. According to the SVG spec, a rect can’t contain text, so I appended them after everything else so they’d show on top. The positioning is tricky, too. I eventually found the correct variables to position each label close to center. Then text-anchor: middle; sealed the deal.
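The positioning math is easier to see outside of D3. Here is a rough Python sketch of the arithmetic: divide the width into equal bands, then put each label at the horizontal center of its bar and a few pixels above its top. The padding, gap, and function names are my own assumptions, and this only approximates what a D3 band scale computes.

```python
def band_layout(n_bars, chart_width, padding=0.1):
    """Return (x, bar_width) for each bar, splitting chart_width into equal steps."""
    step = chart_width / n_bars
    bar_width = step * (1 - padding)
    offset = (step - bar_width) / 2  # center each bar within its step
    return [(i * step + offset, bar_width) for i in range(n_bars)]


def label_positions(values, chart_width, chart_height, gap=5):
    """Return (x, y) for a label centered above each bar."""
    max_value = max(values)
    positions = []
    bars = band_layout(len(values), chart_width)
    for (x, bar_width), value in zip(bars, values):
        bar_height = chart_height * value / max_value
        bar_top = chart_height - bar_height  # SVG y coordinates grow downward
        positions.append((x + bar_width / 2, bar_top - gap))
    return positions


# The height stays fixed while the width changes, as in the responsive chart.
for width in (800, 530):
    print(width, label_positions([10, 20, 15], width, chart_height=300))
```

With the x coordinate at the bar's center, text-anchor: middle does the rest. The tallest bar's label lands above y = 0, which is why the chart needs a top margin.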
Responsiveness
The general method for making D3 charts responsive is to scale the SVG down proportionally as the window gets smaller by manipulating the viewBox and preserveAspectRatio attributes. But after digging around on GitHub for a while, I found a fancier solution that preserves the height and redraws the SVG as the width shrinks.
Wrapping text labels
Wrapping text labels is tricky. The best solution I found is the one Mike Bostock (D3’s creator) describes. I modified it slightly to work with my chart, but the overall solution is the same.
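As I understand it, the core of that approach is a greedy loop: add words to the current line one at a time, and start a new line whenever the next word would make the line too wide. Below is a rough Python sketch of that loop. It is not Bostock's code, and it measures width in characters as a stand-in for the rendered pixel width a browser would report.

```python
def wrap_label(text, max_chars):
    """Greedily pack words into lines no longer than max_chars."""
    lines, current = [], []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and len(candidate) > max_chars:
            # The word doesn't fit, so close the current line and start a new one.
            lines.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


print(wrap_label("pumpkin pie with whipped cream", 12))
# ['pumpkin pie', 'with whipped', 'cream']
```

A single word longer than the limit still gets its own line rather than being split, which is usually what you want for axis labels.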
I don’t really follow basketball. But since I’ve been hearing a lot about this Steph Curry guy in the news, on Facebook, and on podcasts, I decided to look into his stats. And since I’m trying to teach myself about data science and visualization, I thought I’d visualize some of his stats to see what I could learn.
From what I could tell, Steph Curry had a decent start to his career, but while he was above average, he didn’t seem to be a star. Then, after he got injured during the 2011-12 season, he must have had a revelation, because he came back the next season and made a name for himself.
The next season he started taking radically more three point shots and it paid off for him. His overall points went up as he made more three point attempts:
Here is the number of three point attempts vs three point shots made by season, with the circles scaled by the number of overall points he made that season:
Even though he took more shots, his overall shooting percentage and his percentage on three-pointers have increased recently, but not dramatically. Both have fluctuated over his career:
Many current players have him beat on overall shooting percentage and a few guys are rivaling him on three point percentage (Jason Kapono, Steve Novak, Kyle Korver).
So if you can’t stand out by being more accurate than everyone else, what can you do? In a game that is driven by the final overall score instead of percentages, being about as accurate as everyone else but willing to shoot more pays off. Curry is willing to take more long shots than anyone else in the league, by far.
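The arithmetic behind that is simple: points from threes are attempts times accuracy times three, so volume can outweigh a small edge in percentage. A quick sketch with two hypothetical shooters (these numbers are invented, not real NBA stats):

```python
def points_from_threes(attempts, pct):
    """Points scored on three-pointers: attempts x accuracy x 3 points each."""
    return attempts * pct * 3


# Hypothetical shooters for illustration only.
selective = points_from_threes(attempts=400, pct=0.45)    # more accurate, fewer shots
high_volume = points_from_threes(attempts=800, pct=0.42)  # slightly less accurate, twice the shots

print(f"selective: {selective:.0f} points, high volume: {high_volume:.0f} points")
```

The high-volume shooter gives up three percentage points of accuracy and still scores nearly twice as many points from behind the arc.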
Here are Curry’s 3 point attempts vs 3 point shots made compared with the other top shooters in the league for the past four years:
2012 / 2013 / 2014 / 2015
Here is the same data visualized as a bump chart. You can see the other top shooters jostling positions, but Curry has been king for the past four years:
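A bump chart plots each player's rank per season instead of the raw totals, so the only data prep is a ranking step. A minimal sketch of that step, using placeholder players and totals rather than the real stats:

```python
def season_ranks(totals_by_season):
    """Map each season to {player: rank}, where rank 1 has the highest total."""
    ranks = {}
    for season, totals in totals_by_season.items():
        ordered = sorted(totals, key=totals.get, reverse=True)
        ranks[season] = {player: i + 1 for i, player in enumerate(ordered)}
    return ranks


# Placeholder totals for illustration only (not real NBA data).
made_threes = {
    "Season 1": {"Player A": 250, "Player B": 200, "Player C": 180},
    "Season 2": {"Player A": 270, "Player B": 190, "Player C": 210},
}
for season, ranking in season_ranks(made_threes).items():
    print(season, ranking)
```

Each player then becomes one line on the chart, with the season on the x axis and the rank on the y axis. Ties are broken by input order here, which may or may not be what you want.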
I realize that there is more to the game than just taking more shots. You need incredible talent and skill to keep up a percentage like Curry’s from anywhere on the court. Given his success, I’m willing to bet that we’ll see more players trying to emulate his approach next season.
Side note: James Harden seems to be using the same strategy, except with two point shots and free throws. It is paying off for him, too. He is currently the number two scorer in the NBA, right behind Steph Curry.
For the past 29 years, the American Farm Bureau Federation has conducted an informal survey of the price of a classic Thanksgiving dinner for 10 people. At first glance, it looks like the price of food has been steadily rising. But when you adjust the numbers for inflation, you get a different story. It isn’t the cost of our food that has been rising, but the amount of US currency in circulation.
This year, we’re thankful for technologies like bitcoin breaking the Federal Reserve’s grip on money.
Happy Thanksgiving!
The individual prices of a traditional Thanksgiving dinner this year: