The Inspiration
Earlier this week David Bisset asked:
This got me curious. Is this data out there? How might one get it?
I started looking at core release posts and saw that contributors are linked, which gave me the idea to scrape it and see what I could come up with.
Caveats about this data
- Since this data is from the thanked contributors in core release posts, it includes more than just code contributions. It also includes documentation, testing, design, marketing, etc.
- I only included data from 5.0-6.0 named releases.
- 5.0 was released in December 2018, almost 4 years ago. 4 years seemed far enough to go back.
- I only included named core releases, as those are the larger ones that more people contribute to. The maintenance and security releases have a much smaller set of contributors.
- The data gets less accurate the further back I go in release dates, because I can only scrape each contributor's current profile, not the profile as it stood at the time of the release. Some contributors have most likely switched employers since.
- The data is only as accurate as the profiles on WordPress.org. Not all profiles have employers listed. There are some folks I know work for big companies in the WordPress ecosystem and contribute to core who do not have an employer listed. I did not add any that were missing; I went by what is available.
- I had to do a lot of manual clean up to make the data consistent, which is typical when you scrape data from the web. If I made a mistake or missed something, that mistake is mine alone.
- In full transparency, I work at Automattic. This exploration was not done as part of my work there, but as a curious member of the WordPress.org community. In the WordPress project, I am a part of the Photos team.
- I tried to be as transparent as possible by making my data publicly available.
- There are many other ways to contribute to the WordPress ecosystem and project that are not captured in this data. I only pulled data on contributors to named core releases.
- It is possible I made some scraping, formula, or calculation mistakes. If you find something wrong, please let me know.
Contributors to named core releases, grouped by company, for versions 5.0-6.0
Note: If someone has an employer listed on their profile, that does not necessarily mean they are sponsored by that company. If you want to know about sponsored contributors, go to the Sponsored section.
Core release | 6.0 | 5.9 | 5.8 | 5.7 | 5.6 | 5.5 | 5.4 | 5.3 | 5.2 | 5.1 | 5.0 |
---|---|---|---|---|---|---|---|---|---|---|---|
Total contributors | 551 | 658 | 560 | 502 | 679 | 866 | 592 | 707 | 385 | 550 | 477 |
Company 1 | Automattic (77, 14%) | Automattic (94, 14.3%) | Automattic (88, 15.7%) | Automattic (66, 13.1%) | Automattic (79, 11.6%) | Automattic (87, 10%) | Automattic (60, 10.1%) | Automattic (61, 8.6%) | Automattic (42, 10.9%) | Automattic (55, 10%) | Automattic (62, 13%) |
Company 2 | 10up (15, 2.7%) | Yoast (14, 2.1%) | Yoast (12, 2.1%) | 10up (12, 2.4%) | Yoast (14, 2.1%) | 10up (16, 1.8%) | 10up (11, 1.9%) | Yoast (16, 2.3%) | 10up (11, 2.9%) | Yoast (20, 3.6%) | 10up (14, 2.9%) |
Company 3 | Yoast (10, 1.8%) | 10up (11, 1.7%) | 10up (11, 2%) | Yoast (11, 2.2%) | 10up (13, 1.9%) | Whodunit (11, 1.3%) | Yoast, Whodunit, Human Made (7, 1.2%) | 10up (14, 2%) | Human Made (6, 1.6%) | 10up (16, 2.9%) | Human Made (11, 2.3%) |
Company 4 | Multidots (9, 1.6%) | Multidots (10, 1.5%) | Human Made (5, 0.9%) | XWP, Google (6, 1.2%) | Awesome Motive (8, 1.2%) | Yoast, rtCamp (9, 1%) | XWP (6, 1%) | Human Made (9, 1.3%) | Yoast (5, 1.3%) | Human Made (9, 1.6%) | Yoast (10, 2.1%) |
Company 5 | rtCamp (6, 1.1%) | XWP (6, 0.9%) | XWP, rtCamp, Google, Bluehost, Awesome Motive, Alley (4, 0.7%) | Awesome Motive (5, 1%) | XWP (6, 0.9%) | XWP, WP Engine, Human Made (8, 0.9%) | Multidots, Google, Bluehost (4, 0.7%) | Multidots (7, 1%) | Google (4, 1%) | rtCamp (7, 1.3%) | Bluehost (6, 1.3%) |
No company listed | 245 (44.5%) | 306 (46.5%) | 259 (46.3%) | 238 (47.4%) | 332 (48.9%) | 434 (50.1%) | 307 (51.9%) | 348 (49.2%) | 182 (47.3%) | 247 (44.9%) | 230 (48.2%) |
Individuals who contributed to all 11 of the most recent named core releases
49 people have contributed to all 11 releases (5.0-6.0) I pulled data for. I think these people deserve special recognition:
- adamsilverstein
- afercia
- audrasjb
- azaozz
- birgire
- boonebgorges
- clorith
- davidbinda
- dd32
- desrosj
- dlh
- drewapicture
- fierevere
- flixos90
- garrett-eclipse
- gziolo
- iandunn
- jeffpaul
- jeremyfelt
- joedolson
- joemcgill
- joen
- johnbillion
- johnjamesjacoby
- jorbin
- jorgefilipecosta
- joyously
- jrf
- kjellr
- kraftbj
- mikeschroder
- mkaz
- mukesh27
- noisysocks
- obenland
- ocean90
- pbearne
- peterwilsoncc
- ryelle
- sergeybiryukov
- soean
- spacedmonkey
- swissspidy
- talldanwp
- timothyblynjacobs
- tobifjellner
- westonruter
- youknowriad
- zebulan
The companies these awesome individuals work for:
- Automattic (13)
- Google (4)
- 10up (3)
- Alley (2)
- XWP (2)
- Yoast (2)
- Accessible Web Design (1)
- Advies en zo (1)
- Awesome Motive (1)
- Bluehost (1)
- Dekode Interaktiv AS (1)
- FlipMetrics (1)
- GoDaddy (1)
- Happy Prime (1)
- Human Made (1)
- Parship Group (1)
- Penske Media Corporation (1)
- SendtoNews Incorporated (1)
- Shopify (1)
- Whodunit (1)
- iThemes (1)
- required (1)
7 of these individuals have no employer listed in their WordPress.org profile.
27 of these individuals have the Sponsored tag on their profile.
Sponsored contributors
This is the number of contributors per release who have the Sponsored tag on their profile. It is a count of sponsored contributors, not necessarily a good breakdown of how much each one contributed.
Core release | 6.0 | 5.9 | 5.8 | 5.7 | 5.6 | 5.5 | 5.4 | 5.3 | 5.2 | 5.1 | 5.0 |
---|---|---|---|---|---|---|---|---|---|---|---|
Total contributors | 551 | 658 | 560 | 502 | 679 | 866 | 592 | 707 | 385 | 550 | 477 |
Sponsored | 110 | 125 | 103 | 77 | 106 | 104 | 67 | 72 | 54 | 63 | 62 |
Sponsored % | 19.9% | 18.9% | 18.4% | 15.3% | 15.6% | 12% | 11.3% | 10.2% | 14% | 11.5% | 13% |
These are the sponsored contributors grouped by company. Each cell shows the number of sponsored contributors and their percentage of all sponsored contributors for that release.
Core release | 6.0 | 5.9 | 5.8 | 5.7 | 5.6 | 5.5 | 5.4 | 5.3 | 5.2 | 5.1 | 5.0 |
---|---|---|---|---|---|---|---|---|---|---|---|
Company 1 | Automattic (61, 55.5%) | Automattic (65, 52%) | Automattic (55, 53.4%) | Automattic (32, 41.6%) | Automattic (43, 40.6%) | Automattic (36, 34.6%) | Automattic (24, 35.8%) | Automattic (23, 31.9%) | Automattic (16, 29.6%) | Automattic (16, 25.4%) | Automattic (21, 33.9%) |
Company 2 | XWP (7, 6.4%) | Yoast (10, 8%) | Yoast (7, 6.8%) | Yoast (8, 10.4%) | Yoast (11, 10.4%) | Whodunit (7, 6.7%) | Yoast (6, 9%) | Yoast (10, 13.9%) | Yoast (5, 9.3%) | Yoast (11, 17.5%) | Yoast, XWP (6, 9.7%) |
Company 3 | Yoast (6, 5.5%) | Multidots (6, 4.8%) | XWP (5, 4.9%) | XWP (5, 6.5%) | XWP, 10up (5, 4.7%) | Yoast, XWP, WP Engine (6, 5.8%) | Whodunit (5, 7.5%) | Whodunit, Google (4, 5.6%) | Google (4, 7.4%) | XWP, Human Made, Google (4, 6.3%) | Google (4, 6.5%) |
Company 4 | Google, GoDaddy, Extendify (4, 3.6%) | XWP (5, 4%) | Google, 10up (4, 3.9%) | Google, 10up (4, 5.2%) | WP Engine, Google, Awesome Motive (4, 3.8%) | Human Made, Google, Awesome Motive (4, 3.8%) | Google (4, 6%) | XWP, Human Made (3, 4.2%) | XWP, Human Made, Bluehost, 10up (3, 5.6%) | 10up (3, 4.8%) | Human Made, Bluehost, 10up (3, 4.8%) |
Company 5 | Multidots, Human Made, Awesome Motive (3, 2.7%) | Google, Bluehost (4, 3.2%) | GoDaddy, Awesome Motive (3, 2.9%) | WP Engine, Whodunit, Required, Human Made, GoDaddy, Bluehost, Awesome Motive (2, 2.6%) | Human Made, Extendify, Bluehost (3, 2.8%) | GoDaddy, Bluehost, 10up (3, 2.9%) | XWP, Human Made, 10up (3, 4.5%) | WP Engine, rtCamp, Required, Bluehost, Awesome Motive, 10up (2, 2.8%) | WP Engine, WebDevStudios, Awesome Motive (2, 3.7%) | WPMUDEV, Whodunit, WebDevStudios, Required, Bluehost, Awesome Motive (2, 3.2%) | WebDevStudios, Required (2, 3.2%) |
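Both tables in this section depend on deciding, per profile, whether the Contributions text names a sponsor. The sketch below shows one way such a check could look with awk. The phrase it matches ("Sponsored by <Company> for ...") is an assumption, not the wording confirmed for WordPress.org profiles, and the helper names sponsor_of and sponsored_summary are hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: sponsored profiles contain a phrase such as
# "Sponsored by <Company> for ..." in their Contributions text.
# Adjust the pattern to the real wording before relying on the result.

# sponsor_of (hypothetical helper): reads one Contributions string per line
# and prints the sponsoring company, or an empty line when none is found.
sponsor_of() {
  awk '{
    company = ""
    if (match($0, /[Ss]ponsored by [^.,]+/)) {
      # Skip the 13 characters of "sponsored by " to keep only the name
      company = substr($0, RSTART + 13, RLENGTH - 13)
      # Drop whatever follows the name, e.g. " for 40 hours per week"
      sub(/ (for|to) .*/, "", company)
    }
    print company
  }'
}

# sponsored_summary (hypothetical helper): $1 is a file of
# "url | employer | contributions" lines.
# Prints "<sponsored count> <total> <percent>".
sponsored_summary() {
  awk -F' [|] ' '{ print $3 }' "$1" | sponsor_of |
    awk '{ total++ } NF { sponsored++ }
         END { printf "%d %d %.1f\n", sponsored, total,
                      (total ? 100 * sponsored / total : 0) }'
}
```

Piping the output of sponsor_of through grep -v '^$' | sort | uniq -c | sort -rn would give a per-company tally similar in shape to the second table.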
How I gathered and analyzed this data
- For each named core release (e.g. 6.0 “Arturo”, 5.9 “Josephine”, etc.) I used the free version of Data Miner to pull the list of thanked contributors in the release post.
- You could do this with a script too, but I already had Data Miner installed and knew how to use it, so it was the fastest way to get what I needed.
- The element I targeted: p.is-style-wporg-props-long a
- I saved the href attribute for each result in a text file.
- I looped through each text file of contributor URLs with a bash script and pulled in two fields from their wordpress.org profiles: Employer and Contributions.
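# Assumes an input file named 5-1.txt with one profile URL per line
# Requires pup: https://github.com/ericchiang/pup
for url in $(head -n800 5-1.txt); do
  # Fetch each profile once and reuse the HTML for both fields
  html="$(curl -s "$url")"
  # Employer: text of the company list item, minus the "Employer:" label
  employer="$(printf '%s\n' "$html" | pup -p 'li#user-company text{}' | awk '{sub(/Employer:/,"")} 1' | tr -d '\n\t')"
  # Contributions: text of the contribution block, flattened to one line
  contributions="$(printf '%s\n' "$html" | pup -p 'div.item-meta-contribution text{}' | tr -d '\n\t')"
  echo "$url | $employer | $contributions" >> 5-1_contributors.txt
done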
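The loop above writes one line per profile in the form url | employer | contributions. As a rough command-line counterpart to a spreadsheet pivot, the sketch below tallies contributors per employer straight from that file. The suffix list (Inc, LLC, Ltd, Pvt Ltd, GmbH) and the helper names normalize_employer and count_by_employer are assumptions for illustration; real data would need more cleanup rules than this.

```shell
#!/usr/bin/env bash
# normalize_employer (hypothetical helper): reads employer names on stdin,
# one per line, and prints a lowercase grouping key with surrounding
# whitespace and a trailing legal suffix removed, so that "Company",
# "Company Inc" and "Company PVT LTD" all collapse to "company".
normalize_employer() {
  tr '[:upper:]' '[:lower:]' |
    sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//; s/,? +(pvt\.? +ltd|inc|llc|ltd|gmbh)\.?$//'
}

# count_by_employer (hypothetical helper): $1 is a file of
# "url | employer | contributions" lines.
# Prints "<count> <employer key>" with the largest group first.
count_by_employer() {
  awk -F' [|] ' '{ print $2 }' "$1" |
    normalize_employer |
    sed 's/^$/(no company listed)/' |
    sort | uniq -c | sort -k1,1nr -k2 |
    awk '{ n = $1; $1 = ""; sub(/^ /, ""); print n, $0 }'
}

# Example, assuming the output file from the loop above exists:
#   count_by_employer 5-1_contributors.txt | head -n 5
```

Lowercasing loses the display form of each name, which is acceptable for a grouping key but means the result still needs a manual pass before it is presentable.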
- I first started exploring the data in Google Sheets and made pivot tables for each named release.
- This took a lot of data clean up to make the data more consistent. Since the Employer field is open text, there were lots of different versions of the same company (Company, Company Inc, Company PVT LTD, etc). I cleaned it up the best I could in the time I wanted to spend on it, but there are still probably some duplicates.
- This gave me the table of stats for the companies represented in each named release.
- I used regex to find which company sponsors a contributor based on their Contributions section on their profile and made a pivot table of this information.
- I used Datasette to explore a CSV of all contributors and which version they contributed to. This gave me the list of 49 people who contributed to all 11 versions I checked and which companies they work for.
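The last step, finding people who appear in every release, can also be approximated without Datasette. The sketch below uses awk instead of SQL and assumes a headerless CSV with two columns, username and version; the actual column layout of the published CSV may differ, and the helper name in_all_releases is hypothetical.

```shell
#!/usr/bin/env bash
# in_all_releases (hypothetical helper): reads "username,version" rows on
# stdin (ASSUMED layout, no header row) and prints, in sorted order, every
# username that appears in exactly $1 distinct versions.
in_all_releases() {
  awk -F',' -v want="$1" '
    # Count each username/version pair only once, even if it is repeated
    !seen[$1 FS $2]++ { versions[$1]++ }
    END {
      for (user in versions)
        if (versions[user] == want) print user
    }
  ' | sort
}

# Example, assuming a file named contributors.csv in that layout:
#   in_all_releases 11 < contributors.csv
```

Passing 11 corresponds to the 11 named releases from 5.0 to 6.0 covered here; a smaller number would list people who contributed to exactly that many releases.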
Data sources
Want to take a look at this data?
- Google Sheet with the information I scraped and pivot tables: https://docs.google.com/spreadsheets/d/1JwK9vbUnli8JkJJTxwULXk9boi6abAjhFZ7XayXHEOw/edit?usp=sharing
- GitHub gist with a CSV of contributors and the release they contributed to, my scraping script, and a sample of the profile URLs for one release that the script used: https://gist.github.com/cagrimmett/a69add30351d9d8124de20da8b2b900c
More areas for exploration
- Code contributions from SVN?
- Number of lines changed by contributor and also grouped by employer
- Finding more accurate data?
- If there were snapshots of this data from each release, it would be nice to use those instead. I could only pull data from current profiles, and users may have switched employers. For example, up until recently mkaz worked at Automattic, but since he no longer does, his previous contributions are not grouped under Automattic.
- Not all profiles have employers listed. There are some folks I know work for big companies in the WordPress ecosystem and contribute to core who do not have an employer listed.
- Graphing different facets of this data to see how it changes over time.