Some WordPress Core Contributor stats

The Inspiration

Earlier this week David Bisset asked:

This got me curious. Is this data out there? How might one get it?

I started looking at core release posts and saw that contributors are linked, which gave me the idea to scrape it and see what I could come up with.

Caveats about this data

  1. Since this data is from the thanked contributors in core release posts, it includes more than just code contributions. It also includes documentation, testing, design, marketing, etc.
  2. I only included data from 5.0-6.0 named releases.
    • 5.0 was released in December 2018, almost 4 years ago. 4 years seemed far enough to go back.
    • I only included named core releases, as those are the larger ones that more people contribute to. The maintenance and security releases have a much smaller set of contributors.
  3. The data gets less accurate the further I go back in terms of release dates because I can only scrape their current profile, not their previous profiles. Some most likely switched employers.
  4. The data is only as accurate as the profiles on WordPress.org. Not all profiles have employers listed. There are some folks I know work for big companies in the WordPress ecosystem and contribute to core who do not have an employer listed. I did not add any that were missing; I went by what is available.
  5. I had to do a lot of manual cleanup to make the data consistent, which is typical when you scrape data from the web. If I made a mistake or missed something, that mistake is mine alone.
  6. In full transparency, I work at Automattic. This exploration was not done as part of my work there, but as a curious member of the WordPress.org community. In the WordPress project, I am a part of the Photos team.
  7. There are many other ways to contribute to the WordPress ecosystem and project that are not captured in this data. I only pulled data on contributors to named core releases.
  8. It is possible I made some scraping, formula, or calculation mistakes. If you find something wrong, please let me know.

Contributors to named core releases, grouped by company, for versions 5.0-6.0

Note: If someone has an employer listed on their profile, that does not necessarily mean they are sponsored by that company. If you want to know about sponsored contributors, go to the Sponsored section.

Core release | 6.0 | 5.9 | 5.8 | 5.7 | 5.6 | 5.5 | 5.4 | 5.3 | 5.2 | 5.1 | 5.0
Total contributors | 551 | 658 | 560 | 502 | 679 | 866 | 592 | 707 | 385 | 550 | 477
Company 1 | Automattic (77, 14%) | Automattic (94, 14.3%) | Automattic (88, 15.7%) | Automattic (66, 13.1%) | Automattic (79, 11.6%) | Automattic (87, 10%) | Automattic (60, 10.1%) | Automattic (61, 8.6%) | Automattic (42, 10.9%) | Automattic (55, 10%) | Automattic (62, 13%)
Company 2 | 10up (15, 2.7%) | Yoast (14, 2.1%) | Yoast (12, 2.1%) | 10up (12, 2.4%) | Yoast (14, 2.1%) | 10up (16, 1.8%) | 10up (11, 1.9%) | Yoast (16, 2.3%) | 10up (11, 2.9%) | Yoast (20, 3.6%) | 10up (14, 2.9%)
Company 3 | Yoast (10, 1.8%) | 10up (11, 1.7%) | 10up (11, 2%) | Yoast (11, 2.2%) | 10up (13, 1.9%) | Whodunit (11, 1.3%) | Yoast, Whodunit, Human Made (7, 1.2%) | 10up (14, 2%) | Human Made (6, 1.6%) | 10up (16, 2.9%) | Human Made (11, 2.3%)
Company 4 | Multidots (9, 1.6%) | Multidots (10, 1.5%) | Human Made (5, 0.9%) | XWP, Google (6, 1.2%) | Awesome Motive (8, 1.2%) | Yoast, rtCamp (9, 1%) | XWP (6, 1%) | Human Made (9, 1.3%) | Yoast (5, 1.3%) | Human Made (9, 1.6%) | Yoast (10, 2.1%)
Company 5 | rtCamp (6, 1.1%) | XWP (6, 0.9%) | XWP, rtCamp, Google, Bluehost, Awesome Motive, Alley (4, 0.7%) | Awesome Motive (5, 1%) | XWP (6, 0.9%) | XWP, WP Engine, Human Made (8, 0.9%) | Multidots, Google, Bluehost (4, 0.7%) | Multidots (7, 1%) | Google (4, 1%) | rtCamp (7, 1.3%) | Bluehost (6, 1.3%)
No company listed | 245 (44.5%) | 306 (46.5%) | 259 (46.3%) | 238 (47.4%) | 332 (48.9%) | 434 (50.1%) | 307 (51.9%) | 348 (49.2%) | 182 (47.3%) | 247 (44.9%) | 230 (48.2%)
Company name (Count of employed contributors, percentage of the total number of contributors)
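
As a rough illustration of how a cell like "Automattic (77, 14%)" is derived, the sketch below tallies employers from a file in the "url | employer | contributions" layout that the scraping script in the gathering section writes. The function name employer_share is made up for illustration, and the real tables came from Google Sheets pivot tables rather than this script, so treat it as one possible approach. It prints percentages with one decimal place, which differs slightly from the formatting in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally employers from "url | employer | contributions" lines.
# Prints "Employer (count, pct%)", largest count first.
employer_share() {
    awk -F' [|] ' '
        {
            employer = $2
            gsub(/^[ \t]+|[ \t]+$/, "", employer)    # trim stray whitespace
            if (employer == "") employer = "No company listed"
            count[employer]++
            total++
        }
        END {
            # Emit the count as a leading sort key; it is cut off again below.
            for (e in count)
                printf "%d\t%s (%d, %.1f%%)\n", count[e], e, count[e], 100 * count[e] / total
        }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2 | cut -f2
}

# Example (assumes the file produced by the scraping script exists):
# employer_share 5-1_contributors.txt | head -n 5
```

Blank employer fields are counted under "No company listed" so that every contributor is included in the denominator, matching how the percentages in the table are described.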

Individuals who contributed to all 11 of the most recent named core releases

49 people have contributed to all 11 releases (5.0-6.0) I pulled data for. I think these people deserve special recognition:

The companies these awesome individuals work for:

  • Automattic (13)
  • Google (4)
  • 10up (3)
  • Alley (2)
  • XWP (2)
  • Yoast (2)
  • Accessible Web Design (1)
  • Advies en zo (1)
  • Awesome Motive (1)
  • Bluehost (1)
  • Dekode Interaktiv AS (1)
  • FlipMetrics (1)
  • GoDaddy (1)
  • Happy Prime (1)
  • Human Made (1)
  • Parship Group (1)
  • Penske Media Corporation (1)
  • SendtoNews Incorporated (1)
  • Shopify (1)
  • Whodunit (1)
  • iThemes (1)
  • required (1)

7 of these individuals have no employer listed in their WordPress.org profile.

27 of these individuals have the Sponsored tag on their profile.

Sponsored

This is the number of contributors per release who have the Sponsored tag on their profile. It is a count of sponsored contributors, not a measure of how much each one contributed.

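Core release | 6.0 | 5.9 | 5.8 | 5.7 | 5.6 | 5.5 | 5.4 | 5.3 | 5.2 | 5.1 | 5.0
Total contributors | 551 | 658 | 560 | 502 | 679 | 866 | 592 | 707 | 385 | 550 | 477
Sponsored | 110 | 125 | 103 | 77 | 106 | 104 | 67 | 72 | 54 | 63 | 62
Sponsored % | 19.9% | 18.9% | 18.4% | 15.3% | 15.6% | 12% | 11.3% | 10.2% | 14% | 11.5% | 13%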
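
A count like the Sponsored row could be approximated from the scraped "url | employer | contributions" file along these lines. This is only a sketch: the function name sponsored_share is hypothetical, and it assumes the scraped Contributions text contains the word "sponsor" for sponsored contributors, which is a guess about the profile wording rather than something confirmed here. A naive match like this would also count text such as "not sponsored", so the pattern would need checking against real profiles.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count rows whose contributions field mentions "sponsor".
# ASSUMPTION: sponsored profiles include that word in the scraped text.
sponsored_share() {
    awk -F' [|] ' '
        { total++ }
        tolower($3) ~ /sponsor/ { sponsored++ }    # case-insensitive match on field 3
        END {
            if (total > 0)
                printf "%d of %d (%.1f%%)\n", sponsored, total, 100 * sponsored / total
        }
    ' "$1"
}

# Example (assumes the file produced by the scraping script exists):
# sponsored_share 5-1_contributors.txt
```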

These are the sponsored contributors grouped by company. Each cell includes the count of sponsored contributors and the percentage of the total number of sponsored contributors for that release.

Core release | 6.0 | 5.9 | 5.8 | 5.7 | 5.6 | 5.5 | 5.4 | 5.3 | 5.2 | 5.1 | 5.0
Company 1 | Automattic (61, 55.5%) | Automattic (65, 52%) | Automattic (55, 53.4%) | Automattic (32, 41.6%) | Automattic (43, 40.6%) | Automattic (36, 34.6%) | Automattic (24, 35.8%) | Automattic (23, 31.9%) | Automattic (16, 29.6%) | Automattic (16, 25.4%) | Automattic (21, 33.9%)
Company 2 | XWP (7, 6.4%) | Yoast (10, 8%) | Yoast (7, 6.8%) | Yoast (8, 10.4%) | Yoast (11, 10.4%) | Whodunit (7, 6.7%) | Yoast (6, 9%) | Yoast (10, 13.9%) | Yoast (5, 9.3%) | Yoast (11, 17.5%) | Yoast, XWP (6, 9.7%)
Company 3 | Yoast (6, 5.5%) | Multidots (6, 4.8%) | XWP (5, 4.9%) | XWP (5, 6.5%) | XWP, 10up (5, 4.7%) | Yoast, XWP, WP Engine (6, 5.8%) | Whodunit (5, 7.5%) | Whodunit, Google (4, 5.6%) | Google (4, 7.4%) | XWP, Human Made, Google (4, 6.3%) | Google (4, 6.5%)
Company 4 | Google, GoDaddy, Extendify (4, 3.6%) | XWP (5, 4%) | Google, 10up (4, 3.9%) | Google, 10up (4, 5.2%) | WP Engine, Google, Awesome Motive (4, 3.8%) | Human Made, Google, Awesome Motive (4, 3.8%) | Google (4, 6%) | XWP, Human Made (3, 4.2%) | XWP, Human Made, Bluehost, 10up (3, 5.6%) | 10up (3, 4.8%) | Human Made, Bluehost, 10up (3, 4.8%)
Company 5 | Multidots, Human Made, Awesome Motive (3, 2.7%) | Google, Bluehost (4, 3.2%) | GoDaddy, Awesome Motive (3, 2.9%) | WP Engine, Whodunit, Required, Human Made, GoDaddy, Bluehost, Awesome Motive (2, 2.6%) | Human Made, Extendify, Bluehost (3, 2.8%) | GoDaddy, Bluehost, 10up (3, 2.9%) | XWP, Human Made, 10up (3, 4.5%) | WP Engine, rtCamp, Required, Bluehost, Awesome Motive, 10up (2, 2.8%) | WP Engine, WebDevStudios, Awesome Motive (2, 3.7%) | WPMUDEV, Whodunit, WebDevStudios, Required, Bluehost, Awesome Motive (2, 3.2%) | WebDevStudios, Required (2, 3.2%)
Company name (Count of sponsored contributors, percentage of the total number of sponsored contributors)

How I gathered and analyzed this data

  1. For each named core release (e.g., 6.0 “Arturo” and 5.9 “Josephine”), I used the free version of Data Miner to pull the list of thanked contributors from the release post.
    • You could do this with a script too, but I already had Data Miner installed and knew how to use it, so it was the fastest way to get what I needed.
    • The element I targeted: p.is-style-wporg-props-long a
    • I saved the href attribute for each result in a text file.
  2. I looped through each text file of contributor URLs with a bash script and pulled in two fields from their wordpress.org profiles: Employer and Contributions.
    • I used curl, tr, awk, and pup to transform the data into something useable.
# Assumes an input file named 5-1.txt with one profile URL per line
# Requires pup: https://github.com/ericchiang/pup
head -n 800 5-1.txt | while IFS= read -r url; do
    # Fetch each profile once and reuse the HTML for both fields
    html="$(curl -s "$url")"
    employer="$(printf '%s' "$html" | pup -p 'li#user-company text{}' | awk '{sub(/Employer:/,"")} 1' | tr -d '\n\t')"
    contributions="$(printf '%s' "$html" | pup -p 'div.item-meta-contribution text{}' | tr -d '\n\t')"
    echo "$url | $employer | $contributions" >> 5-1_contributors.txt
done
  3. I first explored the data in Google Sheets and made pivot tables for each named release.
    • This took a lot of data cleanup to make the data more consistent. Since the Employer field is open text, there were lots of different versions of the same company (Company, Company Inc, Company PVT LTD, etc.). I cleaned it up as best I could in the time I wanted to spend on it, but there are probably still some duplicates.
    • This gave me the table of stats for the companies represented in each named release.
    • I used regex to find which company sponsors a contributor, based on the Contributions section of their profile, and made a pivot table of this information.
  4. I used Datasette to explore a CSV of all contributors and which version they contributed to. This gave me the list of 49 people who contributed to all 11 versions I checked and which companies they work for.
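
The employer cleanup and the "contributed to every release" query above were done in Google Sheets and Datasette, but the same ideas can be sketched on the command line. Everything below is illustrative: the function names, the list of legal suffixes, and the headerless two-column "contributor,version" CSV layout are assumptions, not the actual files or queries used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce an employer name to a grouping key.
# Lowercases, strips periods and commas, then drops one trailing legal suffix.
# ASSUMPTION: the suffix list is a small made-up sample, not exhaustive.
normalize_employer() {
    tr '[:upper:]' '[:lower:]' |
        sed -E 's/[.,]//g; s/[[:space:]]+(inc|llc|ltd|pvt ltd|pvt)$//; s/[[:space:]]+$//'
}

# Hypothetical helper: list contributors who appear in every version in the file.
# Input: CSV lines of "contributor,version" with no header row.
in_every_release() {
    # De-duplicate (contributor, version) pairs so a repeated row cannot inflate counts.
    sort -u "$1" | awk -F',' '
        { seen[$1]++; versions[$2] = 1 }
        END {
            n = 0
            for (v in versions) n++                 # number of distinct versions
            for (c in seen) if (seen[c] == n) print c
        }
    ' | sort
}

# Examples:
# printf 'Company PVT LTD\n' | normalize_employer
# in_every_release all_contributors.csv
```

Lowercasing makes "Company Inc" and "company inc." land in the same group, at the cost of losing the display name, so a real cleanup would likely map each key back to a preferred spelling.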

Data sources

Want to take a look at this data?

More areas for exploration

  1. Code contributions from SVN?
    • Number of lines changed by contributor and also grouped by employer
  2. Finding more accurate data?
    • If there were snapshots of this data from each release, it would be nice to use those instead. I could only pull data from current profiles, and users may have switched employers. For example, up until recently mkaz worked at Automattic, but since he no longer does, his previous contributions are not grouped under Automattic.
    • Not all profiles have employers listed. There are some folks I know work for big companies in the WordPress ecosystem and contribute to core who do not have an employer listed.
  3. Graphing different facets of this data to see how it changes over time.

41 responses to “Some WordPress Core Contributor stats”

  1. Good stuff!

    FWIW, I think the list on release posts is the same as that returned by the credits API. It may be a more consumable/reliable source to refresh over time.

    If you haven’t seen it yet, Jean-Baptiste Audras has been compiling various stats for releases. That data could help track employer changes, at least since 5.4.

    Actually seeing a list of code changes per employer would probably require re-scanning commit messages in core and pull requests in Gutenberg to match names and rebuild the log as employer focused.

    And along the same lines, but way wackier: use git/svn blame data to see what percentage of core was last touched by employer. 🙃
