Chuck Grimmett

On Broken Link Checkers

Aug 24, 2023 2:36 PM

I’ve been working with and thinking about broken link checkers a lot lately. Here are some thoughts on what features link checkers should have, open questions, and what to do with broken links once you find them.

Features that should be standard, but aren’t

Link checkers should ignore share links.
- Share links are pervasive on blogs and news sites. They show up as links on a page, but I don’t consider them “real” links. Their only job is to submit the current page to a sharing service. Identify them and skip them.
- Here is an ignorelist to get you started: https://gist.github.com/cagrimmett/00200b47a9f5948d7906be154e7abd78
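As a rough illustration of the "identify them and skip them" step, here is a minimal Python sketch. The `SHARE_ENDPOINTS` list and the `is_share_link` helper are hypothetical names made up for this example, and the entries are a small illustrative sample, not the contents of the gist above.

```python
from urllib.parse import urlsplit

# Hypothetical ignorelist of (domain, path prefix) pairs for share endpoints.
# Illustrative sample only; a real list would be longer.
SHARE_ENDPOINTS = [
    ("twitter.com", "/intent/tweet"),
    ("facebook.com", "/sharer"),
    ("facebook.com", "/sharer.php"),
    ("linkedin.com", "/sharing/share-offsite"),
    ("linkedin.com", "/shareArticle"),
    ("pinterest.com", "/pin/create"),
    ("reddit.com", "/submit"),
]


def is_share_link(url: str) -> bool:
    """Return True if the URL points at a known share endpoint."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path
    for domain, prefix in SHARE_ENDPOINTS:
        # Match the bare domain and any subdomain (www., m., etc.).
        host_matches = host == domain or host.endswith("." + domain)
        # Match the prefix exactly or as a leading path segment, so that
        # "/submit" does not also match "/submitted-by-someone".
        path_matches = path == prefix or path.startswith(prefix + "/")
        if host_matches and path_matches:
            return True
    return False
```

A checker could call `is_share_link(url)` before queueing a URL and skip anything that returns `True`.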
Link checkers should ignore robots.txt on the sites they check.
- Twitter blocks most broken link checkers. Every Twitter URL on every site I’ve checked comes back as broken, and upon further investigation they are almost always working.
Link checkers should follow redirects.
- Redirects are valid, functioning links!
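One way to treat redirects as working links is to follow the chain and judge only the final response. The sketch below assumes a hypothetical `fetch(url)` callable that returns a `(status_code, location_header)` pair without following redirects itself; `resolve` and `is_working` are names invented for this example.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}

# Hypothetical interface: fetch(url) -> (status code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def resolve(url: str, fetch: Fetch, max_hops: int = 10) -> Tuple[str, int]:
    """Follow redirects and return (final_url, final_status).

    A status of 0 means the chain looped or exceeded max_hops.
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            return url, 0  # redirect loop
        seen.add(url)
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url, status
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return url, 0  # too many hops


def is_working(url: str, fetch: Fetch) -> bool:
    """A link counts as working if the end of its redirect chain is 2xx."""
    _, status = resolve(url, fetch)
    return 200 <= status < 300
```

With this approach a 301 that lands on a 200 page is reported as working, while a redirect loop or a chain that ends in a 404 is still reported as broken.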
Link checkers should have robust support for services like YouTube and Vimeo.
- Videos that are missing on those services still return 200 status codes when checked, along with a message in the HTML that the video is not found. Most broken link checkers report those links as working, which is a false negative.
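A checker can catch some of these "soft 404s" by looking at the response body as well as the status code. The sketch below is only an illustration: the marker phrases in `NOT_FOUND_MARKERS` are assumptions, not the exact wording any service uses, and `classify` is a hypothetical helper.

```python
# Hypothetical marker phrases. Real services change their wording,
# so a list like this needs to be maintained per service.
NOT_FOUND_MARKERS = (
    "video unavailable",
    "this video isn't available",
    "sorry, we couldn't find that page",
)


def classify(status: int, body: str) -> str:
    """Classify a response as "broken", "soft-404", or "ok"."""
    if status >= 400:
        return "broken"
    if status == 200:
        text = body.lower()
        # A 200 response whose HTML says the content is missing.
        if any(marker in text for marker in NOT_FOUND_MARKERS):
            return "soft-404"
    # Everything else (including other 2xx and 3xx codes) is treated as ok.
    return "ok"
```

Plain substring matching is crude and can misfire on pages that merely mention these phrases, so results flagged as `"soft-404"` are best treated as candidates for manual review.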

Open questions

Should we ignore comments? Most commenting systems allow links and also tend to link back to the comment author’s site. Those tend to be a significant source of broken links. Should those be left alone because they aren’t the site’s content, but rather user-generated content?

What to do with the broken links once you find them?

I’m against changing historical content in a damaging way, so I do not support changing links.
I do support appending additional helpful information, such as a link to a working version of the broken link on the Wayback Machine in a format like this: Old link text with broken link (archive.org link)
- The Wayback Availability JSON API is a quick way to check whether a broken link has a snapshot on the Wayback Machine.
- Ideally this would be built into the broken link checker, but if you only have the broken links in a CSV, here is a handy PHP script to check the Wayback Availability JSON API.
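For readers who would rather not use PHP, here is a minimal Python sketch of the same lookup. It is not a port of the PHP script mentioned above. It assumes the documented response shape of the Wayback Availability JSON API (an `archived_snapshots.closest` object with `available` and `url` fields); the function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_url(broken_url: str) -> str:
    """Build the Availability API request URL for a broken link."""
    return API + "?" + urlencode({"url": broken_url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the closest snapshot URL out of a parsed API response."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None  # no snapshot on record


def lookup(broken_url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the API over the network and return a snapshot URL, if any."""
    with urlopen(availability_url(broken_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))


def annotate(link_text: str, snapshot_url: Optional[str]) -> str:
    """Append the archive link after the original text, leaving it intact."""
    if snapshot_url is None:
        return link_text
    return f"{link_text} ({snapshot_url})"
```

To process a CSV of broken links, you could read each row with the standard `csv` module, call `lookup` on the URL column, and write the result to a new column. Adding a short pause between requests is a sensible courtesy to the service.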

Category: Tech

Posted by Chuck Grimmett. Learn more on his About page.

Comments and Webmentions

3 responses to “On Broken Link Checkers”

Dave's linkblog

2023-08-24

[…] On Broken Link Checkers […]

anonymous

2023-08-27

One cool thing that happened this week is one of my photos was selected to be in a special gallery at WordCamp US! I didn’t…

Week of August 21, 2023 – Chuck Grimmett

2023-08-27

[…] I have a much better understanding of WordPress Multisite and I’ve been thinking a lot about link checkers. Hopefully I can point to some public announcements […]
