On Broken Link Checkers

I’ve been working with and thinking about broken link checkers a lot lately. Here are some thoughts on what features link checkers should have, open questions, and what to do with broken links once you find them.

Features that should be standard, but aren’t

  • Link checkers should ignore share links (the “share this on Twitter/Facebook” URLs that sharing buttons generate).
  • Link checkers should ignore robots.txt on the sites they check.
    • Twitter blocks most broken link checkers. Every Twitter URL on every site I’ve checked comes back as broken, yet when I check them by hand they almost always work.
  • Link checkers should follow redirects.
    • Redirects are valid, functioning links!
  • Link checkers should have robust support for services like YouTube and Vimeo.
    • Videos that are missing on those services still return 200 status codes when checked, along with a message in the HTML that the video is not found. Most broken link checkers report those links as working, which is a false negative.
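
Here is a rough sketch of how a few of the rules above could fit together in a checker's classification step. It is not taken from any existing tool. The share-link prefixes and the "not found" phrases are assumptions for illustration, and real pages may word things differently. The function expects the status code of the final response, after redirects have been followed, and makes no network requests itself.

```python
from urllib.parse import urlparse

# Assumed share-link prefixes (host + path); not an exhaustive list.
SHARE_PREFIXES = (
    "twitter.com/intent/tweet",
    "facebook.com/sharer",
    "linkedin.com/sharing/share-offsite",
)

# Hypothetical "not found" phrases per video host; real pages may differ.
SOFT_404_MARKERS = {
    "youtube.com": ("video unavailable",),
    "vimeo.com": ("sorry, we couldn't find that",),
}


def _host(url):
    """Return the lowercase host without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def is_share_link(url):
    """True if the URL looks like a share-button link."""
    target = _host(url) + urlparse(url).path
    return target.startswith(SHARE_PREFIXES)


def classify(url, status, body=""):
    """Classify a link as 'skipped', 'broken', or 'ok'.

    status is the status code of the final response after redirects.
    """
    if is_share_link(url):
        return "skipped"
    if status >= 400:
        return "broken"
    # A 200 page on a video host can still say the video is gone.
    markers = SOFT_404_MARKERS.get(_host(url), ())
    if any(marker in body.lower() for marker in markers):
        return "broken"
    # Anything else, including a redirect that was not followed, counts as working.
    return "ok"
```

For example, `classify("https://www.youtube.com/watch?v=abc", 200, "<html>Video unavailable</html>")` returns `"broken"` even though the status code is 200.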

Open questions

  • Should we ignore comments? Most commenting systems allow links and also tend to link back to the comment author’s site. Those tend to be a significant source of broken links. Should those be left alone because they aren’t the site’s content, but rather user-generated content?

What to do with the broken links once you find them?

  • I’m against changing historical content in a damaging way, so I do not support changing links.
  • I do support appending additional helpful information, such as a link to a working version of the broken link on the Wayback Machine in a format like this: Old link text with broken link (archive.org link)
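
A minimal sketch of generating that format, assuming the post's publication date is known. The function names are made up for illustration. The Wayback Machine accepts a timestamp in the URL and resolves it to the closest snapshot it has. This sketch only builds the link and does not check that a snapshot actually exists.

```python
from datetime import date
from html import escape


def wayback_url(url, published):
    """Build a Wayback Machine URL near the given publication date."""
    return f"https://web.archive.org/web/{published:%Y%m%d}/{url}"


def annotate(link_text, broken_url, published):
    """Keep the original link untouched and append an archive link after it."""
    archive = wayback_url(broken_url, published)
    return (
        f'<a href="{escape(broken_url)}">{escape(link_text)}</a> '
        f'(<a href="{escape(archive)}">archive.org link</a>)'
    )


print(annotate("Old link text", "http://example.com/page", date(2015, 3, 9)))
```

Using the publication date rather than today's date aims for a snapshot of the page as it looked when the link was written, which fits the goal of leaving the historical content alone.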
