How to properly fix broken links?

i’m writing a bash script that uses curl to check HTTP status codes for all links in a given file
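
A minimal sketch of what such a checker could look like, assuming the links have already been collected into a text file with one URL per line. The function names `status_of` and `check_links`, the `DELAY` variable, and the input format are all made up for illustration and do not come from any existing tool:

```shell
#!/usr/bin/env bash
# Sketch only: function names, DELAY, and the one-URL-per-line input are assumptions.

# Print the final HTTP status code for one URL ("000" means curl got no response).
status_of() {
  curl --silent --output /dev/null --head --location \
       --max-time 15 --write-out '%{http_code}' "$1"
}

# Read URLs (one per line) from the file in $1 and print "STATUS URL" for each.
check_links() {
  local url
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue            # skip blank lines
    printf '%s %s\n' "$(status_of "$url")" "$url"
    sleep "${DELAY:-1}"                  # pause between requests to stay polite
  done < "$1"
}
```

It could be run as `check_links urls.txt > report.txt`, where `urls.txt` is a hypothetical file name. Some servers answer HEAD requests differently from GET, so dropping `--head` may give more trustworthy codes at the cost of downloading each page.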

should i be looking in the posts database SQL file, or the XML export files for posts and pages?

finally, what is the best/safest way to replace busted links - should i replace them in the XML files and re-import them (plugin required), or is it better to replace them in the db?

my worry is that if i search/replace globally in the posts db, i could end up replacing links that are broken for a good reason, such as CDATA links (i don’t know what they’re for) or internal redirects for posts/pages that i moved

That’s a broad question that doesn’t have one definitive answer.
I suggest you use an existing plugin that has already thought through it.

There is logic in the main importer plugin to change links from the XML base to the current base URL. Additionally, renamed posts are handled internally, using the _wp_old_slug post meta field.

the plugins i’ve tried are… “buggy”, to put it gently, hence why i’m writing a script to handle this myself

Broken Link Checker is coded badly and will not run on a reasonably secured web server - i opened a bug report well over a year ago that still hasn’t been addressed as far as i know - frankly i’ve had it with their “support”

another one i just tried is Link Finder - this poorly coded mess fires off requests as fast as possible, resulting in an avalanche of ‘429 Too Many Requests’ responses, and the links it reported as 404 that i checked by hand actually returned 200

so at this point, unless someone has a better idea, i intend to parse the XML post/page exports to find the broken links and then manually update them in the post_content column of the *_posts table only, so i don’t think i’ll be touching any serialized data
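
A rough sketch of the "parse the XML exports" step, assuming the export files keep post content as plain HTML inside CDATA blocks so that a regex can pick the URLs out. The function name `extract_urls` is invented here, and the pattern is deliberately simple, so it may clip or miss unusual URLs:

```shell
#!/usr/bin/env bash
# Sketch only: extract_urls is a hypothetical helper, not part of any tool.

# Print each distinct http(s) URL found in the files given as arguments.
# A URL ends at whitespace, quotes, angle brackets, parentheses or square
# brackets, so a trailing CDATA terminator "]]>" is not swallowed.
extract_urls() {
  grep -ohE "https?://[^][[:space:]\"'<>()]+" "$@" | LC_ALL=C sort -u
}
```

For example, `extract_urls posts.xml pages.xml > urls.txt` (hypothetical file names) would produce a de-duplicated list to check, leaving the database untouched until the broken links are known.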

this seems to me like a better way to go than parsing the SQL dump of *_posts, but i’m not the sharpest knife in the drawer, so i’m open to suggestions :)

I’m locking this thread, so it doesn’t attract SEO spam.