WordPress site migration: Stage 1 Redirects

by Rob Hughes on March 3, 2011 · 0 comments

Rob Hughes is doing my fucking head in while having to format this pageRecently I was conducting a piece of work on one of my WordPress sites that involved setting up lots of redirects. Rather than installing ANOTHER plugin I decided I wanted to do it properly, using .htaccess, rather than slow my site down some more!

Unfortunately, although this initially seemed like it would be a rather quick thing to do I realised that WordPress already had lots of redirects built into it from where I had renamed pages after publishing them, and because of the permalinks that were implemented post-launch.

There are numerous issues implementing further redirects could cause, here are just a couple:

  • Redirect loops (search engines, and users, will get stuck in a never ending redirect loop and therefore the page will never be accessible)
  • Replacing a correct redirect with an incorrect one, for example:
    Original URL http://www.domain.com/index.php?cat=3&pid=10

    WordPress redirects too http://www.domain.com/category-3/product-10

    However, as a catch-all on pages that have been removed I may place a redirect in my .htaccess to catch anything with cat=3 and redirect it too http://www.domain.com/category-3/ and as such losing the value that could be gained from this redirect.

I’m not going to go in any further details than the above, I’m sure if you’re reading this post you’re more than aware of the various issues that can be caused with redirects; so, I’ll get straight into how I got around this issue…

The first thing I wanted to look at was to get all the pages that are indexed, have links to them, or get traffic of any sort. To do this I used a combination of:
- Using Google Spreadsheets to look at indexation (here is an example I pulled together)
- Using an advanced report on Majestic SEO to have a look at the pages that have links to them. This isn’t the only source you could use whatever you want to get this information, for example:

- Reviewing analytics and/or server logs to attain any pages that have had traffic since the changes

Once I had a list of all the pages of the site that I would want to redirect for any reason, I crawled these URLs (you can use whatever tool you want, I used one of my own scripts) to determine the http response codes for all the pages (and the destination page if it is being redirected).

Here is the “clever” bit that you’ve all been waiting for, getting the list of all the redirects that are in the WordPress database already. To do this I spent a while researching and reading lots of sites, unfortunately I couldn’t really find a decent way to do this without installing plugins (which, if you remember back to the start of the post, is why I started this whole mini-project). Because of this I decided to write my own SQL query which enabled me to pull back all the information I needed:

SELECT
wp_posts.post_type,
wp_posts.post_date,
wp_posts.post_name AS current_slug,
IFNULL(
(
SELECT
GROUP_CONCAT(wp_postmeta.meta_value SEPARATOR ” ==> “)
FROM
wp_postmeta
WHERE
wp_postmeta.post_id = wp_posts.ID
AND
wp_postmeta.meta_key = “_wp_old_slug”
ORDER BY
NULL)
,”") AS old_slugs,
wp_posts.guid AS url_without_permalinks
FROM
wp_posts
WHERE
wp_posts.post_type != “revision”
ORDER BY
wp_posts.post_date ASC

This query is actually relatively simple when you know where you’re supposed to be looking for the information so sorry if it’s not ground-breaking for all you developers out there :)
For all of those out there that aren’t developers, here is what the query does in English:

Select these base columns from the wp_posts table (where all the normal WordPress things are saved):

  • Type (page, post, attachment, revision)
  • Date of creation
  • The current search engine friendly page relative URL
  • What the pages URL would be if permalinks were turned off

Then look at the wp_postmeta table (where all the background information about the posts, pages, etc. are stored and:

  • Select all the old page relative URLs and concatenate them into one long string (separated by =>). If they don’t exist return a blank cell.

This is all filtered to remove anything where the post-type is “revision” meaning where anything has been altered or is in draft
(Hopefully that makes sense – harder for me to explain what it does in English which is quite depressing!)

So now let’s review… where we are:

  • We have a list of all the pages we want to keep
  • We have extended information about all the pages and what response codes they are returning
  • We have a list of all the current redirects in WordPress

With all this information you can then use whatever method you want to work out what redirects are in WordPress already that you want to leave (or if you’re like me, what you want to take away from WordPress altogether and handle in your .htaccess file). Then you can go ahead and make your .htaccess file with all your redirects based query strings, etc.
Just to give you one example, I’ll use the case I used before when explaining the possible issues:

Options +FollowSymLinks
RewriteEngine On

RewriteCond %{QUERY_STRING} ^cat=3
RewriteCond %{QUERY_STRING} !^pid=10
RewriteRule ^(.*)$ http://www.domain.com/category-3/ [R=301]

I’m by no means, in any of this saying this is the best way to go about this, there are plenty of quicker ways to do all this analysis I’m sure. This is the method I used though and hopefully this blog post gives lots of tips that can be used standalone as well as part of the process I followed. In the next post I will be covering how you can use the data in terms of actually migrating the site.

Google+ Comments

Leave a Comment

Previous post:

Next post: