24
votes

First of all, I want to make it clear that I know what best practices are when it comes to handling 404 errors. However, I have this specific case where I may need a tailored approach.

I'm handling a newspaper site that has more than 10 years' worth of archives, with 150k+ hard-worked pieces of content and loads of links that still get clicked through. It also went through a lot of trouble: 3 different CMSs before WP, each with its own link structure, and improper redirecting at every change. So now the archives are all but "lost" from an SEO point of view.

With more than 90% of the content misplaced, showing classic 404s is not really an option. The emergency exit was to redirect the words in the URL to a search query (after filtering out the constants) and hope for the best. In most cases, the relevant result shows up towards the top, but not always. For this reason, I suppose it's wrong to pretend that the 404 is simply not there.
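As an illustration of the "filter out the constants" step, here is a minimal sketch in plain PHP. The function name, the list of constant segments, and the choice to drop numeric tokens are all assumptions made for the example, not something the old CMS structures are known to require.

```php
<?php
// Hypothetical helper: turn a legacy request path into search terms.
// The $constants list is an assumption; fill it with whatever fixed
// segments the old link structures actually used.
function legacy_path_to_terms(
    string $request,
    array $constants = ['index', 'article', 'news', 'php', 'html']
): string {
    // Ignore the query string and decode percent-encoding.
    $path = rawurldecode(explode('?', $request, 2)[0]);

    // Split on anything that is not an ASCII letter or digit.
    // (Assumes ASCII slugs; non-ASCII slugs would need a Unicode-aware split.)
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($path), -1, PREG_SPLIT_NO_EMPTY);

    // Drop constants, purely numeric tokens (dates, IDs) and very short tokens.
    $terms = array_filter($tokens, function (string $t) use ($constants): bool {
        return !in_array($t, $constants, true) && !ctype_digit($t) && strlen($t) > 2;
    });

    return implode(' ', array_unique($terms));
}
```

For example, `legacy_path_to_terms('/news/2009/05/14/mayor-opens-new-bridge.html?ref=rss')` yields `mayor opens new bridge`, which can then be used as the `s` value of the search.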

The other approach I thought of was this: keep the URL verbatim, send the 404 status, but use the 404 template to show a search query (WP_Query with the 's' parameter) on the relevant words.

This has the advantage that, on strong matches (those that are all but certain to be the "I'm feeling lucky" result), I can decide to force an actual 301 redirect. That's not always the case, though: sometimes the wanted article is very far down the list. Still, it would work almost fine, except that for some reason pagination is not working on 404s. So now I think one of two things needs to be done:

  1. The simple solution, if only possible: somehow make pagination work on the 404 template - since I have no idea why it doesn't already, I don't know if it can be done or how. (Update: most likely it is because the pagination query var/slug is treated as part of the search)

  2. The complicated solution, if only feasible: Use the search template itself. The 'search' slug can be removed completely by hooking into the rewrite rules and setting $wp_rewrite->search_base = ''; This theoretically turns almost any URL thrown at it into a search. The huge problem is that it also does so for post names and everything else except categories and tags. So what I gather from this is the following: whenever there's a URL request, WordPress will check for a matching category, then a tag, and then it will do a search. Only after that will it look for matching authors, archives, posts etc. If only I could somehow hook into WordPress's internal rules for URL parsing priority and move the search rule to the end of the list, the problem would be solved.
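If the update under option 1 is right and the pagination segment ends up inside the search words, one possible way around it is to split the page number off the requested path by hand before building the query. The sketch below is plain PHP; the helper names are hypothetical, and it assumes the default "page" pagination base.

```php
<?php
// Hypothetical helper, not a WordPress API: split a trailing "/page/N"
// off a request path so the page number is not searched for as a word.
function split_pagination(string $request): array
{
    // Ignore any query string.
    $path = explode('?', $request, 2)[0];
    $paged = 1;
    if (preg_match('#^(.*?)/page/(\d+)/?$#', $path, $m)) {
        $path = $m[1];
        $paged = max(1, (int) $m[2]);
    }
    return ['path' => $path, 'paged' => $paged];
}

// Build an argument array in the shape WP_Query accepts ('s' and 'paged').
function legacy_query_args(string $request): array
{
    $parts = split_pagination($request);
    // Treat slashes and hyphens as word separators.
    $terms = trim(str_replace(['/', '-'], ' ', $parts['path']));
    return ['s' => $terms, 'paged' => $parts['paged']];
}
```

Inside the 404 template, the result of `legacy_query_args($_SERVER['REQUEST_URI'])` could be handed to `new WP_Query(...)`. The "next page" links would then have to be built against the original URL plus `/page/N`, which is untested here.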

I'll have to admit that I haven't tried any actual code. I don't know where to start, I don't know exactly what to search for, and there also seems to be little documentation for what I want. All I was able to do so far was blind testing, as described above.

So the question is if there's any way to do either of the above and how.

2
That "404 template" is something WP-specific? Anyway, why not do the "pagination" via ajax? When someone scrolls to the bottom, ajax will load the next X results into the active page without any reload or URL change... - Marki555
The template is not specific; the way it works, I guess, is. Pagination actually is done via ajax, but the target pages still need to be created in the first place, which doesn't happen. - lucian

2 Answers

11
votes

The simple solution, if only possible: somehow make pagination work on the 404 template - since I have no idea why it doesn't already, I don't know if it can be done or how.

It's hard to say why pagination isn't working without seeing the code for your 404 template.

The complicated solution, if only feasible: Use the search template itself.

You can use the template_include filter to change the template. You'll also have to manually change the main query to a search query:

add_filter('template_include', function($template) {
    // Leave every non-404 request untouched
    if(!is_404()) {
        return $template;
    }

    $search_query = new WP_Query(array('s' => get_query_var('name')));
    if($search_query->have_posts()) {
        // Replace the main query with the search query
        global $wp_query;
        $wp_query = $search_query;

        // Change the response code
        status_header(200);

        // Use the search template
        return get_search_template();
    }

    return $template;
});

Note that under normal circumstances, the best practice for modifying the main query is to use the pre_get_posts action. In this case, however, we don't know whether or not this is a 404 until after the query is executed.

Also, I'm using status_header to change the response code from 404 to 200 if the search returns results. If all you are trying to do is serve the right content to users then the response code probably doesn't matter.

In most cases, the relevant result shows up towards the top, but not always

If you decide that you do want to just serve the first result of the search, you can update the above code to redirect:

if($search_query->have_posts()) {
    $url = get_permalink( $search_query->posts[0]->ID );
    wp_redirect($url);
    exit;
}
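Since the question only wants to redirect on strong matches, the redirect could be gated behind some check. The heuristic below is only an assumption for illustration: it treats a result as a strong match when every search word appears as a word of the candidate post's slug. The function name is hypothetical.

```php
<?php
// Hypothetical heuristic, not part of WordPress: a candidate is a
// "strong match" when every search word occurs in its slug.
function is_strong_match(string $terms, string $candidate_slug): bool
{
    $words = preg_split('/\s+/', strtolower(trim($terms)), -1, PREG_SPLIT_NO_EMPTY);

    // A single word (or none) is too little evidence to redirect on.
    if (count($words) < 2) {
        return false;
    }

    $slug_words = explode('-', strtolower($candidate_slug));

    // Strong match only if no search word is missing from the slug.
    return count(array_diff($words, $slug_words)) === 0;
}
```

In the snippet above, the slug of the first result is available as `$search_query->posts[0]->post_name`. Note that `wp_redirect()` sends a 302 by default; passing 301 as its second argument gives the permanent redirect the question asks for.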

Update: Also, you could just redirect the request to a search without having to worry about modifying the 404 template or loading a different template:

if($search_query->have_posts()) {
    $url = get_search_link( get_query_var('name') );
    wp_redirect($url);
    exit;
}
3
votes

You could tweak your original idea:

In most cases, the relevant result shows up towards the top, but not always. For this reason, I suppose it's wrong to pretend that the 404 is simply not there.

You can redirect the user to a copy of your search page with some added messaging along the lines of "This page has moved, is it one of these?" (Or, even better, dynamically add that messaging to your standard search page if the user was redirected).

Depending on how your search is set up, you can send over the original URL as a PHP POST variable to run the search, or parse it on the 404 page and send it as a series of GET variables.
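A minimal sketch of the GET variant, in plain PHP. The `moved_from` parameter and the function name are made up for the example; `s` is the standard WordPress search parameter.

```php
<?php
// Hypothetical helper: build a search URL that also carries the original
// path, so the search page can show "This page has moved" messaging.
function build_moved_search_url(string $site_url, string $terms, string $original_path): string
{
    $query = http_build_query(
        ['s' => $terms, 'moved_from' => $original_path],
        '',
        '&' // explicit separator, independent of the arg_separator.output ini setting
    );
    return rtrim($site_url, '/') . '/?' . $query;
}
```

The search template could then check for `moved_from` in the request and print the extra message only when it is present.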

Or am I misunderstanding some limitation in parsing your URL and submitting it to the WordPress search?