First of all, I want to make it clear that I know the best practices for handling 404 errors. However, I have a specific case that may need a tailored approach.
I manage a newspaper site with more than 10 years of archives: 150k+ carefully produced pieces of content and plenty of links that still get clicked. The site has also been through a lot: three different CMSs before WordPress, each with its own link structure, and improper redirects at every migration. As a result, the archives are all but "lost" from an SEO point of view.
With more than 90% of the content living at the wrong URLs, showing classic 404 pages is not really an option. The stopgap was to redirect the words in the URL to a search query (after filtering out the constant parts of the old URL schemes) and hope for the best. In most cases the relevant result shows up near the top, but not always. For this reason, I suppose it's wrong to pretend that the 404 simply isn't there.
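A minimal sketch of that URL-to-search step, for illustration only. The function name `legacy_url_to_search` and the list of "constant" segments are hypothetical placeholders for whatever fixed parts the three old CMS URL schemes actually used. The sketch also peels off a trailing `/page/N` segment, so that if pagination links point back at the same legacy URL, the number becomes a page value instead of a search word.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn a legacy archive URL into search words plus a
 * page number.
 *
 * @return array{0: string, 1: int} [search words, page number]
 */
function legacy_url_to_search(string $url): array
{
    // Assumed constant segments from the old link structures (illustrative only).
    $constants = ['index', 'php', 'html', 'htm', 'asp', 'aspx', 'article', 'news', 'story'];

    // Keep only the path; the query string and fragment are dropped.
    $path = (string) parse_url($url, PHP_URL_PATH);

    // Peel off a trailing "/page/N/" so the page number is not searched for.
    $paged = 1;
    if (preg_match('#^(.*?)/page/(\d+)/?$#', $path, $m) === 1) {
        $path  = $m[1];
        $paged = max(1, (int) $m[2]);
    }

    // Split on anything that is not a letter or digit (slashes, dashes, dots, underscores).
    // preg_split() returns false on invalid UTF-8, hence the fallback to an empty list.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower(urldecode($path)), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $terms = [];
    foreach ($words as $word) {
        // Pure numbers are usually dates or old numeric IDs, not content words.
        if (preg_match('/^\p{N}+$/u', $word) === 1 || in_array($word, $constants, true)) {
            continue;
        }
        $terms[] = $word;
    }

    return [implode(' ', array_unique($terms)), $paged];
}
```

For example, `/2009/05/mayor-resigns-over-budget.html?id=12` would yield the words `mayor resigns over budget` and page 1. In a 404 template, the two values could then feed a secondary query such as `new WP_Query( array( 's' => $terms, 'paged' => $paged ) )`; whether a `/page/N` URL actually reaches the 404 template unchanged depends on the site's rewrite rules.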
The other approach I thought of was this: keep the URL verbatim and send the 404 status, but use the 404 template to show the results of a search (a WP_Query with the 's' parameter) for the relevant words.
The advantage is that on strong matches (those that are all but certain to be the "I'm feeling lucky" result), I can still force an actual 301 redirect. Strong matches are not guaranteed, though: sometimes the wanted article is very far down the list. Still, this would work almost fine, except that for some reason pagination does not work on 404s. So I think one of two things needs to be done:
The simple solution, if possible at all: somehow make pagination work on the 404 template. Since I have no idea why it doesn't already, I don't know whether or how it can be done. (Update: most likely it is because the pagination query var/slug is treated as part of the search.)
The complicated solution, if feasible: use the search template itself. The 'search' slug can be removed completely by hooking into the rewrite rules and setting
$wp_rewrite->search_base = '';
In theory, this turns almost any URL thrown at it into a search. The big problem is that it also does so for post names and everything else except categories and tags. As far as I can tell, whenever a URL is requested, WordPress tries the category rules first, then the tag rules, then the search rule, and only after that the author, date archive, and post rules. If I could somehow hook into WordPress's internal URL-parsing priority and move the search rule to the end of the list, the problem would be solved.
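At the level of the generated rules array, moving the search rules to the end is a plain reordering. A minimal sketch, assuming the rules arrive as WordPress passes them to the `rewrite_rules_array` filter (regex => query string) and that search rules can be recognized by an `s=` query var in their target; the function name `move_search_rules_last` is hypothetical. One hedged caveat: WordPress generally uses the first rule whose regex matches the request and does not fall through to later rules just because the resulting query finds nothing, so depending on the permalink structure, a broad post or page rule may still claim a legacy URL before a catch-all search rule at the end is ever reached.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical callback for the 'rewrite_rules_array' filter: moves every
 * rule whose target sets the search var "s" to the end of the array, keeping
 * the relative order of all other rules unchanged.
 *
 * @param array<string, string> $rules regex => query string
 * @return array<string, string>
 */
function move_search_rules_last(array $rules): array
{
    $search = [];
    $other  = [];
    foreach ($rules as $regex => $query) {
        // Matches "?s=" or "&s=" in targets like "index.php?s=$matches[1]".
        if (preg_match('/[?&]s=/', $query) === 1) {
            $search[$regex] = $query;
        } else {
            $other[$regex] = $query;
        }
    }
    // The key sets are disjoint, so the union keeps $other first, then $search.
    return $other + $search;
}
```

It could be registered with `add_filter( 'rewrite_rules_array', 'move_search_rules_last' );`. Because WordPress stores the generated rules, they would need to be flushed once afterwards, for example by re-saving the Permalinks settings page.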
I have to admit that I haven't tried any actual code. I don't know where to start or exactly what to search for, and there seems to be little documentation for what I want. All I have done so far is the blind testing described above.
So the question is whether there is any way to do either of the above, and if so, how.