I'm setting up Sphinx to search a database a lot like what I imagine is at the heart of Stack Overflow -- a posts table, and a tags table. Each tag has a post_id to identify the post with which it's associated. The site is built around CakePHP 2.2.3.
What I want is to be able to search for topics. Maybe the topic name occurs in the text of the post, or maybe in the tags associated with it (sometimes users are lazy about tagging).
My naive implementation (first time using Sphinx, and first time getting my hands dirty with MySQL) has been to set up Sphinx with a query like the below and search the associated index:
SELECT posts.id,
posts.text,
tags_concat.all_tags
FROM posts
LEFT JOIN(SELECT tags.post_id,
GROUP_CONCAT(tags.name) AS all_tags
FROM tags
GROUP BY tags.post_id) AS tags_concat
ON tags_concat.post_id = posts.id
WHERE posts.id >= $start
AND posts.id <= $end
I've used what seems to be the ubiquitous Sphinx behaviour to do this, and the results are awesome! However, the database is big enough that I'll need to use a delta index to keep reasonably up-to-date search results.
Most of the delta schemes I've seen might (for example) track the highest posts.id in the main index, and place anything above that in the delta index. If a user edits a post, or adds or edits a tag, however, it could be a week before that's reflected in either index, since it won't result in a new row in "posts". Unlike Stack Overflow, much of my tagging will occur after a post is made, and posts will absolutely be edited after they're created (they're built by the community), so this is a significant issue.
Am I missing something in the implementation of the delta index, or an alternative to delta indexes, or a tweak to the implementation that might make searches more reliable?
Edit
Thinking on this some more, I realize that the delta index should still be updated if the post or tags are edited after post creation, with one exception -- posts created before the main index is rebuilt but edited afterward (even if there's only an hour between those two events) won't show in the delta. Can I flag posts when they're changed (with a posts.in_delta column, for example), then somehow have the results from the delta index override those from the main index?