
I’m experimenting with MongoDB at the moment to see what features it offers. I’ve created a small test suite representing a very basic blog system with posts, authors and comments.

I’ve experimented with a search function that uses the MongoRegEx class (PHP driver), where I’m searching all post content and post titles for the phrase ‘lorem ipsum’, case-insensitively (the “/i” flag).

My code looks like this:

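```php
// Case-insensitive (/i) match on the phrase "lorem ipsum"
$regex = new MongoRegEx('/lorem ipsum/i');
// Note: two keys in one query array are ANDed, so a post only matches
// if both its content and its title contain the phrase
$query = array('post' => $regex, 'post_title' => $regex);
```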
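Because the keys of a single query array are combined with AND, the query above only returns posts whose content and title both contain the phrase. If the intent is “either field”, the two conditions need `$or`. A minimal sketch, assuming that intent; it uses MongoDB’s `$regex`/`$options` query operators written as plain PHP arrays (instead of a MongoRegEx object) so the query can be built and inspected without the driver loaded. The `$collection` variable in the comment is hypothetical.

```php
<?php
// Match posts where EITHER the body or the title contains "lorem ipsum",
// case-insensitively, using the $regex/$options query operators.
$pattern = array('$regex' => 'lorem ipsum', '$options' => 'i');

$query = array(
    '$or' => array(
        array('post' => $pattern),
        array('post_title' => $pattern),
    ),
);

// With the legacy driver this would then be passed to find(), e.g.
// $cursor = $collection->find($query);   // $collection is hypothetical
echo json_encode($query), "\n";
```

Printing the query as JSON is a quick way to confirm it has the shape you expect before timing it.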

But I’m confused and stunned by what happens. I time every query (calling microtime() before and after the query and printing the difference with 15 decimals).

For my first test I added 110,000 blog documents and 5,000 authors, all randomly generated. My search finds 6,824 posts containing “lorem ipsum” and takes 0.000057935714722 seconds. That is after restarting the MongoDB service (on Windows), and without any index other than the default one on _id.

MongoDB uses B-tree indexes, which are definitely not very efficient for full-text search. If I create an index on the post content field, the same query runs in 0.000150918960571 seconds, which, funnily enough, is slower than without any index (by 0.000092983245849 seconds, roughly 2.6 times slower). This could have several causes, since the query then uses a B-tree cursor.

I’ve tried to find an explanation for how it can run the query so fast. My guess is that it keeps everything in RAM (I’ve got 4 GB and the database is about 500 MB), which is why I restart the MongoDB service before measuring, to get a cold-start result.

Can anyone with MongoDB experience help me understand what is going on with this kind of full-text search, with or without an index, and definitely without an inverted index?

Sincerely - Mestika

MongoDB regex queries can’t use an index efficiently unless it’s a “starts with” (prefix) regex. I’ve had some success breaking all of the terms out into an array and indexing on that. I plan to migrate that solution to Elasticsearch (for the full-text search portion), while keeping everything in Mongo for other types of queries. And yes, MongoDB will keep your data in RAM if it’s being accessed and there is free memory. - Eve Freeman
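The “terms array” approach from the comment can be sketched as follows, assuming each post gets an extra array field holding its lowercased words; the field name `terms` and the helper `extractTerms()` are hypothetical. MongoDB indexes an array field as a multikey index (one index entry per element), so an exact-term lookup becomes an ordinary index lookup instead of a regex scan.

```php
<?php
// Hypothetical helper: split text into a lowercase, de-duplicated list of terms.
// strtolower() only folds ASCII letters; non-ASCII letters are kept as-is.
function extractTerms(string $text): array
{
    // Split on any run of characters that are not letters or digits (UTF-8 aware).
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

$post = array(
    'post_title' => 'Lorem ipsum test',
    'post'       => 'Lorem ipsum dolor sit amet, lorem ipsum again.',
);
// Store the terms alongside the post; an index on 'terms' would then be multikey.
$post['terms'] = extractTerms($post['post_title'] . ' ' . $post['post']);

// Posts containing both words, anywhere and in any order
// (this is NOT an exact phrase match like the original regex).
$query = array('terms' => array('$all' => array('lorem', 'ipsum')));

echo json_encode($post['terms']), "\n";
```

The trade-off is that phrase and substring matching are lost; only whole terms can be matched, which is the gap a dedicated search engine like Elasticsearch fills.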


I think you simply didn’t iterate over the results. With just a find(), the driver does not send the query to the server; that only happens once you fetch at least one result. I don’t believe MongoDB is this fast, and I suspect the error is in your benchmark.
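A sketch of a benchmark that avoids this, assuming the legacy driver’s MongoCursor, which is Traversable and only runs the query once you start reading from it: time the loop that consumes the results, not the find() call. The helper `timeFullIteration()` is hypothetical, and in the usage below a PHP generator stands in for the cursor, because a generator body likewise does nothing until iteration starts.

```php
<?php
// Time a query by consuming the whole result set. Accepts anything Traversable,
// e.g. a MongoCursor returned by find(), or a generator as in the usage below.
function timeFullIteration(Traversable $cursor): array
{
    $start = microtime(true);
    $count = 0;
    foreach ($cursor as $doc) {   // starting the loop is what triggers a lazy query
        $count++;
    }
    return array($count, microtime(true) - $start);
}

// Stand-in for a lazy cursor: the body only runs once iteration begins.
$queryRan = false;
$fakeFind = function (int $n) use (&$queryRan): Generator {
    $queryRan = true;   // records that the "query" actually executed
    for ($i = 0; $i < $n; $i++) {
        yield array('_id' => $i);
    }
};

$cursor = $fakeFind(3);
$ranAfterFind = $queryRan;   // still false: creating the cursor did no work
list($count, $seconds) = timeFullIteration($cursor);
printf("%d documents in %.6f seconds\n", $count, $seconds);
```

Timing only the line that creates `$cursor` would measure almost nothing, which matches the suspiciously small numbers in the question.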

Secondly, a regular expression search that is not anchored to the beginning of the field’s value with ^ cannot use an index efficiently, and a case-insensitive /i regex like yours generally cannot either. Use explain() to see what is actually happening.
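Why the ^ anchor matters can be shown with a rough model, assuming the index behaves like a sorted list of keys (a real MongoDB index is a B-tree, and this sketch ignores its internals). A prefix pattern such as /^lorem ipsum/ lets you jump to the first key that could match and stop at the first key that doesn’t; an unanchored pattern gives the sort order nothing to exploit, so every key has to be tested. The helpers `lowerBound()`, `prefixScan()` and `fullScan()` are hypothetical.

```php
<?php
// First position whose key is >= $prefix (binary search over sorted keys).
function lowerBound(array $keys, string $prefix): int
{
    $lo = 0;
    $hi = count($keys);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if (strcmp($keys[$mid], $prefix) < 0) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    return $lo;
}

// Anchored pattern (/^prefix/): scan only the contiguous range of matching keys.
function prefixScan(array $keys, string $prefix): array
{
    $matches = array();
    $examined = 0;
    $len = strlen($prefix);
    for ($i = lowerBound($keys, $prefix); $i < count($keys); $i++) {
        $examined++;
        if (strncmp($keys[$i], $prefix, $len) !== 0) {
            break;   // sorted order: no later key can start with the prefix
        }
        $matches[] = $keys[$i];
    }
    return array($matches, $examined);
}

// Unanchored pattern: every key must be tested against the regex.
function fullScan(array $keys, string $regex): array
{
    $matches = array();
    foreach ($keys as $key) {
        if (preg_match($regex, $key)) {
            $matches[] = $key;
        }
    }
    return array($matches, count($keys));
}

$keys = array('sit amet lorem ipsum', 'lorem ipsum', 'zzz', 'dolor sit', 'lorem ipsum dolor', 'ipsum lorem');
sort($keys, SORT_STRING);   // byte-wise order, consistent with strcmp()

list($prefixMatches, $prefixExamined) = prefixScan($keys, 'lorem ipsum');
list($fullMatches, $fullExamined) = fullScan($keys, '/lorem ipsum/');
printf("prefix: examined %d keys, full: examined %d keys\n", $prefixExamined, $fullExamined);
```

The unanchored search also finds “sit amet lorem ipsum”, which is exactly the kind of match that sort order cannot locate, so it pays for a scan of every key.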