3
votes

I have a very large database (~ 6 million rows) which was imported from a csv file. I was looking at MongoDB and their documentation on GeoSpatial indexing. You need the latitude and longitude fields embedded into an array. Unfortunately, the structure I have imported has the fields seperate.

"latitude" : 54.770233, "longitude" : -6.537741,

Is there any possible way of querying the DB to calculate the twenty nearest documents within a 25 miles radius using the latitude point of 54 and longitude of -6.

I created the query in MySQL fine :

SELECT * , ( 3959 * ACOS( COS( RADIANS( 54 ) ) * COS( RADIANS( latitude ) ) * COS( RADIANS( longitude ) - RADIANS( -6 ) ) + SIN( RADIANS( 54 ) ) * SIN( RADIANS( latitude ) ) ) ) AS distance FROM table_name HAVING distance <25 ORDER BY distance LIMIT 0 , 20

However, I am unsure of how to do this in Mongo without the embedded fields.

Any help would be much appreciated.

1

1 Answers

3
votes

I don't think you can. I would run a script to go through and update all the documents, which will take a while but not too long. Then you can create a 2d index as per usual.

> db.test.insert({latitude:73,longitude:-47});
> db.test.insert({latitude:20,longitude:-30});
> db.test.find().forEach(function(doc) {
  doc.loc = [doc.longitude, doc.latitude];
  db.test.save(doc);
});
> db.test.find();
{ "_id" : ObjectId("4f7c63f117cd93783bba936d"), "latitude" : 73, "longitude" : -47, "loc" : [ 73, -47 ] }
{ "_id" : ObjectId("4f7c63f817cd93783bba936e"), "latitude" : 20, "longitude" : -30, "loc" : [ 20, -30 ] }

Updated Actually, I suppose you could do it with a where clause, but it wouldn't use any indexes. But if it's for a one off query, it would probably be ok.

db.test.find("( 3959 * Math.acos(Math.cos( 54 * 0.0174532925 ) * Math.cos( this.latitude * 0.0174532925 ) * Math.cos( this.longitude * 0.0174532925 ) - (-6 * 0.0174532925)) + Math.sin( 54 * 0.0174532925 ) * Math.sin( this.latitude * 0.0174532925 )) > 25 ");

This doesn't actually work--the result of the calculation is too large. I was just trying to copy your math, but something must have not been right. In any case, sorting by distance is going to be a problem as well. I think the first solution is a bit easier to work with.