MongoDB: How can I order by distance considering multiple fields?

Question

I have a collection that stores information about doctors. Each doctor can work in private practices and/or in hospitals. The collection has the following relevant fields (there are geospatial indexes on both privatePractices.address.loc and hospitals.address.loc):

{
  "name" : "myName",
  "privatePractices" : [{
      "_id": 1,
      "address" : {
        "loc" : {
          "lng" : 2.1608502864837646,
          "lat" : 41.3943977355957
        }
      }
    },
    ...
    ],
  "hospitals" : [{
      "_id": 5,
      "address" : {
        "loc" : {
          "lng" : 2.8192520141601562,
          "lat" : 41.97784423828125
        }
      }
    },
    ...
    ]
}

I am trying to query that collection to get a list of doctors ordered by distance from a given point. This is where I am stuck:

The following queries return a list of doctors ordered by distance to the point defined in $nearSphere, considering only one of the two location types:

{ "hospitals.address.loc" : { "$nearSphere" : [2.1933, 41.4008] } }
{ "privatePractices.address.loc" : { "$nearSphere" : [2.1933, 41.4008] } }

What I want is to get the doctors ordered by the nearest hospital OR private practice, whatever is the nearest. Is it possible to do it on a single Mongo query?

Plan B is to use the queries above and then manually order the results outside Mongo (eg. using Linq). To do this, my two queries should return the distance of each hospital or private practice to the $nearSphere point. Is it possible to do that in Mongo?

EDIT - APPLIED SOLUTION (MongoDB 2.6): I took my own approach inspired by what Neil Lunn suggests in his answer: I added a field in the Doctor document for sorting purposes, containing an array with all the locations of the doctor.

I tried this approach in MongoDB 2.4 and MongoDB 2.6, and the results are different. Queries on 2.4 returned duplicate doctors that had more than a location, even if the _id was included in the query filter. Queries on 2.6 returned valid results.

Modelling is the problem here as this one wont cut it for your intended purpose. Perhaps the best way to answer is to actually know what you mean by "get the doctors". Is there any specific selection criteria to be applied to those doctors? Such as a specific "specialist type" or something else. I'm generally thinking Plan C here which will perform a drastically better than your Plan B — Neil Lunn
I have certain scenarios where I query with other criterias, but I usually also want the results ordered by proximity. I am intrigued by your Plan C - what would that be? — ederbf
@ederbf did you use "Multipoint" indexes or just "Point"? Can you share your final document schema? — Zyoo
@Zyoo, what do you mean by "Multipoint"? The only spherical index I know is '2dsphere', and that's what I used. My solution ended up being very similar to what Neil Lunn suggests in his answer, his schema is a good example :) — ederbf
I meant the geojson type.. I finally used "MultiPoint" to save multiple locations and then query by "Point". Similar to this stackoverflow.com/questions/31295049/… — Zyoo

Neil Lunn Neil Lunn · Accepted Answer · 2014-07-10T10:07:58

I would have been hoping for a little more information here, but the basics still apply. So the general problem you have stumbled on is trying to have "two" location fields on what appears to be your doctors documents.

There is another problem with the approach. You have the "locations" within arrays in your document/ This would not give you an error when creating the index, but it also is not going to work like you expect. The big problem here is that being within an array, you might find the document that "contains" the nearest location, but then the question is "which one", as nothing is done to affect the array content.

Core problem though is you cannot have more than one geo-spatial index per query. But to really get what you want, turn the problem on it's head, and essentially attach the doctors to the locations, which is the other way around.

For example here, a "practices" collection or such:

{
    "type": "Hospital",
    "address" : {
        "loc" : {
          "lng" : 2.8192520141601562,
          "lat" : 41.97784423828125
        }
    },
    "doctors": [
        { "_id": 1, "name": "doc1", "specialty": "bones" },
        { "_id": 2, "name": "doc2", "specialty": "heart" }       
    ]
}

{
    "type": "Private",
    "address" : {
       "loc" : {
          "lng" : 2.1608502864837646,
          "lat" : 41.3943977355957
       }
    },
    "doctors": [
        { "_id": 1, "name": "doc1", "specialty": "bones" },
        { "_id": 3, "name": "doc3", "specialty": "brain" }
    ]
}

The advantage here is that you have here is that as a single collection and all in the same index you can simply get both "types" and correctly ordered by distance or within bounds or whatever your geo-queries need be. This avoids the problems with the other modelling form.

As for the "doctors" information, of course you actually keep a separate collection for the full doctor information themselves, and possibly even keep an array of the _id values for the location documents there as well. But the main point here is that you can generally live with "embedding" some of the useful search information in a collection here that will help you.

That seems to be the better option here, and matching a doctor to criteria from inside the location is something that can be done, where as finding or sorting the nearest entry inside array is something that is not going to be supported by MongoDB itself, and would result in you applying the Math yourself in processing the results.

MongoDB: How can I order by distance considering multiple fields?

1 Answers