0
votes

When I do a query to my database with python-cloudant using 3 selectors, the query takes ~15 seconds to complete and print results. When I put a limit of 2 in (see code below), the result appears very quickly, but after 2 takes much longer. There are currently ~190,000 documents in this database. Am I missing something that can speed this up?

query = cloudant.query.Query(db,selector={'_id': {'$gt': 0},'userid':{'$eq':'56900'},'year':{'$eq':'[2011]'}},fields=['_id','userid','year'],sort=['_id'],limit=2)
for doc in query()['docs']:
    print doc

(The goal of this query is to get all records from the userid "56900" that contain "[2011]" in the year field and sort them by the _id)

1
Do you have indexes set up for the userId field? If not chances are you are forcing the DB to do a full scan over all the documents. - rhyshort
@rhyshort Maybe not correctly. I added the following but it does not speed up the query: "json: userid, year" { "type": "json", "def": { "fields": [ { "userid": "asc" }, { "year": "asc" } - pynewb
Maybe try removing the '_id': {'$gt': 0} from your query, iirc that would trigger a full scan since your looking at the _id field, which all docs have - rhyshort
@rhyshort that worked, now it is very speedy, thank you! - pynewb
Cool, I'll just write it up as an answer to make it easier for others :) - rhyshort

1 Answers

1
votes

As per the comments on the question. The reason why it is slow, is that the '_id': {'$gt': 0} part of the selector is causing all documents to be scanned to see if that match the rest of the selector. Removing this clause and having the fields userid and year indexed have a massive increase.