DocumentDB differs from conventional databases in two ways: 1) because it is a cloud service accessed over HTTPS and REST, it caps the latency of every request at 5 seconds, and 2) it is a database with provisioned throughput, so you get predictable performance (which is great) but each query has to execute within a reserved budget of resources.
This means that some queries can only make incremental progress per request, and you have to resume execution by resubmitting the query with a continuation token until all results are available. For aggregation queries, DocumentDB works like "map-reduce": partial aggregate results are returned to the client, and the client is responsible for producing the final result (e.g. by summing the partial aggregates). Normally you wouldn't notice this behavior because queries complete in one round trip, but you do notice it when the query requires a scan (as in this case, because it involves a negation with the NOT IS_DEFINED clause).
If you run the query to completion, you will see the correct results returned.
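
To make "running the query to completion" concrete, here is a minimal sketch of the client-side loop in Python. It does not call the real DocumentDB SDK: `execute_page` is a hypothetical stand-in that simulates one request returning a partial count plus a continuation token, and the query text, property name, and partial counts are made up for illustration.

```python
from typing import Optional, Tuple

# Simulated partial aggregates, one per request (made-up numbers).
# A page may legitimately contribute 0 and still carry a continuation token.
_PARTIAL_COUNTS = [120, 95, 0, 37]


def execute_page(query: str, continuation: Optional[str]) -> Tuple[int, Optional[str]]:
    """Hypothetical stand-in for one request to the service.

    Returns the partial aggregate for this page and the token to resume
    from, or None as the token when there is nothing left to read.
    """
    index = 0 if continuation is None else int(continuation)
    next_index = index + 1
    next_token = str(next_index) if next_index < len(_PARTIAL_COUNTS) else None
    return _PARTIAL_COUNTS[index], next_token


def run_to_completion(query: str) -> int:
    """Resubmit the query until no continuation token is returned,
    summing the partial aggregates on the client (the "reduce" step)."""
    total = 0
    continuation = None
    while True:
        partial, continuation = execute_page(query, continuation)
        total += partial
        if continuation is None:
            return total


# Hypothetical query; c.someProperty is a placeholder property name.
query = "SELECT VALUE COUNT(1) FROM c WHERE NOT IS_DEFINED(c.someProperty)"
print(run_to_completion(query))
```

Stopping after the first page would report only the first partial count, which is why an incomplete run looks like a wrong answer. With a real client, the same loop applies: keep passing the returned continuation token back until none is returned, then combine the partial results.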