How performant are documentdb string range queries expected to be?

Question

I am trying to use documentdb for range searches on IPv4 and IPv6 network allocation data. Since the v6 numbers are so large, I am converting all addresses to fixed-length hex strings and using range searches. I have a standard 250GB partitioned collection, with around 20GB consumed. I have ~2.5 million documents. My request units are currently set to 20,100.

My problem is that range searches are terribly slow on this collection (20 to 40 seconds for some values), even when I set MaxDegreeOfParallelism to 25. I am using a range search to look up a single IP address. Here is the indexing policy I am using:

    {
  "indexingMode": "lazy",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/ipv4hexfrom/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": 8
        }
      ]
    },
    {
      "path": "/ipv4hexto/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": 8
        }
      ]
    },
    {
      "path": "/ipv6hexfrom/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": 32
        }
      ]
    },
    {
      "path": "/ipv6hexto/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": 32
        }
      ]
    }
  ],
  "excludedPaths": [
    {
      "path": "/*"
    }
  ]
  }

Here is an example query:

SELECT * FROM c
WHERE c.ipv4hexfrom <= "cfd95803" AND c.ipv4hexto >= "cfd95803"

In addition to the string indexes, documentdb is automatically adding Number indexes to the policy I applied, presumably because it sees things that look like numbers. As a result, if I query for an IP whose hex value is within the decimal range, the lookup time is a second or two. Other values appear to use the string indexes and are very slow.

I am using the latest 1.12 SDK version.

I could treat the IPv4 entries as straight integers to get better performance, but then the IPv6 lookups would still be slow. I would rather avoid having to split the IPv6 address into multiple numeric fields to get the performance I need.

Worth asking MS support or on the product forums? You can also tweet Azure support, they might be able to direct you to the best place to ask — Kieren Johnstone

Matthew Hintzen Matthew Hintzen · Accepted Answer · 2018-03-22T15:06:57

The less precise the precision, the slower the index will be in performance. Since you are doing <= and >= comparisons you really want to make sure you have set the precision to the maximum amount.

More information is here: docs.microsoft.com "index precision"

So like this.

{
  "indexingMode": "lazy",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/ipv4hexfrom/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        }
      ]
    },
    {
      "path": "/ipv4hexto/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        }
      ]
    },
    {
      "path": "/ipv6hexfrom/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        }
      ]
    },
    {
      "path": "/ipv6hexto/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        }
      ]
    }
  ],
  "excludedPaths": [
    {
      "path": "/*"
    }
  ]
}

How performant are documentdb string range queries expected to be?

1 Answers