I am trying to use documentdb for range searches on IPv4 and IPv6 network allocation data. Since the v6 numbers are so large, I am converting all addresses to fixed-length hex strings and using range searches. I have a standard 250GB partitioned collection, with around 20GB consumed. I have ~2.5 million documents. My request units are currently set to 20,100.
My problem is that range searches are terribly slow on this collection (20 to 40 seconds for some values), even when I set MaxDegreeOfParallelism to 25. I am using a range search to look up a single IP address. Here is the indexing policy I am using:
{
"indexingMode": "lazy",
"automatic": true,
"includedPaths": [
{
"path": "/ipv4hexfrom/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": 8
}
]
},
{
"path": "/ipv4hexto/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": 8
}
]
},
{
"path": "/ipv6hexfrom/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": 32
}
]
},
{
"path": "/ipv6hexto/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": 32
}
]
}
],
"excludedPaths": [
{
"path": "/*"
}
]
}
Here is an example query:
SELECT * FROM c
WHERE c.ipv4hexfrom <= "cfd95803" AND c.ipv4hexto >= "cfd95803"
In addition to the string indexes, documentdb is automatically adding Number indexes to the policy I applied, presumably because it sees things that look like numbers. As a result, if I query for an IP whose hex value is within the decimal range, the lookup time is a second or two. Other values appear to use the string indexes and are very slow.
I am using the latest 1.12 SDK version.
I could treat the IPv4 entries as straight integers to get better performance, but then the IPv6 lookups would still be slow. I would rather avoid having to split the IPv6 address into multiple numeric fields to get the performance I need.