4
votes

I have a reliable dictionary in a Service Fabric stateful service and a simple LINQ expression over it.
I am using the Ix-Async package for building an async enumerable.


using (ITransaction tx = this.StateManager.CreateTransaction())
{
    var result = (await customers.CreateLinqAsyncEnumerable(tx))
        .Where(x => x.Value.NameFirst != null
            && x.Value.NameFirst.EndsWith(n, StringComparison.InvariantCultureIgnoreCase))
        .Select(y => y.Value);

    return await result.ToList();
}

The data is organized into 2 partitions with around 75,000 records in each partition. I am using an Int64 range as the partition key. In the above code, `result.ToList()` takes around 1 minute to execute for each partition. Another weird thing is that the actual result is empty! The same query run in SQL Server returns rows with customer first names ending with "c". But this is beside the point; my biggest concern is the performance of the ReliableDictionary LINQ query.
Regards

How big are the records? What hardware are you running it on? Did you take the measurement on a local dev machine, or on a real cluster? Any other services on the same machines? Have you tried enumerating without using the Ix-Async package to see if there's any difference? – Vaclav Turecek
The table is a standard one except that it has one binary column [Picture]. This is a local dev cluster with 16 GB of RAM. Inserts to the dictionary are blazing fast (around 2,000 records per minute), and dictionary lookups by key are also very fast. I did enumerate the entire 75,000 records using the async enumerator and applied the predicate; this still took around 50 seconds. – teeboy
I wouldn't recommend storing pictures in there if you can help it. Usually you'd store a URL to an image file out in blob storage or something. How big are the pictures, roughly? It's likely they're not all kept in memory all the time. Basically what you're doing is pulling those pictures off of disk and into RAM during that enumeration. Does that enumeration time include any other processing on the picture data? – Vaclav Turecek
Also, if it's on a local dev machine, both partitions are sharing the same physical machine, so really you're pulling two sets of 75,000 records concurrently. – Vaclav Turecek
Also, out of curiosity: SQL Server has column indexes. Without attribute-based indexes defined on the serialized POCOs stored in reliable dictionaries, how can LINQ-to-Objects queries be quick? The internal query searching the dictionary must do a full collection scan, correct? – teeboy

1 Answer

7
votes

Reliable Dictionary periodically removes the least recently used values from memory. This is done to enable:

  • Large Reliable Dictionaries
  • Higher Density: Higher density of Reliable Collections per replica and higher density of replicas per node.

The trade-off is that this can increase read latencies: disk IO is required to retrieve values that are not cached in memory.

There are a couple of options to get lower latency on enumerations.

1) Key Filtered Enumeration: You can move the fields that you would like to use in your query into the TKey of the Reliable Dictionary (NameFirst in the above example). This would allow you to use the CreateEnumerableAsync overload that takes in a key filter (see the sketch below). The key filter allows Reliable Dictionary to avoid retrieving values from disk for keys that do not match your query. One limitation of this approach is that TKey (and hence the fields inside it) cannot be updated.
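For illustration, here is a minimal sketch of the key-filtered overload, assuming a hypothetical customersByName dictionary whose string TKey is the customer's NameFirst (a real key would also need something unique, such as the customer id, folded into it). CreateEnumerableAsync with a key filter and the GetAsyncEnumerator/MoveNextAsync enumeration pattern are the actual Reliable Collections APIs; the Customer type and variable names just follow the question.

    using (ITransaction tx = this.StateManager.CreateTransaction())
    {
        var matches = new List<Customer>();

        // The filter is evaluated against keys only, so values whose keys
        // do not match are never read from disk.
        var enumerable = await customersByName.CreateEnumerableAsync(
            tx,
            key => key.EndsWith(n, StringComparison.InvariantCultureIgnoreCase),
            EnumerationMode.Unordered);

        using (var enumerator = enumerable.GetAsyncEnumerator())
        {
            while (await enumerator.MoveNextAsync(CancellationToken.None))
            {
                matches.Add(enumerator.Current.Value);
            }
        }

        return matches;
    }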

2) In-memory Secondary Index using Notifications: Reliable Dictionary notifications can be used to build any number of secondary indices. You could build a secondary index that keeps all of the values in memory, trading memory resources for lower read latency. Furthermore, since you have full control over the secondary index, you can keep it ordered (e.g. by the reverse of NameFirst in your example). A sketch of such an index follows below.
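A rough sketch of such an index, kept in sync through the DictionaryChanged notification. The index shape, key type, and Customer type are assumptions based on the question; NotifyDictionaryChangedAction and the item added/updated/removed event args are the actual notification types. Rebuild notifications, which re-create the index after copy or restore, are omitted here.

    // In-memory secondary index: dictionary key -> NameFirst.
    private readonly ConcurrentDictionary<long, string> namesByKey =
        new ConcurrentDictionary<long, string>();

    private void WireUpIndex(IReliableDictionary<long, Customer> customers)
    {
        // Register for change notifications on the Reliable Dictionary.
        customers.DictionaryChanged += this.OnCustomersChanged;
    }

    private void OnCustomersChanged(
        object sender, NotifyDictionaryChangedEventArgs<long, Customer> e)
    {
        switch (e.Action)
        {
            case NotifyDictionaryChangedAction.Add:
                var added = (NotifyDictionaryItemAddedEventArgs<long, Customer>)e;
                this.namesByKey[added.Key] = added.Value.NameFirst;
                break;

            case NotifyDictionaryChangedAction.Update:
                var updated = (NotifyDictionaryItemUpdatedEventArgs<long, Customer>)e;
                this.namesByKey[updated.Key] = updated.Value.NameFirst;
                break;

            case NotifyDictionaryChangedAction.Remove:
                var removed = (NotifyDictionaryItemRemovedEventArgs<long, Customer>)e;
                string ignored;
                this.namesByKey.TryRemove(removed.Key, out ignored);
                break;

            case NotifyDictionaryChangedAction.Clear:
                this.namesByKey.Clear();
                break;
        }
    }

A query for names ending in "c" then only scans this in-memory index and fetches full values by key for the few matching keys.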

We are also considering making Reliable Dictionary's in-memory TValue sweep policy configurable. With this, you would be able to configure the Reliable Dictionary to keep all values in memory if read latency is a priority for you.

Since in your scenario most of the enumeration time is spent on disk IO, you could also benefit from using a custom serializer, which can reduce the disk and network footprint.
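As an illustration, a custom serializer could look like the sketch below. The Customer shape (Id, NameFirst, Picture) and the CustomerService name are assumptions based on the question; IStateSerializer<T>, TryAddStateSerializer, and ReliableStateManagerConfiguration are the actual extension points.

    // Sketch of a compact, hand-rolled serializer for the assumed Customer type.
    class CustomerSerializer : IStateSerializer<Customer>
    {
        Customer IStateSerializer<Customer>.Read(BinaryReader reader)
        {
            var value = new Customer();
            value.Id = reader.ReadInt64();
            value.NameFirst = reader.ReadString();
            value.Picture = reader.ReadBytes(reader.ReadInt32());
            return value;
        }

        void IStateSerializer<Customer>.Write(Customer value, BinaryWriter writer)
        {
            writer.Write(value.Id);
            writer.Write(value.NameFirst ?? string.Empty);
            byte[] picture = value.Picture ?? new byte[0];
            writer.Write(picture.Length);
            writer.Write(picture);
        }

        // The differential overloads can simply delegate to the full read/write.
        Customer IStateSerializer<Customer>.Read(Customer baseValue, BinaryReader reader)
        {
            return ((IStateSerializer<Customer>)this).Read(reader);
        }

        void IStateSerializer<Customer>.Write(Customer baseValue, Customer targetValue, BinaryWriter writer)
        {
            ((IStateSerializer<Customer>)this).Write(targetValue, writer);
        }
    }

    // Registration in the stateful service's constructor:
    public CustomerService(StatefulServiceContext context)
        : base(context, new ReliableStateManager(context,
            new ReliableStateManagerConfiguration(
                onInitializeStateSerializersEvent: stateManager =>
                {
                    stateManager.TryAddStateSerializer(new CustomerSerializer());
                    return Task.FromResult(true);
                })))
    {
    }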

Thank you for your question.