3
votes

EDIT question summary:

  • I want to expose an endpoint that can return portions of XML data based on query parameters.
  • I have a stateful service that keeps the XML data, converted to DTOs, in a reliable dictionary.
  • I use a single named partition (I can't tell from the query parameters which partition would hold the data, so I can't implement a smarter partitioning strategy).
  • I am using service remoting for communication between the stateless WebApi service and the stateful one.
  • The XML data may reach 500 MB.
  • Everything is OK when the XML is only around 50 MB.
  • When the data gets larger, Service Fabric complains about MaxReplicationMessageSize.

To summarize my questions below: how can one store a large amount of data in a reliable dictionary?

TL;DR:

Apparently, I am missing something...

  • I want to parse huge XML files and load them into a reliable dictionary for later queries.
  • I am using a single named partition.
  • I have an XMLData stateful service that loads these XML files into a reliable dictionary in its RunAsync method via this piece of code:

    var myDictionary = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, List<HospitalData>>>("DATA");
    using (var tx = this.StateManager.CreateTransaction())
    {
        var result = await myDictionary.TryGetValueAsync(tx, "data");
    
        ServiceEventSource.Current.ServiceMessage(this, "data status: {0}",
            result.HasValue ? "loaded" : "not loaded yet, starts loading");
    
        if (!result.HasValue)
        {
            Stopwatch timer = Stopwatch.StartNew();
    
            // Parse all XML files and store the whole result under a single key.
            // The element type must match the dictionary's value type (List<HospitalData>).
            var converter = new DataConverter(XmlFolder);
            List<HospitalData> data = converter.LoadData();
            await myDictionary.AddOrUpdateAsync(tx, "data", data, (key, value) => data);
    
            timer.Stop();
            ServiceEventSource.Current.ServiceMessage(this,
                "Loading of data finished in {0} ms", timer.ElapsedMilliseconds);
        }
        await tx.CommitAsync();
    }
    
  • I have a stateless WebApi service that is communicating with the above stateful one via service remoting and querying the dictionary via this code:

    ServiceUriBuilder builder = new ServiceUriBuilder(DataServiceName);
    IDataService DataServiceClient = ServiceProxy.Create<IDataService>(builder.ToUri(),
        new Microsoft.ServiceFabric.Services.Client.ServicePartitionKey("My.single.named.partition"));
    try
    {
        var data = await DataServiceClient.QueryData(SomeQuery);
        return Ok(data);
    }
    catch (Exception ex)
    {
        ServiceEventSource.Current.Message("Web Service: Exception: {0}", ex);
        throw;
    }
    
  • It works really well when the XMLs do not exceed 50 MB.

  • After that I get errors like:

System.Fabric.FabricReplicationOperationTooLargeException: The replication operation is larger than the configured limit - MaxReplicationMessageSize ---> System.Runtime.InteropServices.COMException

Questions:

  • I am almost certain that this is about the partitioning strategy and that I need to use more partitions. But how do I reference a particular partition from within the RunAsync method of the stateful service? (The stateful service is invoked via RPC from the WebApi, where I explicitly specify a partition, so there I can easily choose among partitions if I use the ranged partitioning strategy. But how do I do that during the initial loading of data in the RunAsync method?)
  • Are these thoughts of mine correct: the code in a stateful service operates on a single partition, so loading a huge amount of data and partitioning that data should happen outside the stateful service (for example in an Actor). Then, after determining the partition key, I just invoke the stateful service via RPC and point it to that particular partition.

  • Is it actually a partitioning problem at all, and what (where, who) defines the size of a replication message? That is, does the partitioning strategy influence the replication message size?

  • Would extracting the loading logic into a stateful Actor help in any way?

For any help on this - thanks a lot!

1 Answer

4
votes

The issue is that you're trying to add a large amount of data into a single dictionary record. When Service Fabric tries to replicate that data to other replicas of the service, it encounters a quota of the replicator, MaxReplicationMessageSize, which indeed defaults to 50MB (documented here).

You can increase the quota by specifying a ReliableStateManagerConfiguration:

internal sealed class Stateful1 : StatefulService
{
    public Stateful1(StatefulServiceContext context)
        : base(context, new ReliableStateManager(context,
            new ReliableStateManagerConfiguration(new ReliableStateManagerReplicatorSettings
            {
                MaxReplicationMessageSize = 1024 * 1024 * 200
            }))) { }
}

But I strongly suggest you change the way you store your data. The current method won't scale very well and isn't the way Reliable Collections were meant to be used.

Instead, you should store each HospitalData in a separate dictionary item. Then you can query the items in the dictionary (see this answer for details on how to use LINQ). You will not need to change the above quota.
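As a rough illustration of the per-item layout, here is a minimal sketch. It makes several assumptions that are not in the question: `HospitalData` is assumed to have a unique string `Id` property, the class name `XmlData`, the dictionary name `"DATA_ITEMS"`, and the batch size of 500 are all made up for the example, and `DataConverter`, `XmlFolder`, and `HospitalData` are taken to be the question's own types. A new dictionary name is used because `"DATA"` already holds a different value type.

```csharp
using System;
using System.Collections.Generic;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Runtime;

// HospitalData, DataConverter and XmlFolder come from the question's project.
internal sealed class XmlData : StatefulService
{
    private const string XmlFolder = "xml";   // placeholder value
    private const int BatchSize = 500;         // arbitrary; tune for your record size

    public XmlData(StatefulServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        var items = await this.StateManager
            .GetOrAddAsync<IReliableDictionary<string, HospitalData>>("DATA_ITEMS");

        List<HospitalData> data = new DataConverter(XmlFolder).LoadData();

        // One dictionary entry per record, committed in modest batches
        // so that each transaction stays small.
        for (int offset = 0; offset < data.Count; offset += BatchSize)
        {
            cancellationToken.ThrowIfCancellationRequested();
            using (var tx = this.StateManager.CreateTransaction())
            {
                int end = Math.Min(offset + BatchSize, data.Count);
                for (int i = offset; i < end; i++)
                {
                    // Id is a hypothetical unique key on HospitalData.
                    await items.SetAsync(tx, data[i].Id, data[i]);
                }
                await tx.CommitAsync();
            }
        }
    }

    // Scans all entries and returns those matching the predicate.
    public async Task<List<HospitalData>> QueryAsync(
        Func<HospitalData, bool> predicate, CancellationToken cancellationToken)
    {
        var items = await this.StateManager
            .GetOrAddAsync<IReliableDictionary<string, HospitalData>>("DATA_ITEMS");
        var matches = new List<HospitalData>();

        using (var tx = this.StateManager.CreateTransaction())
        {
            var enumerable = await items.CreateEnumerableAsync(tx);
            using (var enumerator = enumerable.GetAsyncEnumerator())
            {
                while (await enumerator.MoveNextAsync(cancellationToken))
                {
                    if (predicate(enumerator.Current.Value))
                    {
                        matches.Add(enumerator.Current.Value);
                    }
                }
            }
        }
        return matches;
    }
}
```

The query here is a full scan; if most lookups are by a known field, using that field as the dictionary key lets you call `TryGetValueAsync` directly instead.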

PS - You don't necessarily have to use partitioning for 500MB of data. But regarding your question - you could use partitions even if you can't derive the key from the query, simply by querying all partitions and then combining the data.
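For the fan-out case, a sketch along these lines might work. The shape of `IDataService` is an assumption (the question does not show its signature, so `QueryData` is taken to accept a string and return a list), and `PartitionFanOut` is a made-up helper name. It also assumes the partition list fits in one page of results from `GetPartitionListAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.Fabric;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;
using Microsoft.ServiceFabric.Services.Remoting;
using Microsoft.ServiceFabric.Services.Remoting.Client;

// Assumed contract; the real IDataService from the question may differ.
public interface IDataService : IService
{
    Task<List<HospitalData>> QueryData(string query);
}

public static class PartitionFanOut
{
    // Sends the same query to every partition and merges the results.
    public static async Task<List<HospitalData>> QueryAllPartitionsAsync(
        Uri serviceUri, string query)
    {
        using (var fabricClient = new FabricClient())
        {
            var partitions =
                await fabricClient.QueryManager.GetPartitionListAsync(serviceUri);

            var tasks = partitions.Select(partition =>
            {
                var key = ToPartitionKey(partition.PartitionInformation);
                var proxy = ServiceProxy.Create<IDataService>(serviceUri, key);
                return proxy.QueryData(query);
            });

            List<HospitalData>[] results = await Task.WhenAll(tasks);
            return results.SelectMany(r => r).ToList();
        }
    }

    // Builds a partition key that resolves to the given partition.
    private static ServicePartitionKey ToPartitionKey(ServicePartitionInformation info)
    {
        switch (info)
        {
            case NamedPartitionInformation named:
                return new ServicePartitionKey(named.Name);
            case Int64RangePartitionInformation ranged:
                return new ServicePartitionKey(ranged.LowKey);
            default:
                return ServicePartitionKey.Singleton;
        }
    }
}
```

From the WebApi controller this could be called as `await PartitionFanOut.QueryAllPartitionsAsync(builder.ToUri(), SomeQuery)`. Every query then costs one remoting call per partition, which is the trade-off for not being able to derive the key.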