5
votes

I have a rather huge (30 mln rows, up to 5–100Kb each) Table on Azure.
Each RowKey is a Guid and PartitionKey is a first Guid part, for example:

PartitionKey = "1bbe3d4b"
RowKey = "1bbe3d4b-2230-4b4f-8f5f-fe5fe1d4d006"

Table has 600 reads and 600 writes (updates) per second with an average latency of 60ms. All queries use both PartitionKey and RowKey.
BUT, some reads take up to 3000ms (!). In average, >1% of all reads take more than 500ms and there's no correlation with entity size (100Kb row may be returned in 25ms and 10Kb one – in 1500ms).

My application is an ASP.Net MVC 4 web-site running on 4-5 Large instances.

I have read all MSDN articles regarding Azure Table Storage performance goals and already did the following:

  • UseNagle is turned Off
  • Expect100Continue is also disabled
  • MaxConnections for table client is set to 250 (setting 1000–5000 doesn't make any sense)

Also I checked that:

  • Storage account monitoring counters have no throttling errors
  • There are some kind of "waves" in performance, though they does not depend on load

What could be the reason of such performance issues and how to improve it?

2
Is your storage account located in the same region as your website? - Zain Rizvi
For a given PartitionKey, approximately how many rows do you have? For the 600 reads and 600 writes, are these occurring within the same PartitionKey or is it spread across multiple partitions? - Rick Rainey
@zain-rizvi, yes, of cause, between regions I won't be able to get 60ms in average. - Alexander Shvetsov
@rick-rainey, reads/writes are randomly spread between partitions. Each partition has 1-5 rows, and not more than 1-5 reads per second. - Alexander Shvetsov

2 Answers

1
votes

I use the MergeOption.NoTracking setting on the DataServiceContext.MergeOption property for extra performance if I have no intention of updating the entity anytime soon. Here is an example:

var account = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
var tableStorageServiceContext = new AzureTableStorageServiceContext(account.TableEndpoint.ToString(), account.Credentials);
tableStorageServiceContext.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(1));
tableStorageServiceContext.MergeOption = MergeOption.NoTracking;
tableStorageServiceContext.AddObject(AzureTableStorageServiceContext.CloudLogEntityName, newItem);
tableStorageServiceContext.SaveChangesWithRetries();

Another problem might be that you are retrieving the entire enity with all its properties even though you intend only use one or two properties - this is of course wasteful but can't be easily avoided. However, If you use Slazure then you can use query projections to only retrieve the entity properties that you are interested in from the table storage and nothing more, which would give you better query performance. Here is an example:

using SysSurge.Slazure;
using SysSurge.Slazure.Linq;
using SysSurge.Slazure.Linq.QueryParser;

namespace TableOperations
{
    public class MemberInfo
    {
        public string GetRichMembers()
        {
            // Get a reference to the table storage
            dynamic storage = new QueryableStorage<DynEntity>("UseDevelopmentStorage=true");

            // Build table query and make sure it only return members that earn more than $60k/yr
            // by using a "Where" query filter, and make sure that only the "Name" and 
            // "Salary" entity properties are retrieved from the table storage to make the
            // query quicker.   
            QueryableTable<DynEntity> membersTable = storage.WebsiteMembers;
            var memberQuery = membersTable.Where("Salary > 60000").Select("new(Name, Salary)");

            var result = "";

            // Cast the query result to a dynamic so that we can get access its dynamic properties
            foreach (dynamic member in memberQuery)
            {
                // Show some information about the member
                result += "LINQ query result: Name=" + member.Name + ", Salary=" + member.Salary + "<br>";
            }

            return result;
        }
    }
}

Full disclosure: I coded Slazure.

You could also consider pagination if you are retrieving large data sets, example:

// Retrieve 50 members but also skip the first 50 members
var memberQuery = membersTable.Where("Salary > 60000").Take(50).Skip(50);
0
votes

Typically, if a specific query requires scanning a large number of rows, that will take longer time. Is the behavior you are seeing specific a query / data? Or, are you seeing the performance varies for the same data and query?