
I am looking to create a URL shortening service for a project. It will be a very basic service that only needs to store the following:

  1. A random 6-character ([a-zA-Z0-9]) short string (unique). The id/key. (A generation sketch follows this list.)
  2. The long URL. The value.
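
For illustration, here is a minimal sketch of how I generate the short string, in Python (my actual stack may differ; the helper name is mine):

```python
import secrets
import string

# 62-character alphabet: a-z, A-Z, 0-9
ALPHABET = string.ascii_letters + string.digits

def new_short_key(length: int = 6) -> str:
    # Randomly generated, so uniqueness still has to be enforced against storage.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```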

I have chosen to use Azure Table Storage for this service since it works with our stack.

When a request comes in with the short string (key), I simply need to look up the entity and return the long URL (value).

This is just basic key/value storage. Since ATS requires both a partition key and a row key to define an entity, I am trying to work out the best strategy for choosing them.

So far I have come up with the following options:

  1. Use the short string as the partition key. Empty (no) row key.
  2. Use the first character of the short string as the partition key and the remaining 5 characters as the row key. This caps the table at 62 partitions and should, in theory, distribute the entities evenly across them. (Both layouts are sketched below.)
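
To make the two options concrete, here is how I picture the entities (Python dicts in the shape the Azure Tables SDK accepts; `LongUrl` is just my property name):

```python
short = "aB3xY9"  # example short string
long_url = "https://example.com/some/long/path"

# Option 1: the whole short string is the PartitionKey; the RowKey stays empty.
option1 = {"PartitionKey": short, "RowKey": "", "LongUrl": long_url}

# Option 2: first character as PartitionKey, remaining five characters as RowKey.
option2 = {"PartitionKey": short[0], "RowKey": short[1:], "LongUrl": long_url}
```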

  • Are either of these two approaches a good idea?
  • What are the pros and cons of each?
  • Is there a better approach for simple key/value pair storage?

2 Answers


I'd go with option 1.

Use the shortened URL as the partition key. This lets the machines behind the service decide how best to partition the keys you provide. You don't really need a row key, so it can stay empty or, if you don't mind overloading the row key, you can place the long URL there.
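
A minimal sketch of that layout, using the Python `azure-data-tables` SDK for illustration (the connection string, table name, and `LongUrl` property are placeholders):

```python
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<your-connection-string>", table_name="shorturls"
)

def store(short: str, long_url: str) -> None:
    # The short string is the PartitionKey; the RowKey stays empty.
    table.create_entity({"PartitionKey": short, "RowKey": "", "LongUrl": long_url})

def resolve(short: str) -> str:
    # Point lookup: both keys are supplied, so this reads a single entity.
    entity = table.get_entity(partition_key=short, row_key="")
    return entity["LongUrl"]
```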

All rows with the same partition key have to be served by the same partition server, so the table service's options for load balancing shrink when you provide fewer distinct partition keys. If you use a common partition key shared by many rows (for example, the first letter of the short URLs), a busy partition becomes a bottleneck the service cannot split any further, and queries that don't pin down an exact row key have to be filtered within that partition to find the target row.

There are advantages to using common partition keys when you're doing transactions (entity group transactions only span a single partition) or when writing data fast is more important than reading it fast.
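
For example, a batch like the following only works because every operation shares one partition key (again a Python-SDK sketch; the entities are hypothetical):

```python
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<your-connection-string>", table_name="shorturls"
)

# Entity group transaction: all operations must target the same PartitionKey.
table.submit_transaction([
    ("create", {"PartitionKey": "a", "RowKey": "B3xY9", "LongUrl": "https://example.com/1"}),
    ("create", {"PartitionKey": "a", "RowKey": "C4zW8", "LongUrl": "https://example.com/2"}),
])
```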

If I were doing the same thing using Azure Tables, I'd use the same scheme you outlined in option 1.


Thinkable has a good summary. We also have a blog post about how to get the most out of Azure Tables: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx. See the “Scalability” section for a discussion directly relevant to your scenario.