I am playing with Azure Table Storage to get my head around it as a platform for a Big Data Project. As test data I am using the UK Companies House data feed to provide me with 3.5 million UK company records. I have successfully created an Entity and loaded a ton of the entries into my table "Companies" but think that this model won't work for me due to my "search" or query requirements being on more than just the PartitionKey and RowKey.
My Company class looks like this:
public class Company : TableEntity
{
public Company() { }
public Company(string companyNumnber)
{
CompanyNumber = companyNumnber;
this.PartitionKey = "Company";
this.RowKey = companyNumnber;
RegAddress = new Address();
}
public String CompanyNumber { get; set; }
public Address RegAddress { get; set; }
public String CompanyName { get; set; }
public String PhoneNumber { get; set; }
public String CompanyCategory { get; set; }
public String CompanyStatus { get; set; }
public String CountryOfOrigin { get; set; }
public String DissolutionDate { get; set; }
public String IncorporationDate { get; set; }
public String URI { get; set; }
}
public class Address
{
public Address() { }
public String CareOf { get; set; }
public String POBox { get; set; }
public String AddressLine1 { get; set; }
public String AddressLine2 { get; set; }
public String PostTown { get; set; }
public String County { get; set; }
public String Country { get; set; }
public String PostCode { get; set; }
}
This is all fine and I can load the entities into Table Storage and I can query the table based on CompanyNumber, but I am going to need to also query the table on CompanyName and that is NOT going to be limited to a eq "equals" operator... I NEED to be able to do a "contains" on that property and it looks like that is not possible.
Am I right on that? If that is the case what is the work around? It seems to me that I am "stuck" with hosting these 3.5 million records in SQL and then use Table Storage to store "related" content with PartitionKey company number and then do a lookup that way.
I was hoping it would keep costs (of SQL Azure) down, but it seems like I might need a hybrid approach.
I would love to hear others comments on this.