Suppose I have a file that represents a car dealer's current (huge) inventory of cars, each unique by some criterion, and a new file is generated daily reflecting the inventory, price etc. of each car.
Parsing the file results in a list of Car objects. The uniqueness of each car is represented in the Car object by some value that would be the basis of a unique key in a regular RDBMS setup.
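To make that concrete, here is a minimal sketch of how I picture the Car class; the property names (Vin, Price) and the mapping of the unique value onto the Cosmos DB document id are just assumptions for illustration:

using Newtonsoft.Json;

public class Car
{
    // Hypothetical unique value (e.g. the VIN); in an RDBMS this would be the unique key.
    // Mapping it to the Cosmos DB "id" property is an assumption about how the key is modelled.
    [JsonProperty("id")]
    public string Vin { get; set; }

    // Other properties parsed from the daily file, e.g. the price.
    public decimal Price { get; set; }
}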
I want the data in CosmosDB to be a queryable version of the data in the file. It should only hold data from the latest parsed file, and not data from previous files.
I have an Azure Function that parses the file when it is uploaded to blob storage and inserts the parsed data into CosmosDB. However, plain insertion will just grow the database over time with stale data.
How can I UPSERT the parsed Car objects instead of always inserting them? Can the upsert be declared as part of the Azure Function binding instead of using ICollector<Car>?
I guess CosmosDB could be used as an input to the Azure Function, comparing the Car objects there with the ones in the file and updating/inserting as necessary, but I would prefer it if CosmosDB or Azure Functions had a neat way of achieving this.
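For reference, this is roughly the manual variant I have in mind: a sketch that assumes the CosmosDB binding can hand me a DocumentClient and that calling UpsertDocumentAsync per car is acceptable (I have not verified this against the binding):

using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.WebJobs;

[FunctionName("ParseCarsFromFileManualUpsert")]
public static async Task RunManualUpsert(
    [BlobTrigger("data", Connection = "StorageConnection")] TextReader textReader,
    [CosmosDB(ConnectionStringSetting = "CosmosDb")] DocumentClient client)
{
    var collectionUri = UriFactory.CreateDocumentCollectionUri("data", "car");
    var cars = CarsFileParser.Parse(textReader);
    foreach (var car in cars)
    {
        // UpsertDocumentAsync creates the document or replaces the existing one with the same id.
        await client.UpsertDocumentAsync(collectionUri, car);
    }
}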
My current Azure Function:
[FunctionName("ParseCarsFromFile")]
public static void Run(
[BlobTrigger("data", Connection = "StorageConnection")] TextReader textReader,
[CosmosDB("data", "car", ConnectionStringSetting ="CosmosDb", CreateIfNotExists = true)] ICollector<Car> documentOutputBinding)
{
var cars = CarsFileParser.Parse(textReader);
foreach (var car in cars)
{
documentOutputBinding.Add(car);
}
}