2 votes

I am using Azure Data Factory's Copy activity to copy data from a CSV file in Blob storage to Cosmos DB (SQL API). In the sink, if I do not import any schema, the Copy activity reads the headers from the CSV on execution and saves the data in JSON form in Cosmos DB. Up to this point it works fine.

I need to add a batch ID column (a GUID or the pipeline run ID) to the data being written to Cosmos DB, so that I can track which documents were copied in which batch.

How can I keep all my source columns, add my batch ID column to them, and save the result in Cosmos DB?

The schema is not fixed and can change on each ADF pipeline trigger, so I cannot import a schema and do one-to-one column mapping in the Copy activity.

Hi, any progress now? – Jay Gong

Updating documents in Cosmos DB looks like a somewhat costly process. Also, since I would be updating each document after it is saved in the DB, I wouldn't know which batch inserted which particular document. I'm thinking of putting a custom batch activity or an Azure Function web activity in my pipeline to add the batch column to the CSV itself, so that ingestion carries the batch ID into Cosmos DB as well. – Gvisgr8
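
For reference, a minimal sketch of that pre-ingestion idea, assuming the function receives the raw CSV text and the batch ID as plain strings; the class name CsvBatchStamper, the method name AppendBatchColumn, and both parameter names are illustrative, and the sketch assumes a simple CSV with no quoted fields containing commas or newlines:

using System;
using System.Linq;

public static class CsvBatchStamper
{
    // Appends a batchId column: the header row gets the column name,
    // every data row gets the batch id value.
    public static string AppendBatchColumn(string csvText, string batchId)
    {
        var lines = csvText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        var stamped = lines.Select((line, i) =>
            i == 0 ? line + ",batchId" : line + "," + batchId);

        return string.Join(Environment.NewLine, stamped);
    }
}

Because the new column is already in the CSV before the Copy activity runs, the schema-free copy would carry it into Cosmos DB like any other source column.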

1 Answer

0 votes

To my knowledge, you can't add a custom column when you transfer data from CSV to Cosmos DB. As a workaround, I suggest using an Azure Functions Cosmos DB trigger to add the batchId when the document is created in the database.

#r "Microsoft.Azure.Documents.Client"
#r "Newtonsoft.Json"
#r "Microsoft.Azure.DocumentDB.Core"
using System;
using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Microsoft.Azure.Documents.Client;a

public static void Run(IReadOnlyList<Document> documents, TraceWriter log)
{
    if (documents != null && documents.Count > 0)
    {
        private static readonly string endpointUrl = "https://***.documents.azure.com:443/";
        private static readonly string authorizationKey = "***";
        private static readonly string databaseId = "db";
        private static readonly string collectionId = "coll";

        private static DocumentClient client;

        documents[0].SetPropertyValue("batchId","123");

        var document = client.ReplaceDocumentAsync(UriFactory.CreateDocumentUri(databaseId, collectionId, documents[0].id), documents[0]).Result.Resource;
        log.Verbose("document Id " + documents[0].Id);
    }
}

However, with this approach you have to specify the batchId yourself, so it won't match the pipeline run ID from Azure Data Factory.
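
If you do want the function to pick up a value supplied by the pipeline rather than a hardcoded one, one possible (unverified) variation is to read it from a Function App setting that some earlier pipeline step writes, e.g. a Web activity storing @pipeline().RunId there before the copy runs; the setting name CurrentBatchId below is an assumption, not something ADF provides out of the box:

// Hypothetical: read the batch id from an app setting instead of
// hardcoding "123". An upstream pipeline step would have to store
// the run id under this assumed setting name before the copy runs.
string batchId = Environment.GetEnvironmentVariable("CurrentBatchId") ?? "unknown";
doc.SetPropertyValue("batchId", batchId);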

Hope it helps you.