1
votes

We want to set up data ingestion whenever a new blob is uploaded into our Storage Account (via the Event Grid > Event Hub route). The following page explains that metadata set on the blob can interfere with the ingestion: https://docs.microsoft.com/en-us/azure/data-explorer/ingest-data-event-grid-overview

Third parties will upload blobs with a SAS token to their own directory as part of external data ingestion, and each upload would trigger an ingestion.

What worries us is that anybody could tamper with the kustoTable, kustoCreationTime, or kustoExtentTags metadata properties when uploading their daily blob, causing all sorts of issues.

Can the honoring of these metadata properties on the blobs be deactivated or the problem somehow mitigated?

1
Let me get this straight - you are not concerned that a 3rd party will write a blob into your storage account and that information will be picked up by your EG subscription and pollute your ADX table, and all that concerns you is that this 3rd party will be able to augment this pollution? - Vladik Branevich
To clarify: The uploaders are partners providing us with updates. If they send us weird data, we show them the same weird data on their very own UI. The data will be mapped by update policies through Dimension Tables to actual data tables and views. What I am worried about in general is that they ingest data into one of the mapping tables, breaking the update policies and forcing us to restore them. Nevertheless: even if we block direct access to the blob storage, I don't want to give our own developers the "power" to reroute ingestion properties, from an architectural PoV. - der.Schtefan
OK, I understand. - Vladik Branevich

1 Answer

0
votes

The core requirement that led us to implement the handling of blob metadata properties was just the opposite - the desire to manage a single EG data connection that is able to "feed" multiple tables.

And hence the blob metadata has higher precedence.

In your case what you can do is create an Azure Function that will forcefully remove all undesired metadata properties from the blobs before renaming or copying them so that they are picked up by the EG.
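
A minimal sketch of that sanitizing step, under a few assumptions that are mine and not part of the product: every ingestion-related key is treated as starting with "kusto" (which covers kustoTable, kustoCreationTime, and kustoExtentTags), rawSizeBytes is blocked as well, and the two client arguments behave like azure.storage.blob.BlobClient objects (only get_blob_properties, url, and start_copy_from_url are used). The "sanitized" marker key is hypothetical; it is there because a blob copy made without any metadata pairs reuses the source blob's metadata.

```python
from typing import Dict, Mapping

# Assumption: ingestion-related metadata keys start with "kusto";
# rawSizeBytes is blocked too. Adjust both to your own policy.
BLOCKED_PREFIX = "kusto"
BLOCKED_KEYS = {"rawsizebytes"}


def strip_ingestion_metadata(metadata: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the metadata without ingestion-related keys."""
    cleaned = {}
    for key, value in metadata.items():
        lowered = key.lower()
        if lowered.startswith(BLOCKED_PREFIX) or lowered in BLOCKED_KEYS:
            continue
        cleaned[key] = value
    return cleaned


def copy_sanitized(source_blob, dest_blob) -> Dict[str, str]:
    """Copy a blob into the monitored location with cleaned metadata.

    Both arguments are expected to behave like azure.storage.blob.BlobClient.
    Depending on how the source is secured, its URL may need a SAS token.
    """
    original = source_blob.get_blob_properties().metadata or {}
    cleaned = strip_ingestion_metadata(original)
    # A copy without metadata pairs would inherit the source metadata,
    # so always send at least one pair (hypothetical marker key).
    cleaned["sanitized"] = "true"
    dest_blob.start_copy_from_url(source_blob.url, metadata=cleaned)
    return cleaned
```

With this layout the partners would upload into a container that the EG subscription does not watch, and only the sanitized copies land where the data connection picks them up.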

Alternatively, if you go the function route anyway, you can make the function the EG notification sink. It can then simply use our ingest SDK to submit the blob through queued ingestion.
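
A rough sketch of the function-as-sink variant, assuming a recent azure-kusto-ingest Python package (QueuedIngestClient.ingest_from_blob, BlobDescriptor, IngestionProperties). The routing table, database and table names, and the URL layout (container, then partner directory, then file) are hypothetical. The point is that the target table is chosen by your own code from the upload path, so nothing the uploader sets on the blob decides where the data goes.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse

# Hypothetical routing table: partner directory -> (database, table).
ROUTES: Dict[str, Tuple[str, str]] = {
    "partner-a": ("PartnerDb", "PartnerA_Raw"),
    "partner-b": ("PartnerDb", "PartnerB_Raw"),
}


def resolve_target(blob_url: str) -> Tuple[str, str]:
    """Pick database and table from the partner directory in the blob URL."""
    # Assumed path layout: /<container>/<partner-directory>/<file...>
    parts = urlparse(blob_url).path.lstrip("/").split("/")
    if len(parts) < 3 or parts[1] not in ROUTES:
        raise ValueError(f"no ingestion route for {blob_url}")
    return ROUTES[parts[1]]


def submit_blob(ingest_client, blob_url: str):
    """Queue one blob for ingestion into the table chosen by resolve_target.

    ingest_client is assumed to be an azure.kusto.ingest.QueuedIngestClient.
    The blob URL must be readable by the service (for example via a SAS).
    Data format and mapping are left at their defaults in this sketch.
    """
    # Imported lazily so the routing logic can be tested without the SDK.
    from azure.kusto.ingest import BlobDescriptor, IngestionProperties

    database, table = resolve_target(blob_url)
    props = IngestionProperties(database=database, table=table)
    return ingest_client.ingest_from_blob(
        BlobDescriptor(blob_url), ingestion_properties=props
    )
```

In the function, blob_url would come from the BlobCreated event payload, and an unknown directory fails loudly instead of being ingested somewhere unexpected.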