I have data flowing from Event Hub to Azure Data Explorer. We configured a main table that holds all the data. Upon ingestion into that main table, we create a few sub-tables, each holding a subset of the main table's data. For example:
From Event Hub we receive the following JSON data:
```json
{
  "TIME": "T",
  "data": [
    { "ABC": 1234, "DEF": 1235, "GHI": 1236 },
    { "ABC": 1223, "DEF": 1236, "GHI": 12 },
    { "ABC": 2213, "DEF": 1235, "GHI": 1236 }
  ]
}
```
Upon ingesting this data into the main Raw table, we have a few ingestion-time (update) policies that expand this JSON and ingest rows into sub-tables in the format below.
| TIMEINUTC | ABC |
|---|---|
| T | 1234 |
| T | 1223 |
| T | 2213 |
Similarly, the other sub-tables:

| TIMEINUTC | DEF |
|---|---|
| T | 1235 |
| T | 1236 |
| T | 1235 |

| TIMEINUTC | GHI |
|---|---|
| T | 1236 |
| T | 12 |
| T | 1236 |
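For reference, the expansion described above can be sketched as an update policy on the Raw table. This is a minimal sketch for the ABC sub-table only; the table names, the function name `ExpandABC`, and the assumption that Raw has a single `dynamic` column named `Payload` are all illustrative:

```kusto
// Function that expands each raw JSON record into rows for the ABC sub-table.
// Assumes Raw has one dynamic column, Payload, holding the JSON shown above.
.create-or-alter function ExpandABC() {
    Raw
    | mv-expand rec = Payload.data
    | project TIMEINUTC = todatetime(Payload.TIME), ABC = tolong(rec.ABC)
}

// Attach the function as an update policy on the ABC table, so it runs
// automatically on every ingestion into Raw.
.alter table ABC policy update
@'[{"IsEnabled": true, "Source": "Raw", "Query": "ExpandABC()", "IsTransactional": false}]'
```

Analogous functions and policies would cover the DEF and GHI sub-tables.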
Now I can run my analytical jobs / Kusto queries against these small tables.
The problem is that I want to perform a conditional insertion into another table. For example, say the value ABC = 1234 arrives and is inserted into the ABC sub-table. If I receive ABC = 1234 again some time later, I don't want to insert it into that sub-table again.
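One common way to express this kind of conditional insert inside an update-policy function is a left anti-join against the target table, so only values not already present are emitted. A sketch under the same illustrative names as before; note it only deduplicates against data already ingested into ABC, not against duplicate values arriving within the same batch:

```kusto
// Emit only ABC values that do not already exist in the ABC sub-table.
.create-or-alter function ExpandABCDistinct() {
    Raw
    | mv-expand rec = Payload.data
    | project TIMEINUTC = todatetime(Payload.TIME), ABC = tolong(rec.ABC)
    | join kind=leftanti (ABC | distinct ABC) on ABC
}
```

Whether this scan of the target table is acceptable at your ingestion rate is exactly the performance question raised below.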
- Is it a good approach to write such logic in the ingestion-time (update) policy?
- If such an update policy takes some time to execute, will it affect the ingestion of raw data from Event Hub into the main Raw table?
- Or should I instead go with a scheduler approach, running at periodic intervals from ADF / Logic Apps?