2
votes

I have an Azure Stream Analytics job that uses an EventHub and a Reference data in Blob storage as 2 inputs. The reference data is CSV that looks something like this:

REGEX_PATTERN,FRIENDLY_NAME
115[1-2]{1}9,Name 1
115[3-9]{1}9,Name 2

I then need to lookup an attribute in the incoming event in EventHub against this CSV to get the FRIENDLY_NAME.

Typical way of of using reference data is using JOIN clause. But in this case I cannot use it because such regex matching is not supported with LIKE operator.

UDF is another option, but I cannot seem to find a way of using reference data as a CSV inside the function.

Is there any other way of doing this in an Azure Stream Analytics job?

1
What is the join key between event hub and reference data? REGEX_PATTERN?Jay Gong
@JayGong yes, it is the join key.user_name
Understand,but i think the key should be specific, not regex value in ASA job. So it is your pain point, right?Jay Gong
@JayGong Correct. Which is why I doubt a JOIN would work in my case. The alternative I could think was, was using an UDF, but was wondering if I can use the entire list of reference data in the UDF to perform the lookup.user_name
As a last fallback I am thinking of just hardcoding the reference data in the UDF. But I need this working for several different ASA jobs that use different reference data. So things will get ugly there.user_name

1 Answers

1
votes

As I know, the JOIN is not supported in your scenario. The join key should be specific, can't be a regex value.

Thus, reference data is not suitable here because it should be used in the ASA sql like below:

SELECT I1.EntryTime, I1.LicensePlate, I1.TollId, R.RegistrationId  
FROM Input1 I1 TIMESTAMP BY EntryTime  
JOIN Registration R  
ON I1.LicensePlate = R.LicensePlate  
WHERE R.Expired = '1'

The join key is needed. What I mean is that the reference data input is not needed even here.

Your idea is using UDF script and load the data in the UDF to compare with the hardcode regex data. This idea is not easy to maintain. Maybe you could consider my workaround:

1.You said you have different reference data,please group them and store as json array. Assign one group id to every group. For example:

Group Id 1:

[
    {
        "REGEX":"115[1-2]{1}9",
        "FRIENDLY_NAME":"Name 1"
    },
    {
        "REGEX":"115[3-9]{1}9",
        "FRIENDLY_NAME":"Name 2"
    }
]

....

2.Add one column to referring group id and set Azure Function as Output of your ASA SQL. Inside Azure Function, please accept the group id column and load the corresponding group of json array. Then loop the rows to match the regex and save the data into destination residence.

I think Azure Function is more flexible then UDF in ASA sql job. Additional,this solution is maybe easier to maintain.