2
votes

I am trying to do a join in stream analytics with a reference data.

Below is the input from stream data.

[{
               "id":"111111101",
               "basetime":0,
               "xyz":
               [
                              {
                                             "xxx":1,
                                             "yyy":2631,
                                             "aaa":"470A01",
                                             "id":1
                              },
                              {
                                             "xxx":0,
                                             "yyy":0,
                                             "aaa":"000000",
                                             "id":61
                              }
               ]

},
{
               "id":"111111102",
               "basetime":0,
               "xyz":
               [
                              {
                                             "xxx":1,
                                             "yyy":2631,
                                             "aaa":"03F4EB",
                                             "id":1
                              }
               ]
},
{
               "id":"111111103",
               "basetime":0,
               "xyz":
               [
                              {
                                             "xxx":1,
                                             "yyy":2631,
                                             "aaa":"6706",
                                             "id":1
                              }
              ]
}
]

Below is the reference Master data.

[
{
    "aaa": "470A01"
  },
  {
    "aaa": "03F4EB"
  },
  {
    "aaa": "710211"
  }
]

The SAQL written is as shown below.

WITH INPUT1 AS (
    SELECT
    input.id.dateTime AS ID,
    flatArrayElement as ABC,
    FROM [signals2] as input
    CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement
    )

I have used CROSS APPLY to get each element in xyz as different rows.

The output of INPUT1 is as shown below.

+----------+------------------------------------------------------------------------+
|       ID |   ABC                                                                  |
+----------+------------------------------------------------------------------------+
| 111111101| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} |
| 111111101| {"ArrayValue":{"xxx":0,"yyy":0,"aaa":000000,"id":61},"ArrayIndex":1}   |
| 111111102| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":03F4EB,"id":1},"ArrayIndex":0} |
| 111111103| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":6706,"id":1},"ArrayIndex":0}   |
+-------------------+---------------------------------------------------------------+

Now I am trying to join the data xyz.aaa with a reference data as below, where master is the reference data.

SIGNALS AS (
 SELECT * FROM INPUT1 I JOIN master M ON I.ABC.ArrayValue.aaa = M.aaa

I am getting below output, but the problem is xyz with more than one element is duplicating in the output.

+-------------------------------+------------------------------------------------------------------------------------------+--------+
| i___timestamp                 |   i                                                                                      |  m     |     
+-------------------------------+------------------------------------------------------------------------------------------+--------+
| "2019-11-13T03:36:22.4636494Z"| "id": "111111101",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} | 470A01 |
| "2019-11-13T03:36:22.4636494Z"| "id": "111111101",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} | 470A01 |
| "2019-11-13T03:36:22.4636494Z"| "id": "111111102",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":03F4EB,"id":1},"ArrayIndex":0} | 03F4EB |
+-------------------------------+------------------------------------------------------------------------------------------+--------+

I am confused why the first two rows are duplicating,it should be only one entry for that. Out of two elements in xyz one is valid and one is invalid. But here the valid element is repeating two times. What might be the reason? How to resolve this ?

1
@kgalic- could you please check? - Antony
master M is reference data from other residence? - Jay Gong
yes M is master data stored in blob location - Antony

1 Answers

1
votes

Please see my query sql:

WITH INPUT1 AS (
    SELECT
        input.name as name,
        flatArrayElement as ABC
    FROM
        [YourInputAlias] as input
        CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement
)

SELECT INPUT1.ABC.ArrayValue.aaa FROM INPUT1
JOIN jayrefer on INPUT1.ABC.ArrayValue.aaa = jayrefer.item

jayrefer is your Master Reference:

enter image description here

input is your Input:

enter image description here

Output:

enter image description here


Just for summary,the issue is caused by the duplicate raws in the reference data so that the final results is duplicate.

The final sql is:

WITH INPUT AS (
    SELECT
    input1.id.dateTime AS ID,
    flatArrayElement as ABC
    FROM input1
    CROSS APPLY GetArrayElements(input1.xyz) AS flatArrayElement
)
SELECT * FROM INPUT I JOIN jayrefer M ON I.ABC.ArrayValue.aaa = M.aaa