1
votes

I have JSON data which is saved in BigQuery as a string.

{
   "event":{
      "action":"prohibitedSoftwareCheckResult",
      "clientTime":"2017-07-16T12:55:40.828Z",
      "clientTimeZone":"3",
      "serverTime":"2017-07-16T12:55:39.000Z",
      "processList":{
         "1":"outlook.exe",
         "2":"notepad.exe"
      }
   },
   "user":{
      "id":123456,

   }
}

I want to have a result set where each process will be in a different row. Something like:

UserID        ProcessName
-------------------------
123456        outlook.exe
123456        notepad.exe

I saw there is an option to query repeated data but the field needs to be RECORD type to my understanding.

Is it possible to convert to RECORD type "on the fly" in a subquery? (I can't change the source field to RECORD).

Or, is there a different way to return the desired result set?

1
"processList" here is not a repeated field but rather a record. For it to be a repeated field it should be an array like "processList: ["outlook.exe", "notepad.exe"]. As it is, I suppose getting your results would require knowing first all keys inside processList (such as "1", "2", and so on) - Willian Fuks
This is actually dynamic, but sequential (1,2,3..N). I can have different amount of processes for different jsons. Maybe I should somehow pivot (flatten) it and then un-pivot it? - NirKa

1 Answers

9
votes

This could be a possible work around for you:

SELECT
  user_id,
  processListValues
FROM(
  SELECT 
    JSON_EXTRACT_SCALAR(json_data, '$.user.id') user_id,
    REGEXP_EXTRACT_ALL(JSON_EXTRACT(json_data, '$.event.processList'), r':"([a-zA-Z0-9\.]+)"') processListValues
  FROM data
),
UNNEST(processListValues) processListValues

Using your JSON as example:

WITH data AS(
SELECT """{
   "event":{
      "action":"prohibitedSoftwareCheckResult",
      "clientTime":"2017-07-16T12:55:40.828Z",
      "clientTimeZone":"3",
      "serverTime":"2017-07-16T12:55:39.000Z",
      "processList":{
         "1":"outlook.exe",
         "2":"notepad.exe",
         "3":"outlo3245345okexe"
      }
   },
   "user":{
      "id":123456,

   }
}""" as json_data
)

SELECT
  user_id,
  processListValues
FROM(
  SELECT 
    JSON_EXTRACT_SCALAR(json_data, '$.user.id') user_id,
    REGEXP_EXTRACT_ALL(JSON_EXTRACT(json_data, '$.event.processList'), r':"([a-zA-Z0-9\.]+)"') processListValues
  FROM data
),
UNNEST(processListValues) processListValues

Results:

Row user_id processListValues    
1   123456  outlook.exe  
2   123456  notepad.exe  
3   123456  outlo3245345okexe