5
votes

schema in bigquery field: items type: string

Value in table in items field is stored as string {"data": [{"id": "1234", "plan": {"sub_id": "567", "metadata": {"currentlySelling": "true", "custom_attributes": "{\"shipping\": true,\"productLimit\":10}", "Features": "[\"10 products\", \"Online support\"]"}, "name": "Personal", "object": "plan"}, "quantity": 1}], "has_more": false}

Two Questions 1) How can i query within array eg: where shipping is true or where one of the features is "Online support" 2) The reason I had to store the data as string because "custom_attributes" value can change. Is there a better way to store data in bigquery when value of one of the nested key can change?

1
you should clarify the actual schema of your table! also - what have you tried so far? Please edit your question to show a Minimal, Complete, and Verifiable example of the code that you are having problems with, then we can try to help with the specific problem. You can also read How to Ask.Mikhail Berlyant

1 Answers

7
votes

Your query would be something like this:

#standardSQL
SELECT game
FROM YourTable
WHERE EXISTS (SELECT 1 FROM UNNEST(participant) WHERE name = 'sam');

This returns all of the games where 'sam' was a participant. Here is a self-contained example:

#standardSQL
WITH YourTable AS (
  SELECT 'A' AS game, ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('tony', 12), ('julia', 12)] AS participant UNION ALL
  SELECT 'B', ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('max', 12), ('jacob', 12)] UNION ALL
  SELECT 'C', ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('max', 12), ('julia', 12)]
)
SELECT game
FROM YourTable
WHERE EXISTS (SELECT 1 FROM UNNEST(participant) WHERE name = 'sam');

If you wanted to pivot the data to have a column for each participant, you could use a query like this:

#standardSQL
CREATE TEMP FUNCTION WasParticipant(
    p_name STRING, participant ARRAY<STRUCT<name STRING, age INT64>>) AS (
  EXISTS(SELECT 1 FROM UNNEST(participant) WHERE name = p_name)
);

WITH YourTable AS (
  SELECT 'A' AS game, ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('tony', 12), ('julia', 12)] AS participant UNION ALL
  SELECT 'B', ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('max', 12), ('jacob', 12)] UNION ALL
  SELECT 'C', ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('max', 12), ('julia', 12)]
)
SELECT
  ARRAY_AGG(IF(WasParticipant('sam', participant), game, NULL) IGNORE NULLS) AS sams_games,
  ARRAY_AGG(IF(WasParticipant('tony', participant), game, NULL) IGNORE NULLS) AS tonys_games,
  ARRAY_AGG(IF(WasParticipant('julia', participant), game, NULL) IGNORE NULLS) AS julias_games,
  ARRAY_AGG(IF(WasParticipant('max', participant), game, NULL) IGNORE NULLS) AS maxs_games
FROM YourTable;

This returns an array with games played for each participant.