
I'm looking for an effective way to upload the following array to a BigQuery table in this format:

**BigQuery columns (example)**

event_type: video_screen
event_label: click_on_screen
is_ready:false
time:202011231958
long:1
high:43
lenght:0

**Array object**

[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"]]

I thought of several options, but none of them seems to be best practice.

Option A: Upload the file to Google Cloud Storage and then create a table over that bucket. This didn't work because of the file format: BigQuery can't parse the array from the bucket.

Option B: Use my backend (Node.js) to convert the file structure to CSV and upload it directly to BigQuery. This failed because of latency (the real array is much longer than my example).

Option C: Use Google Apps Script to get the array object and insert it into BigQuery. I didn't find simple code for this, and Google Cloud Storage has no API connected to Apps Script.
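For what it's worth, the CSV conversion described in Option B can be sketched in a few lines of Node.js. This is a minimal illustration; the `toCsv` helper and its quoting rules are my own, not code from the question:

```javascript
// Option B sketch: flatten the array-of-arrays into CSV text.
function toCsv(rows) {
  return rows
    .map(row =>
      row
        .map(v => `"${String(v).replace(/"/g, '""')}"`) // quote and escape each field
        .join(',')
    )
    .join('\n');
}

const events = [
  ['video_screen', 'click_on_screen', 'false', '202011231958', '1', '43', '0'],
  ['buy', 'error', '2', '202011231807', '1', '6', '0'],
];

console.log(toCsv(events));
```

The conversion itself is cheap; as the question notes, the latency problem comes from the volume of files, not from this step.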

Has anyone dealt with such a case and can share their solution? What is the best practice here? If you have code for this, that would be great.

Option D: write the file in CSV or another supported format directly. Then you can insert the rows into BQ easily. - tehhowch
What's the size of the longest line in your file? - Pentium10
100K lines per file, but I process multiple files every 5 minutes. - idan
Lines per file is not a problem. What's the longest single line in your file? - Pentium10
6 columns, max 10. - idan

1 Answer


Load the file from GCS into BigQuery as a table with a single STRING column. So you get 100K rows and one single column.
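One way to get that single-column table is a plain `bq load` with a field delimiter that never occurs in the data, so each whole line lands in one STRING column. This is a sketch: the dataset, table, and bucket names are placeholders, and you should pick a delimiter you know is absent from your JSON:

```
# Load each line as one string by choosing a delimiter the data never contains
# (tab here; adjust if your data can contain tabs).
bq load --source_format=CSV --field_delimiter='\t' \
  mydataset.raw_events \
  gs://my-bucket/events.json \
  raw:STRING
```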

Essentially you will have a table that has a JSON in a string.

Use JSON_EXTRACT_ARRAY to split the JSON string into array elements, then extract each position into its corresponding column and write the result to a table.

Here is a demo:

with t as (
  select '[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"]]' as s
),
elements as (
  select e from t, unnest(json_extract_array(t.s)) e
)
select
  json_extract_scalar(e, '$[0]') as event_type,
  json_extract_scalar(e, '$[1]') as event_label
from elements
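The same extraction can be mirrored in plain Node.js for anyone prototyping the parsing logic outside BigQuery. This is a sketch only, not the BigQuery client API; the field names follow the question's example:

```javascript
// Plain Node.js mirror of the SQL demo above: parse the JSON string,
// then pick each array position out as a named field.
const s = '[["video_screen","click_on_screen","false","202011231958","1","43","0"],'
        + '["buy","error","2","202011231807","1","6","0"],'
        + '["sign_in","enter","user_details","202011231220","2","4","0"]]';

const rows = JSON.parse(s).map(e => ({
  event_type: e[0],   // $[0] in the SQL demo
  event_label: e[1],  // $[1]
}));

console.log(rows);
```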

The output is:

| event_type   | event_label     |
|--------------|-----------------|
| video_screen | click_on_screen |
| buy          | error           |
| sign_in      | enter           |