Google BigQuery - Parsing string data from a Bigquery table column

Question

I have a table A within a dataset in Bigquery. This table has multiple columns and one of the columns called hits_eventInfo_eventLabel has values like below:

{ID:AEEMEO,Score:8.990000;ID:SEAMCV,Score:8.990000;ID:HBLION;Property ID:DNSEAWH,Score:0.391670;ID:CP1853;ID:HI2367;ID:H25600;}

If you write this string out in a tabular form, it contains the following data:

**ID      |   Score**
AEEMEO  |   8.990000
SEAMCV  |   8.990000
HBLION  |    -
DNSEAWH |   0.391670
CP1853  |    -
HI2367  |    -
H25600  |    -

Some IDs have scores, some don't. I have multiple records with similar strings populated under the column hits_eventInfo_eventLabel within the table.

My question is how can I parse this string successfully WITHIN BIGQUERY so that I can get a list of property ids and their respective recommendation scores (if existing)? I would like to have the order in which the IDs appear in the string to be preserved after parsing this data.

Would really appreciate any info on this. Thanks in advance!

Mosha Pasumansky Mosha Pasumansky · Accepted Answer · 2015-08-14T00:53:48

I would use combination of SPLIT to separate into different rows and REGEXP_EXTRACT to separate into different columns, i.e.

select 
  regexp_extract(x, r'ID:([^,]*)') as id,
  regexp_extract(x, r'Score:([\d\.]*)') score from (
select split(x, ';') x from (
select 'ID:AEEMEO,Score:8.990000;ID:SEAMCV,Score:8.990000;ID:HBLION;Property ID:DNSEAWH,Score:0.391670;ID:CP1853;ID:HI2367;ID:H25600;' as x))

It produces the following result:

Row id      score    
1   AEEMEO  8.990000     
2   SEAMCV  8.990000     
3   HBLION  null     
4   DNSEAWH 0.391670     
5   CP1853  null     
6   HI2367  null     
7   H25600  null

Google BigQuery - Parsing string data from a Bigquery table column

2 Answers