3
votes

In Big Query, I was having issues moving a Pipe(|)-delimited file into Google Big Query. To make this work, I decided to bring in each line whole as a single string, and then perform the SPLIT function to get the fields I want in the data types I need it in.

In this case, I have this SELECT statement:

Select
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(0)] as INT64) AS FIELD1
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(1)] AS FIELD2
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(2)] AS FIELD3
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(3)] AS FIELD4
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(4)] AS FIELD5
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(5)] AS FIELD6
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(6)] as timestamp) AS FIELD7
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(7)] AS FIELD8
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(8)] AS FIELD9
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(9)] AS FIELD10
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(10)] AS FIELD11
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(11)] AS FIELD12
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(12)] AS FIELD13
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(13)] AS FIELD14
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(14)] AS FIELD15
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(15)] AS FIELD16
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(16)] AS FIELD17
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(17)] AS FIELD18
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(18)] AS FIELD19
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(19)] AS FIELD20
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(20)] AS FIELD21
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(21)] AS FIELD22
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(22)] AS FIELD23
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(23)] AS FIELD24
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(24)] AS FIELD25
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(25)] AS FIELD26
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(26)] AS FIELD27
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(27)] as INT64) AS FIELD28
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(28)] AS FIELD29
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(29)] AS FIELD30
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(30)] AS FIELD31
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(31)] as timestamp) AS FIELD32
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(32)] as INT64) AS FIELD33
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(33)] as INT64) AS FIELD34
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(34)] as timestamp) AS FIELD35
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(35)] as INT64) AS FIELD36
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(36)] as INT64) AS FIELD37
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(37)] as timestamp) AS FIELD38
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(38)] as numeric) AS FIELD39
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(39)] as INT64) AS FIELD40
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(40)] as INT64) AS FIELD41
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(41)] AS FIELD42
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(42)] AS FIELD43
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(43)] as INT64) AS FIELD44
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(44)] as numeric) AS FIELD45
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(45)] AS FIELD46
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(46)] as timestamp) AS FIELD47
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(47)] AS FIELD48
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(48)] as INT64) AS FIELD49
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(49)] as timestamp) AS FIELD50
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(50)] as INT64) AS FIELD51
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(51)] as INT64) AS FIELD52
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(52)] as INT64) AS FIELD53
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(53)] as INT64) AS FIELD54
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(54)] as timestamp) AS FIELD55
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(55)] as INT64) AS FIELD56
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(56)] as timestamp) AS FIELD57
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(57)] AS FIELD58
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(58)] AS FIELD59
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(59)] AS FIELD60
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(60)] AS FIELD61
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(61)] AS FIELD62
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(62)] AS FIELD63
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(63)] as FLOAT64) AS FIELD64
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(64)] as FLOAT64) AS FIELD65
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(65)] AS FIELD66
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(66)] as INT64) AS FIELD67
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(67)] as INT64) AS FIELD68
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(68)] as INT64) AS FIELD69
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(69)] AS FIELD70
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(70)] AS FIELD71
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(71)] AS FIELD72
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(72)] as timestamp) AS FIELD73
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(73)] as timestamp) AS FIELD74
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(74)] as timestamp) AS FIELD75
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(75)] as INT64) AS FIELD76
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(76)] as INT64) AS FIELD77
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(77)] as INT64) AS FIELD78
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(78)] as timestamp) AS FIELD79
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(79)] as timestamp) AS FIELD80
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(80)] as timestamp) AS FIELD81
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(81)] as timestamp) AS FIELD82
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(82)] AS FIELD83
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(83)] as FLOAT64) AS FIELD84
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(84)] as FLOAT64) AS FIELD85
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(85)] as FLOAT64) AS FIELD86
from `db.ds.tbl`

GBQ states this query is structurally sound. but shows this response:

Array index 74 is out of bounds (overflow) google big query

How do I respond to this? Thanks

just to be clear. the delimiter is |"||"|. for example:

6|"||"|Demo (60 Minutes) |"||"|demo60min|"||"|2017-09-13 16:30:00|"||"|2017-09-13 15:30:00

Thanks

1
There aren't that many elements in the array after splitting it. You need to fix your query or fix the data.Elliott Brossard

1 Answers

11
votes

Array index 74 is out of bounds (overflow) google big query
How do I respond to this?

Instead of using [OFFSET(NN)] in line like below

SPLIT(RECORDSTRING, '|"||"|')[OFFSET(NN)] 

you should use [SAFE_OFFSET(NN)] as in below example

SPLIT(RECORDSTRING, '|"||"|')[SAFE_OFFSET(NN)]    

Extra update

As you can see - your query is quite bulky and hard to read and maintain

Below is simplified example of how to transform your query into compact one

#standardSQL
SELECT 
  SAFE_CAST(item[SAFE_OFFSET(0)] AS INT64) AS FIELD1,
  item[SAFE_OFFSET(1)] AS FIELD2,
  item[SAFE_OFFSET(2)] AS FIELD3,
  item[SAFE_OFFSET(3)] AS FIELD4,
  item[SAFE_OFFSET(4)] AS FIELD5,
  item[SAFE_OFFSET(5)] AS FIELD6,
  SAFE_CAST(item[SAFE_OFFSET(6)] AS TIMESTAMP) AS FIELD7
FROM `project.dataset.table` t,
UNNEST([STRUCT<item ARRAY<STRING>>(SPLIT(RECORDSTRING, '|"||"|'))])

You can test it with sample data from your question as in example below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '''6|"||"|Demo (60 Minutes) |"||"|demo60min|"||"|2017-09-13 16:30:00|"||"|2017-09-13 15:30:00''' AS RECORDSTRING
)
SELECT 
  SAFE_CAST(item[SAFE_OFFSET(0)] AS INT64) AS FIELD1,
  item[SAFE_OFFSET(1)] AS FIELD2,
  item[SAFE_OFFSET(2)] AS FIELD3,
  item[SAFE_OFFSET(3)] AS FIELD4,
  item[SAFE_OFFSET(4)] AS FIELD5,
  item[SAFE_OFFSET(5)] AS FIELD6,
  SAFE_CAST(item[SAFE_OFFSET(6)] AS TIMESTAMP) AS FIELD7
FROM `project.dataset.table` t,
UNNEST([STRUCT<item ARRAY<STRING>>(SPLIT(RECORDSTRING, '|"||"|'))])

with result

Row FIELD1  FIELD2              FIELD3      FIELD4              FIELD5                 FIELD6      FIELD7    
1   6       Demo (60 Minutes)   demo60min   2017-09-13 16:30:00 2017-09-13 15:30:00    null        null