5
votes

I want to store an array of string values in a single column in BigQuery. In order to do that I need to first create a table with the proper schema. The schema identifier for the array is ARRAY<T> where T is the data type.

For example: T replaced by INT64 as ARRAY<INT64> enables the storage of 64-bit integer array in that column. How should I get this same effect but for storing string values? I have already tried STRING, VARCHAR and CHAR.

Just mention: I am using latest google-cloud python package

Documentation Reference for BigQuery Schema

2

2 Answers

16
votes

In order to add a column with array data, you need to define that column's mode as REPEATED. Hence a sample schema ends up being:

{
  'name': 'array_of_strings',
  'type': 'STRING',
  'mode': 'REPEATED'
},{
  'name': 'array_of_floats',
  'type': 'FLOAT',
  'mode': 'REPEATED'
}

This makes the field hold array values.

NOTE: You should be aware that if this is the schema of the table then you can not use CSV import functionality of big query since this is a limitation of CSV file format. You will need to use either json or avro formats.

Reference to the GitHub issue

4
votes

In order to do that I need to first create a table with the proper schema.

just run below in Web UI with new destination table - to create needed schema

#standardSQL
SELECT ARRAY<STRING>[] AS array_of_strings