Currently I use Parquet files to import data into BigQuery (GIS). One of the Parquet files holds a geometry column that represents geometry data as WKT strings (MultiPolygon), and I want to import that column as the GEOGRAPHY type.
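For context, the Parquet file is created along these lines (a minimal sketch; the sample WKT value and file name are illustrative, not my real data):

import pandas as pd

# The geometry is stored as a plain WKT string, so Parquet records it as STRING.
df = pd.DataFrame({
    'id': [1],
    'geom': ['MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)))'],
})
df.to_parquet('multipoly_sample.parquet')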
Q1. The documentation mentions that automatic conversion from WKT (string) to GEOGRAPHY is not supported, so how can I overcome this issue? I want to avoid using CSV files and providing schema definitions manually.
However, even if I create an empty table first and only want to append a new Parquet file, it also doesn't work:

Provided Schema does not match Table <org>-internal:test.<table>. Field geom has changed type from GEOGRAPHY to STRING.

(It's the same with Avro.)
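For reference, the failing append attempt looks roughly like this (same placeholders as in the snippet further down):

from google.cloud import bigquery

client = bigquery.Client(project='<project>')
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.PARQUET
job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND

# The target table already exists with geom declared as GEOGRAPHY.
load_job = client.load_table_from_uri(
    'gs://<bucket>/multipoly_sample.parquet',
    client.dataset('test').table('geometry'),
    job_config=job_config)
load_job.result()  # raises the schema mismatch error above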
Q2. Is there an option to somehow force-cast a Parquet column holding WKT strings to the GEOGRAPHY type? I don't want to load a temporary table first and then run a query that does all the casting to load the target table.
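To be explicit, the two-step workaround I'd rather avoid looks something like this (the staging table name is hypothetical):

from google.cloud import bigquery

client = bigquery.Client(project='<project>')

# Step 1 would load the Parquet file into test.geometry_staging with geom
# as STRING; step 2 then casts the WKT strings with ST_GEOGFROMTEXT:
query = '''
CREATE OR REPLACE TABLE test.geometry AS
SELECT id, ST_GEOGFROMTEXT(geom) AS geom
FROM test.geometry_staging
'''
client.query(query).result()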
Update:
When using the Python client, I'm able to specify the schema manually for the Parquet file that I want to import. After loading completes, all values in the GEOGRAPHY column have the value INVALID. The same code works fine with CSV.
from google.cloud import bigquery

client = bigquery.Client(project='<project>')
table_ref = client.dataset('test').table('geometry')

job_config = bigquery.LoadJobConfig()
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
job_config.schema = [
    bigquery.SchemaField('id', 'INTEGER'),
    bigquery.SchemaField('geom', 'GEOGRAPHY'),
]

# CSV works fine!
# uri = 'gs://<bucket>/multipoly_sample.csv'
# job_config.source_format = bigquery.SourceFormat.CSV

# With Parquet, values in geom column are all "INVALID"
uri = 'gs://<bucket>/multipoly_sample.parquet'
job_config.source_format = bigquery.SourceFormat.PARQUET

load_job = client.load_table_from_uri(
    uri,
    table_ref,
    job_config=job_config)
load_job.result()
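For completeness, this is how I inspected the loaded rows afterwards (a sketch; the query itself is trivial):

# After the Parquet load, every geom value comes back as "INVALID".
for row in client.query('SELECT id, geom FROM test.geometry LIMIT 5').result():
    print(row.id, row.geom)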