2
votes

My use case involves creating an external table in BigQuery using PySpark code. The data source is a Google Cloud Storage bucket containing JSON data. I am reading the JSON data into a DataFrame and want to create an external BigQuery table. As of now, the table is getting created, but it is not an external one.

df_view.write\
    .format("com.google.cloud.spark.bigquery")\
    .option('table', 'xyz-abc-abc:xyz_zone.test_table_yyyy')\
    .option("temporaryGcsBucket","abcd-xml-abc-warehouse")\
    .save(mode='append',path='gs://xxxxxxxxx/')

P.S. - I am using the spark-bigquery connector to achieve my goal.

Please let me know if anyone has faced the same issue.


1 Answer

1
votes

At the moment, the spark-bigquery-connector does not support writing to an external table. Please create an issue and we will try to add it soon.

You can, of course, do it in two steps:

  • Write the JSON files to GCS.
  • Use the BigQuery API to create the external table, as sketched below.
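Here is a minimal sketch of that two-step approach, using the project, dataset, and table names from the question. The GCS path, file layout, and the use of schema auto-detection are assumptions; the second step uses the google-cloud-bigquery client library.

# Step 1: write the DataFrame out as newline-delimited JSON files on GCS
df_view.write \
    .mode("append") \
    .json("gs://abcd-xml-abc-warehouse/test_table_yyyy/")  # assumed target path

# Step 2: define an external BigQuery table over those files
from google.cloud import bigquery

client = bigquery.Client(project="xyz-abc-abc")

external_config = bigquery.ExternalConfig("NEWLINE_DELIMITED_JSON")
external_config.source_uris = ["gs://abcd-xml-abc-warehouse/test_table_yyyy/*.json"]
external_config.autodetect = True  # infer the schema from the JSON files

table = bigquery.Table("xyz-abc-abc.xyz_zone.test_table_yyyy")
table.external_data_configuration = external_config
client.create_table(table)  # the table is created as EXTERNAL, pointing at the GCS files

With this setup BigQuery reads the data directly from GCS at query time, so re-running step 1 to append new JSON files is enough to make the new data visible through the external table.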