2
votes

I have an unpartitionned table in BigQuery called "rawdata". That table is becoming quite big, and I would like to partition it. I did not find any way to partition the original table, but according to https://cloud.google.com/bigquery/docs/creating-partitioned-tables#restating_data_in_a_partition, I can run a command line that will push from unpartitionned "rawdata" into a partitionned table using a query, but only for a specific day/partition.

My instinct was to use the C# API (we already append data through that) to automate the process of doing the bq query --replace restating from the unpartitionned table, but there doesn't seem to be anything that can do that in the C# code. Do you guys have any recommendation on how to proceed forward? Should I wrap the bq command line execution instead of using Google API?

2

2 Answers

0
votes

I am not certain which portion of the API you are referring to, but it looks like you are referring to the Query API here: https://cloud.google.com/bigquery/docs/reference/v2/jobs/query#examples, which won't allow you to pass in a destination table and truncate/append to it.

The Insert API here: https://cloud.google.com/bigquery/docs/reference/v2/jobs/insert#examples can be used to do what you like by filling in the Configuration.Query part here: https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.query

Specifically, you'd want to specify the 'configuration.query.destinationTable' field to be the table partition you want to populate, and the 'configuration.query.createDisposition' and 'configuration.query.writeDisposition' fields that match your requirements.

This is effectively what the shell client does passing the '--replace' and '--destination_table' parameters.

Finally, you might want to check this thread for cost considerations when you created a partitioned table from a non-partitioned one: Migrating from non-partitioned to Partitioned tables

0
votes

Based on Ahmed's comments I put together some c# code using the c# api client that can restate data for a partition. Our use case was to remove personal information after it's served it's use. The below code will set the second field to a blank value. In the query I noticed you couldn't use the your_table$20180114 syntax in the query that you would use in CLI tool so this is changed to a where clause to retrieve the original data from the corresponding partition.

var queryRestateJob = _sut.CreateQueryJob(@"SELECT field1, '' as field2, field3  FROM your_dataset.your_table WHERE _PARTITIONTIME = TIMESTAMP('2018-01-14')",
            new List<BigQueryParameter>(), new QueryOptions
            {
                CreateDisposition = CreateDisposition.CreateIfNeeded,
                WriteDisposition = WriteDisposition.WriteTruncate,
                DestinationTable = new TableReference
                {
                    DatasetId = "your_dataset",
                    ProjectId = "your_project",
                    TableId = "your_table$20180114"
                },
                AllowLargeResults = true,
                FlattenResults = false,
                ProjectId = "your_project"
            })
            .PollUntilCompleted();

            queryRestateJob = queryRestateJob.ThrowOnAnyError();