I would like to split a table with 120 million rows into multiple equally-sized tables in Google BigQuery. I found this answer to a question that asks about getting row numbers for a table. The example solution was:
SELECT
  ROW_NUMBER() OVER() row_number,
  contributor_username
FROM (
  SELECT contributor_username
  FROM [publicdata:samples.wikipedia]
  GROUP BY contributor_username
)
This gives each row a row_number, and you could then populate each destination table by manually selecting a slice with
WHERE row_number BETWEEN x AND y
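Concretely, carving out one slice would look something like this (the bounds 1 and 1000000 are just placeholders; each destination table would use its own range):

SELECT row_number, contributor_username
FROM (
  SELECT
    ROW_NUMBER() OVER() row_number,
    contributor_username
  FROM (
    SELECT contributor_username
    FROM [publicdata:samples.wikipedia]
    GROUP BY contributor_username
  )
)
WHERE row_number BETWEEN 1 AND 1000000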
This works for the table in the example, but for a table with 117MM rows, ROW_NUMBER() yields a resources exceeded error. As I understand it, an analytic function with an empty OVER() clause has no PARTITION BY, so all rows must be numbered on a single worker, which doesn't scale to a table this size.
I then tried the following idea:
SELECT field1, field2
FROM (
  SELECT field1, field2, NTILE(100) OVER (ORDER BY row_id) AS ntile
  FROM (
    SELECT HASH(some_unique_key) AS row_id, field1, field2
    FROM table_with_177_MM_rows
  )
)
WHERE ntile = 1
I would then need to run this once for each ntile in [1, ..., 100], writing each result to its own table. This also failed, presumably because the global ORDER BY in the NTILE window hits the same single-worker bottleneck. Is there any way to split a large table with a unique key in GBQ into smaller, roughly equal tables?
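For reference, the kind of even split I'm after could also be phrased as a hash-mod bucket rather than NTILE, which avoids any global ordering. This is only a sketch (bucket 0 of 100, reusing the column and table names from above, and assuming HASH spreads the key roughly uniformly), and I don't know whether it's the recommended approach:

SELECT field1, field2
FROM table_with_177_MM_rows
WHERE ABS(HASH(some_unique_key)) % 100 = 0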