
I am testing out Google BigQuery as a replacement for the traditional databases we have used in the past (PostgreSQL / MySQL), but have found the performance to be extremely (and oddly) slow.

I uploaded a ~6 MB dataset (~44,000 rows) to test.

I tried to run a few simple queries:

SELECT Sub_Category, COUNT(*) AS count FROM `rnd-projects-247203.test.data` GROUP BY Sub_Category

Can someone help explain why such a simple query on a small dataset takes over 20 seconds to run? Have I done something wrong in the setup, or do I need to do something differently?

Execution details screenshot

Thanks!

Edit: added a screenshot with the execution details expanded under the S00 stage.

The data is sourced and loaded from a .csv file in Google Drive.

Execution details screenshot - expanded

Could you expand on the S00 step, which is where nearly all of the time is spent? I suspect this might have something to do with storage location and data egress, and that would help us confirm it. - Francesco Galletta
As Francesco says, this speed doesn't make sense - unless you can show us how the data was loaded. - Felipe Hoffa
Could you provide more details on where the data is located and how the data was loaded? - Alexandre Moraes
Hi guys, thanks for helping out - I have added a new screenshot. The data is located in Google Drive as a .csv file, and I manually added it via the Google BigQuery UI (i.e., Create Table > Load data > From Google Drive). - Howard Gu
Everything seems OK. Can you try running a query against any BigQuery public dataset and tell me the execution time? - Alexandre Moraes

1 Answer


BigQuery is a petabyte-scale data warehouse. It works best as a complement to a traditional database, not as a drop-in replacement for MySQL/PostgreSQL.

BigQuery is really fast for large-scale queries, on the order of 3-10 seconds at terabyte/petabyte scale. Small queries also take around 1-2 seconds. Either way, it will never be the 20 milliseconds you see on a transactional database.
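Beyond that baseline latency, the comments point at a likely culprit: a table created "From Google Drive" is an external (federated) table, so every query re-reads the CSV from Drive instead of scanning native BigQuery storage. If that is what is happening here, a sketch of a workaround is to copy the data into a native table once and query that instead (the destination table name `data_native` below is hypothetical):

```sql
-- One-time copy of the Drive-backed external table into native BigQuery storage.
-- Assumes the table names from the question; `data_native` is a made-up name.
CREATE TABLE `rnd-projects-247203.test.data_native` AS
SELECT * FROM `rnd-projects-247203.test.data`;

-- Subsequent queries scan native storage rather than re-fetching the CSV from Drive.
SELECT Sub_Category, COUNT(*) AS count
FROM `rnd-projects-247203.test.data_native`
GROUP BY Sub_Category;
```

Comparing the execution details of this query against the original one should confirm whether the S00 stage (reading the external source) was the bottleneck.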