1 vote

We are working with Google BigQuery (using Java) for one of our cloud solutions and are facing a lot of issues in development. Our observations and issues are as follows:

  1. We are using query jobs for data retrieval (for example, the jobs().insert()/jobs().query() method first and then tabledata().list() for the data). Job execution takes 2-3 seconds (our data is only in MBs right now). We looked at sample code on code.google.com and github.com and tried to implement it, but we have not been able to get execution faster than 2-3 seconds. What is the fastest way to retrieve data from BigQuery tables? Is there a way to improve job execution speed? If yes, can you provide links to sample code? (A simplified sketch of our current flow is shown after this list.)
  2. In our screens, we need to fetch data from different tables (different queries) and display them. So we insert multiple query jobs, and the total execution time adds up (for example, with two jobs, i.e. two queries, it takes 6-7 seconds). The Google documentation mentions that we can run concurrent jobs. Is there any sample code available for this?
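
For reference, this is roughly the pattern we follow today, simplified into a standalone sketch. The project ID, dataset, table and query names are placeholders, and the credential setup shown is only one possible way to build the client.

```java
// Simplified sketch of our current flow: two synchronous query jobs run one
// after the other, so their 2-3 second latencies add up on a single screen.
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.QueryRequest;
import com.google.api.services.bigquery.model.QueryResponse;
import com.google.api.services.bigquery.model.TableRow;

public class SequentialQueries {

  public static void main(String[] args) throws Exception {
    String projectId = "my-project-id"; // placeholder

    Bigquery bigquery = new Bigquery.Builder(
            GoogleNetHttpTransport.newTrustedTransport(),
            JacksonFactory.getDefaultInstance(),
            GoogleCredential.getApplicationDefault().createScoped(BigqueryScopes.all()))
        .setApplicationName("bq-screens")
        .build();

    // Two queries for one screen, executed back to back.
    runQuery(bigquery, projectId, "SELECT field1 FROM dataset.table1 LIMIT 100");
    runQuery(bigquery, projectId, "SELECT field2 FROM dataset.table2 LIMIT 100");
  }

  static void runQuery(Bigquery bigquery, String projectId, String sql)
      throws java.io.IOException {
    long start = System.currentTimeMillis();
    QueryRequest request = new QueryRequest()
        .setQuery(sql)
        .setTimeoutMs(10000L); // wait up to 10 seconds for the job to complete
    QueryResponse response = bigquery.jobs().query(projectId, request).execute();
    if (response.getRows() != null) {
      for (TableRow row : response.getRows()) {
        // hand each row over to the screen / presentation layer
      }
    }
    System.out.println(sql + " -> " + (System.currentTimeMillis() - start) + " ms");
  }
}
```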

Waiting for your valuable responses.


2 Answers

1 vote

1) BigQuery is a highly scalable database before it is a "super fast" database. It's designed to process HUGE amounts of data by distributing the processing among many machines, using a technology named Dremel. Because it's designed for many machines and parallel processing, you should expect super-scalability with good performance.

2) BigQuery is an asset when you want to analyze billions of rows.

For example: analyzing all the Wikipedia revisions in 5-10 seconds isn't bad, is it? But even a much smaller table would take about the same time, even if it has only 10k rows.

3) Below this size, you'll be better off using more traditional data storage solutions such as Cloud SQL or the App Engine Datastore. If you want to keep SQL capability, Cloud SQL is your best bet.

Sybase IQ is often installed as a single database and it doesn't use Dremel. That said, it's going to be faster than BigQuery in many scenarios... as designed.

4) Certainly the performance differs from a dedicated environment. You can get your own dedicated environment for $20K a month.

2 votes
  1. Querying cached results can be much faster: if you run the query ahead of time and the underlying tables haven't changed, a subsequent run of the same query is served from the cache (see the sketch after this list).
  2. Check that the bottleneck is not related to the network, pagination, page rendering, etc. You can do that by timing only the second step (tabledata().list()) on its own.
  3. Parallel jobs might be queued on the BigQuery end based on the current load.
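
For example, something along these lines, reusing the client and queries from the question's sketch. This is only an illustration: whether setUseQueryCache is available (and what it is called) depends on the client library version you use, and the two jobs are simply submitted from separate threads so the total wait is roughly the slowest job rather than the sum of both.

```java
// Illustration only: enable the query cache and submit the two query jobs in
// parallel instead of one after the other.
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.QueryRequest;
import com.google.api.services.bigquery.model.QueryResponse;
import com.google.api.services.bigquery.model.TableRow;

public class ParallelCachedQueries {

  static List<TableRow> runQuery(Bigquery bigquery, String projectId, String sql)
      throws java.io.IOException {
    QueryRequest request = new QueryRequest()
        .setQuery(sql)
        .setUseQueryCache(true)  // reuse cached results while the tables are unchanged
        .setTimeoutMs(10000L);
    QueryResponse response = bigquery.jobs().query(projectId, request).execute();
    return response.getRows();
  }

  static void loadScreen(final Bigquery bigquery, final String projectId) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);

    Future<List<TableRow>> first = pool.submit(new Callable<List<TableRow>>() {
      public List<TableRow> call() throws Exception {
        return runQuery(bigquery, projectId, "SELECT field1 FROM dataset.table1 LIMIT 100");
      }
    });
    Future<List<TableRow>> second = pool.submit(new Callable<List<TableRow>>() {
      public List<TableRow> call() throws Exception {
        return runQuery(bigquery, projectId, "SELECT field2 FROM dataset.table2 LIMIT 100");
      }
    });

    // The wait here is roughly max(job1, job2), not job1 + job2.
    List<TableRow> rowsForTable1 = first.get();
    List<TableRow> rowsForTable2 = second.get();
    pool.shutdown();
    // hand rowsForTable1 / rowsForTable2 to the presentation layer
  }
}
```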

My recommendation would be to separate the query from the presentation. Run the BQ queries, retrieve the "small" result sets into a fast-access data store (a flat file, a cache, Cloud SQL, etc.) and present them from there. As Pentium10 says, BQ is excellent for huge datasets (and returns results faster and cheaper than any other comparable solution). If you are looking for the back-end of a fast reporting/visualization tool, I am afraid that BQ might not be your solution.
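
As a sketch of that separation, assuming the runQuery helper from the previous snippet and an in-memory map standing in for the fast-access store (a flat file, memcache or a Cloud SQL table would play the same role):

```java
// Sketch: a background task refreshes the small result sets on a schedule,
// and the screens read from the in-memory cache instead of waiting on BigQuery.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableRow;

public class ReportCache {

  private final Map<String, List<TableRow>> cache =
      new ConcurrentHashMap<String, List<TableRow>>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start(final Bigquery bigquery, final String projectId) {
    // Refresh every 5 minutes; screens always see the last completed results.
    scheduler.scheduleAtFixedRate(new Runnable() {
      public void run() {
        try {
          cache.put("screen1", ParallelCachedQueries.runQuery(
              bigquery, projectId, "SELECT field1 FROM dataset.table1 LIMIT 100"));
          cache.put("screen2", ParallelCachedQueries.runQuery(
              bigquery, projectId, "SELECT field2 FROM dataset.table2 LIMIT 100"));
        } catch (Exception e) {
          e.printStackTrace(); // keep serving the previous results on failure
        }
      }
    }, 0, 5, TimeUnit.MINUTES);
  }

  // Called from the presentation layer: returns immediately from memory.
  public List<TableRow> getRows(String screenKey) {
    return cache.get(screenKey);
  }
}
```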