For our Near real time analytics, data will be streamed into pubsub and Apache beam dataflow pipeline will process by first writing into bigquery and then do the aggregate processing by reading again from bigquery then storing the aggregated results in Hbase for OLAP cube Computation.
Here is the sample ParDo function which is used to fetch record from bigquery
String eventInsertedQuery="Select count(*) as usercount from <tablename> where <condition>";
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
QueryJobConfiguration queryConfig
=QueryJobConfiguration.newBuilder(eventInsertedQuery).build();
TableResult result = bigquery.query(queryConfig);
FieldValueList row = result.getValues().iterator().next();
LOG.info("rowCounttt {}",row.get("usercount").getStringValue());
bigquery.query is taking aroud ~4 seconds. Any suggestions to improve it? Since this is near real time analytics this time duration is not acceptable.