3
votes

I am using the starschema JDBC driver for Google BigQuery in Pentaho PDI:

http://code.google.com/p/starschema-bigquery-jdbc/

My query through the BigQuery Web Console returns 129,993 rows, but when I execute the same query through the JDBC driver it only returns 100,000 rows. Is there some kind of option or limit that I am not aware of?

2

2 Answers

1
votes

The StarSchema code looks like it is only returning the first page of results.

The code here here should be updated to get the rest of the results. It should look something like:

public static GetQueryResultsResponse getQueryResults(Bigquery bigquery,
        String projectId, Job completedJob) throws IOException {        
    GetQueryResultsResponse queryResult = bigquery.jobs()
            .getQueryResults(projectId,
                    completedJob.getJobReference().getJobId()).execute();
    while(queryResult.getTotalRows() > queryResult.getRows().size()) {
        queryResult.getRows().addAll(
            bigquery.jobs()
                .getQueryResults(projectId,
                        completedJob.getJobReference().getJobId())
                .setStartIndex(queryResult.getRows().size())
                .execute()
                .getRows());            
    }
    return queryResult;
}
1
votes

Modified the code based on Jordan's answer, the solution looks like this:

    public static GetQueryResultsResponse getQueryResults(Bigquery bigquery,
        String projectId, Job completedJob) throws IOException {
    GetQueryResultsResponse queryResult = bigquery.jobs()
            .getQueryResults(projectId,
                    completedJob.getJobReference().getJobId()).execute();
    long totalRows = queryResult.getTotalRows().longValue();
    if(totalRows == 0){ 
//if we don't have results we'll get a nullPointerException on the queryResult.getRows().size()
        return queryResult;
    }
    while( totalRows  > (long)queryResult.getRows().size() ) {
        queryResult.getRows().addAll(
            bigquery.jobs()
                .getQueryResults(projectId,
                        completedJob.getJobReference().getJobId())
                .setStartIndex(BigInteger.valueOf((long)queryResult.getRows().size()) )
                .execute()
                .getRows());           
    }
    return queryResult;
}

This should solve the problem. Also uploaded the new version to google code, named bqjdbc-1.3.1.jar