I'm attempting to stream data from a Kafka installation into BigQuery using Java, based on Google's samples. Each row is a JSON string roughly 12 KB long. I batch these into blocks of 500 rows (roughly 6 MB) and stream them as:
InsertAllRequest.Builder builder = InsertAllRequest.newBuilder(tableId);

for (String record : bqStreamingPacket.getRecords()) {
    // parse the JSON row into a map (the replaceAll works around a malformed leading comma)
    Map<String, Object> mapObject = objectMapper.readValue(
            record.replaceAll("\\{,", "{"),
            new TypeReference<Map<String, Object>>() {});
    // remove nulls
    mapObject.values().removeIf(Objects::isNull);
    // create an id for each row - used to retry / avoid duplication
    builder.addRow(String.valueOf(System.nanoTime()), mapObject);
}

insertAllRequest = builder.build();
...
BigQueryOptions bigQueryOptions = BigQueryOptions.newBuilder()
        .setCredentials(Credentials.getAppCredentials())
        .build();
BigQuery bigQuery = bigQueryOptions.getService();

InsertAllResponse insertAllResponse = bigQuery.insertAll(insertAllRequest);
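For what it's worth, the timing figure below is just wall-clock time measured around the insertAll call, along the lines of this sketch (assuming the request and client are built as above):

// simple wall-clock timing around the streaming insert (sketch only)
long start = System.currentTimeMillis();
InsertAllResponse response = bigQuery.insertAll(insertAllRequest);
long elapsedMs = System.currentTimeMillis() - start;
System.out.printf("insertAll of %d rows took %d ms, hasErrors=%b%n",
        insertAllRequest.getRows().size(), elapsedMs, response.hasErrors());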
I'm seeing insert times of 3-5 seconds for each call. Needless to say, this makes BQ streaming less than useful. From the documentation I was worried about hitting the per-table insert quotas (I'm streaming from Kafka at ~1M rows/min, which at 500 rows per batch works out to ~33 insertAll calls per second), but at this point I'd be happy to be dealing with that problem.
All rows insert fine, with no errors reported.
I must be doing something very wrong with this setup. Please advise.