$insertResponse = $bqTable->insertRows($insertRows);

if ($insertResponse->isSuccessful()) {
    return true;
} else {
    // Log every error BigQuery reported for each rejected row.
    foreach ($insertResponse->failedRows() as $row) {
        foreach ($row['errors'] as $error) {
            Log::error('Streaming to BigQuery Error: ' . $error['reason'] . ' ' . $error['message']);
        }
    }
    return false;
}

I used the code above (copied from the PHP client sample code).

Basically, if the streaming succeeds, it returns true; if the streaming fails, it returns false.

I have 524,845 rows to insert. To avoid the request-size error, I call the code above once for every 1,000 rows, and then once more for the final 845 rows.

If the streaming succeeds (returns true), I continue with the next 1,000 rows. If it fails, I stop the whole streaming process.
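The batching and stop-on-failure procedure described above can be sketched as follows. This is a minimal sketch, not the asker's actual code: `streamInBatches` and the `$insertBatch` callable are hypothetical names, and in the real setup `$insertBatch` would wrap the `$bqTable->insertRows(...)` / `isSuccessful()` snippet from the question. The dry run replaces BigQuery with a fake inserter that only records batch sizes.

```php
<?php
declare(strict_types=1);

/**
 * Streams $rows in batches of $batchSize and stops at the first batch that
 * reports failure. $insertBatch is a hypothetical callable(array): bool; in the
 * question's setup it would call $bqTable->insertRows($batch), log failedRows(),
 * and return $insertResponse->isSuccessful().
 *
 * Returns the number of batches that reported success.
 */
function streamInBatches(array $rows, int $batchSize, callable $insertBatch): int
{
    $succeeded = 0;
    // array_chunk() keeps the final short batch (845 rows here) as its own chunk,
    // so no separate call is needed for the remainder.
    foreach (array_chunk($rows, $batchSize) as $batch) {
        if (!$insertBatch($batch)) {
            break; // stop the whole process on the first reported failure
        }
        $succeeded++;
    }
    return $succeeded;
}

// Dry run with a fake inserter that records batch sizes instead of calling BigQuery.
$rows = array_fill(0, 524845, ['data' => ['n' => 1]]);
$sizes = [];
$okBatches = streamInBatches($rows, 1000, function (array $batch) use (&$sizes): bool {
    $sizes[] = count($batch);
    return true;
});
echo $okBatches, ' batches, last one has ', end($sizes), " rows\n"; // 525 batches, last one has 845 rows
```

Note that a loop like this can only stop on failures the client actually reports; rows that are accepted without an error but never appear in the table go unnoticed, which matches the silent loss described below.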

I found that BigQuery streaming is not stable. In most of my tests, all 524,845 rows were streamed into the table. But once in a while, I lost some rows; one time, for example, only 522,845 rows arrived. No error was reported or logged.

Because I stream 1,000 rows at a time, it looks as if two of my stream calls failed and I lost 2,000 rows. But no error was reported, and if an error had been reported, my code would have stopped.
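One way to test the "two whole batches went missing" theory is to reconcile what was sent against what landed, batch by batch. This is a sketch under assumptions: each row is assumed to carry a unique id in its data (a hypothetical column; the `insertId` itself is not stored as a column, so it cannot be queried), and `$presentIds` is assumed to be fetched separately with a query over that column. `missingPerBatch` is a hypothetical helper; batch indices are 0-based.

```php
<?php
declare(strict_types=1);

/**
 * Given the row ids that were sent (in send order) and the ids actually found
 * in the table, count how many rows are missing from each batch.
 * Returns [batchIndex => missingCount] for batches with missing rows only.
 */
function missingPerBatch(array $sentIds, array $presentIds, int $batchSize): array
{
    $present = array_flip($presentIds); // id => position, for O(1) membership checks
    $missing = [];
    foreach (array_chunk($sentIds, $batchSize) as $batchIndex => $batch) {
        $count = 0;
        foreach ($batch as $id) {
            if (!isset($present[$id])) {
                $count++;
            }
        }
        if ($count > 0) {
            $missing[$batchIndex] = $count;
        }
    }
    return $missing;
}

// Simulated load of 5,000 rows where batches 1 and 3 never reached the table.
$sent = array_map(fn(int $i): string => "row-$i", range(0, 4999));
$present = array_merge(array_slice($sent, 0, 1000), array_slice($sent, 2000, 1000), array_slice($sent, 4000));
echo json_encode(missingPerBatch($sent, $present, 1000)), "\n"; // {"1":1000,"3":1000}
```

If the result shows whole batches missing, the loss is tied to specific calls (which can then be matched against the success logs); scattered single rows would point instead at row-level causes such as duplicate ids.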

Please advise on what I should do next to debug this BigQuery streaming issue.

Since no streaming errors were logged, I added code to log the successful streaming calls as well. The next time I see rows go missing, I should be able to dig through those success logs for more information to help me debug, or to send to the Google Cloud support team. - searain

1 Answer


Are you providing an insertId when inserting rows? If so, is it possible that some insertIds are duplicated? That could cause BigQuery to discard rows it believes are duplicates.
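The duplicate-insertId check can be done locally before streaming. This sketch assumes rows use the `['insertId' => ..., 'data' => [...]]` shape accepted by `Table::insertRows()` in google/cloud-bigquery. `duplicateInsertIds` is a hypothetical helper, and the "id built from the position inside each 1,000-row batch" bug is an illustrative assumption, not something the question confirms. Because BigQuery's best-effort de-duplication exists to catch retried requests, ids repeated across different batches matter too, so the check runs over all rows of the load rather than one batch at a time.

```php
<?php
declare(strict_types=1);

/**
 * Returns the insertIds that occur more than once in $rows, mapped to how many
 * times each occurs. Rows without an insertId are skipped.
 */
function duplicateInsertIds(array $rows): array
{
    $counts = [];
    foreach ($rows as $row) {
        if (!isset($row['insertId'])) {
            continue;
        }
        $id = (string) $row['insertId'];
        $counts[$id] = ($counts[$id] ?? 0) + 1;
    }
    return array_filter($counts, fn(int $n): bool => $n > 1);
}

// Hypothetical bug: ids derived from the position inside each 1,000-row batch,
// so every batch reuses the same 1,000 ids.
$rows = [];
foreach (array_chunk(range(0, 2999), 1000) as $batch) {
    foreach ($batch as $pos => $n) {
        $rows[] = ['insertId' => "pos-$pos", 'data' => ['n' => $n]];
    }
}
$dupes = duplicateInsertIds($rows);
echo count($dupes), "\n"; // 1000

// Fix: derive the id from something unique across the whole load
// (here the value n; in practice, e.g., a source primary key).
$fixedRows = array_map(
    fn(array $row): array => ['insertId' => 'src-' . $row['data']['n'], 'data' => $row['data']],
    $rows
);
echo count(duplicateInsertIds($fixedRows)), "\n"; // 0
```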