
I set up a scheduled Dataprep job flow that copies and transforms some CSV and JSON files stored in a Cloud Storage bucket into BigQuery tables daily. It was working fine, but a few days ago the job started copying fewer rows into BigQuery than the CSV and JSON files contain. I don't know if this is related, but around the same time the upstream process also changed the content type of the files: the CSVs switched from application/octet-stream to text/csv; charset=utf-8, and the JSONs from application/json to application/json; charset=utf-8. Can this change of content type somehow be related? Otherwise, has anybody had similar issues?
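
For context, this is roughly how I check the content type and content encoding on the objects, using the Cloud Storage Python client (the bucket and object names here are just placeholders):

from google.cloud import storage

# Inspect the metadata of one of the daily files (placeholder names).
client = storage.Client()
blob = client.bucket("my-daily-bucket").get_blob("daily/customers.csv")
print(blob.content_type)      # e.g. "text/csv; charset=utf-8"
print(blob.content_encoding)  # e.g. None or "gzip"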

To test this, I created two copies of the same CSV file (with 5 records), one with content type application/octet-stream and the other with text/csv; charset=utf-8. Then I created a simple Dataprep job that just reads the CSV file, converts some integer columns as a test, and exports the result to a BigQuery table.
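
Something like the following is one way to upload the same file twice with different content types; this is just a sketch with placeholder bucket, object, and file names:

from google.cloud import storage

# Upload the same local test file twice, with different content types
# (placeholder bucket and object names).
client = storage.Client()
bucket = client.bucket("my-test-bucket")
bucket.blob("test/customers_octet_stream.csv").upload_from_filename(
    "customers_test.csv", content_type="application/octet-stream"
)
bucket.blob("test/customers_text_csv.csv").upload_from_filename(
    "customers_test.csv", content_type="text/csv; charset=utf-8"
)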

The flow processing the CSV with the application/octet-stream content type exported 5 records to BigQuery, as expected. The one processing the CSV with text/csv; charset=utf-8 exported only 3 records, even though the data recipe in the Dataprep Transformer node showed 5 records.

Here is my target BigQuery schema:

CustomerID:STRING,
CustomerUniqueRef:STRING,
BranchID:STRING,
DateCreated:DATETIME,
CreatedBy:STRING,
PreviouslyBanked:STRING
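
For reference, the same target table could be created with the BigQuery Python client along these lines (the project, dataset, and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
schema = [
    bigquery.SchemaField("CustomerID", "STRING"),
    bigquery.SchemaField("CustomerUniqueRef", "STRING"),
    bigquery.SchemaField("BranchID", "STRING"),
    bigquery.SchemaField("DateCreated", "DATETIME"),
    bigquery.SchemaField("CreatedBy", "STRING"),
    bigquery.SchemaField("PreviouslyBanked", "STRING"),
]
# "my-project.my_dataset.customers" is a placeholder table ID.
table = bigquery.Table("my-project.my_dataset.customers", schema=schema)
client.create_table(table)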

My transformations in Dataprep just convert CustomerID, CustomerUniqueRef, CreatedBy and PreviouslyBanked from INTEGER to STRING.

And here is my test CSV:

CustomerID,CustomerUniqueRef,BranchID,DateCreated,CreatedBy,PreviouslyBanked
43944,0004674956,004,2019-06-14T10:52:11,77,1
43945,0004674957,004,2019-06-14T10:59:32,77,0
43946,0004674958,004,2019-06-14T11:03:14,77,0
43947,0004674959,004,2019-06-14T11:06:23,77,0
43948,0004674960,004,2019-06-14T11:09:24,77,0

Hey Giorgio, can you please attach your BQ table schema and an example of some records you are trying to insert? – Royzipuff
Hi Royzipuff, sure, I just edited the question to add the BigQuery schema and a CSV sample. – Giorgio Rivero
Thanks, I have created a similar flow in Dataprep importing the CSV records you attached from GCS (as text/csv), and a corresponding table in BigQuery. All 5 rows seem to load successfully. Does your Dataprep job append, truncate, or drop and recreate the destination table every time you run it? – Royzipuff
Thanks! My Dataprep job creates a new BigQuery table every time I run it, like prefix_20190616_161227. That didn't change anyway. – Giorgio Rivero
@Royzipuff are text/csv; charset=utf-8 and text/csv the same? – Giorgio Rivero

1 Answer


I finally found what the issue was. It was a matter of incorrect metadata on the CSV in Cloud Storage: its content-type was text/csv; charset=utf-8 and its content-encoding was gzip, so the CSV was effectively being treated as compressed. Changing to content-type=text/csv and content-encoding=utf-8 solved the issue.
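
For anyone hitting the same thing, the object metadata can be patched in place, for example with the Cloud Storage Python client. This is a minimal sketch of the fix described above, with placeholder bucket and object names; it simply clears the content-encoding (setting it to utf-8 as above likewise removes the gzip flag), assuming the file bytes are not actually gzip-compressed:

from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").get_blob("daily/customers.csv")

# Set a plain CSV content type and drop the gzip content-encoding,
# so the object is no longer treated as compressed, then PATCH the
# object's metadata in place.
blob.content_type = "text/csv"
blob.content_encoding = None
blob.patch()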