I set up a scheduled Dataprep job flow that copies and processes daily CSV and JSON files from a Cloud Storage bucket into BigQuery tables. It was working fine, but a few days ago the job started loading fewer rows into BigQuery than the CSV and JSON files contain. I don't know if this is related, but around the same time the upstream process also changed the content type of the files: the CSVs switched from application/octet-stream to text/csv; charset=utf-8, and the JSONs from application/json to application/json; charset=utf-8. Could this content type change somehow be the cause? Otherwise, has anybody had similar issues?
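To confirm the content type change, the object metadata can be inspected with the google-cloud-storage client (a minimal sketch; the bucket name is a placeholder):

from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("my-bucket"):  # "my-bucket" is a placeholder
    # content_type reflects the Content-Type metadata set at upload time
    print(blob.name, blob.content_type)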
To isolate the problem, I created two copies of the same CSV file (5 records): one with content type application/octet-stream, the other with text/csv; charset=utf-8. Then I built a simple Dataprep job that just reads the CSV file, converts a few integer columns as a test, and exports the result to a BigQuery table.
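For reference, the two test objects can be created like this (a sketch using the google-cloud-storage client; bucket and object names are placeholders):

from google.cloud import storage

bucket = storage.Client().bucket("my-bucket")  # placeholder bucket

# Same local file uploaded twice; only the Content-Type metadata differs
bucket.blob("test_octet_stream.csv").upload_from_filename(
    "test.csv", content_type="application/octet-stream")
bucket.blob("test_text_csv.csv").upload_from_filename(
    "test.csv", content_type="text/csv; charset=utf-8")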
The flow processing the CSV with application/octet-stream exported 5 records to BigQuery, as expected. The one processing the CSV with text/csv; charset=utf-8 exported only 3 records, even though the recipe in the Dataprep Transformer node showed 5 records.
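The discrepancy is easy to quantify with a quick row count on the two output tables (a sketch with the google-cloud-bigquery client; dataset and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
for table in ("my_dataset.from_octet_stream", "my_dataset.from_text_csv"):
    rows = client.query(f"SELECT COUNT(*) AS n FROM `{table}`").result()
    print(table, next(iter(rows)).n)  # expect 5; the text/csv table shows 3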
Here is my target BigQuery schema:
CustomerID:STRING,
CustomerUniqueRef:STRING,
BranchID:STRING,
DateCreated:DATETIME,
CreatedBy:STRING,
PreviouslyBanked:STRING
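For anyone reproducing the target table, the same schema expressed with the google-cloud-bigquery client might look like this (a sketch; project, dataset and table names are placeholders):

from google.cloud import bigquery

schema = [
    bigquery.SchemaField("CustomerID", "STRING"),
    bigquery.SchemaField("CustomerUniqueRef", "STRING"),
    bigquery.SchemaField("BranchID", "STRING"),
    bigquery.SchemaField("DateCreated", "DATETIME"),
    bigquery.SchemaField("CreatedBy", "STRING"),
    bigquery.SchemaField("PreviouslyBanked", "STRING"),
]
table = bigquery.Table("my-project.my_dataset.customers", schema=schema)
bigquery.Client().create_table(table)  # creates the empty target table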
My Dataprep transformations just convert CustomerID, CustomerUniqueRef, CreatedBy and PreviouslyBanked from INTEGER to STRING.
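This is not the actual Dataprep recipe, but the equivalent logic in pandas, just to show how small the transformation is (pandas likewise parses these four columns as integers first):

import pandas as pd

df = pd.read_csv("test.csv")  # the four columns are parsed as integers
for col in ("CustomerID", "CustomerUniqueRef", "CreatedBy", "PreviouslyBanked"):
    df[col] = df[col].astype(str)  # INTEGER -> STRING, as in the recipe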
And here is my test CSV:
CustomerID,CustomerUniqueRef,BranchID,DateCreated,CreatedBy,PreviouslyBanked
43944,0004674956,004,2019-06-14T10:52:11,77,1
43945,0004674957,004,2019-06-14T10:59:32,77,0
43946,0004674958,004,2019-06-14T11:03:14,77,0
43947,0004674959,004,2019-06-14T11:06:23,77,0
43948,0004674960,004,2019-06-14T11:09:24,77,0