1
votes

After loading a Parquet file from Google Cloud Storage into a BigQuery table, the data shown in the Preview tab (inside BigQuery) differs from the original source data, although the schema is correct.

Nurma, can you please share the expected result, and also run a SELECT on the table rather than relying only on the Preview tab to check the values? - Tamir Klein
Hi Tamir, I ran a SELECT on the table and the result shows the same values (SELECT * FROM master-tangent-240211.Demo_2019.Demo_parquet LIMIT 1000). Please help me. Thanks. - Nurma Sbl
Please check the BigQuery documentation and verify that your data is not compressed and that you are following all the described guidelines exactly: cloud.google.com/bigquery/docs/…. If the problem persists, please provide a clear text example and explain how you converted it to Parquet, to make it easier to assist you further. Please also add the expected values versus the received values so the gap you are facing is clear. - Tamir Klein
Hi Tamir, I loaded the Parquet file into BigQuery with the following command: bq --location=asia-southeast1 load --source_format=PARQUET Demo_2019.Demo_01 gs://cdh-bucket/warehouse/parquet_employee/ea4b68c5d20bbc90-bfec9bfd00000000_333529865_data.0.parq. The load succeeded, but the data does not match the original. The result shows Row (1) id (MDAx) name (bWVI). I then loaded another Parquet file with this command: bq --location=asia-southeast1 load --source_format=PARQUET Demo_2019.Demo_01 gs://cdh-bucket/warehouse/sample_parquet/userdata1.parquet. That data loaded correctly. - Nurma Sbl
Hi Nurma, without the original file I can't do much. I suggest you approach BigQuery support directly to help you with this matter. - Tamir Klein

1 Answer

0
votes

I would think that if the schema is correct, the loaded data must be correct. My best guess is that the data in the Parquet file is masked and that you would need a function to unmask it.

To verify whether the Parquet file contains the same data that was loaded into BigQuery, you can list a few rows of the original Parquet file with parquet-tools:

$ hadoop jar parquet-tools-1.9.0.jar head file:///ea4b68c5d20bbc90-bfec9bfd00000000_333529865_data.0.parq
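
As one way to test the "masked data" guess, the value reported in the comments for the id column (MDAx) looks like base64 text, which is how BigQuery displays BYTES values. Whether that is what happened here is an assumption; the thread does not confirm it. A minimal Python sketch that decodes such a value so it can be compared with the parquet-tools output (the helper name decode_base64_value is hypothetical):

```python
import base64


def decode_base64_value(value: str) -> str:
    """Decode a base64 string (as shown for a BYTES column) into UTF-8 text.

    Assumption: the underlying bytes are UTF-8 encoded text.
    """
    return base64.b64decode(value).decode("utf-8")


# Value quoted in the comments for the id column.
print(decode_base64_value("MDAx"))  # prints 001
```

If the decoded values match the rows printed by parquet-tools, the load itself is likely fine and the difference is only in how the column is typed and displayed.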