
I am trying to insert the JSON data below into a BigQuery table using the NiFi processor PutBigQueryBatch (version 1.9.2).

{
    "Revenue_Label": "Dining Room",
    "StoreName": "STORE A",
    "Revenue_Id": "1",
    "Alteration_Flag": "False",
    "Order_Mode_Label": "DriveThru",
    "checkClosetime": "2019-10-24T13:43:19+13:00",
    "Alterations": [
        {
            "Alteration_Product_Code": "211136",
            "Alteration_Product_Net_Amount": 0.0,
            "Altered_Product_Code": "211135",
            "Alteration_Product_Amount": 0.0,
            "Altered_Product_Name": "Burger",
            "Alteration_Product_Name": "Add Sauce",
            "Alteration_Product_Qty": 1.0
        }
    ],
    "StoreId": "1234",
    "dob": "20191024",
    "Order_Mode_Id": "3",
    "checknumber": "54321"
}

However, I keep getting the following error:

PutBigQueryBatch:The RECORD field must have at least one sub-field:java.lang.IllegalArgumentException:The RECORD field must have at least one sub-field

In the properties of the PutBigQueryBatch processor I have used the schema definition below, which matches the BigQuery table definition in order, mode, name, and type:

[
  {"mode": "NULLABLE", "name": "Revenue_Label", "type": "STRING"},
  {"mode": "NULLABLE", "name": "StoreName", "type": "STRING"},
  {"mode": "NULLABLE", "name": "Revenue_Id", "type": "STRING"},
  {"mode": "NULLABLE", "name": "Alteration_Flag", "type": "STRING"},
  {"mode": "NULLABLE", "name": "Order_Mode_Label", "type": "STRING"},
  {"mode": "NULLABLE", "name": "checkClosetime", "type": "TIMESTAMP"},
  {"mode": "REPEATED",
   "name": "Alterations",
   "type": "RECORD",
   "fields": [
     {"mode": "NULLABLE", "name": "Alteration_Product_Code", "type": "STRING"},
     {"mode": "NULLABLE", "name": "Alteration_Product_Net_Amount", "type": "FLOAT"},
     {"mode": "NULLABLE", "name": "Altered_Product_Code", "type": "STRING"},
     {"mode": "NULLABLE", "name": "Alteration_Product_Amount", "type": "FLOAT"},
     {"mode": "NULLABLE", "name": "Altered_Product_Name", "type": "STRING"},
     {"mode": "NULLABLE", "name": "Alteration_Product_Name", "type": "STRING"},
     {"mode": "NULLABLE", "name": "Alteration_Product_Qty", "type": "FLOAT"}
   ]
  },
  {"mode": "NULLABLE", "name": "StoreId", "type": "STRING"},
  {"mode": "NULLABLE", "name": "dob", "type": "STRING"},
  {"mode": "NULLABLE", "name": "Order_Mode_Id", "type": "STRING"},
  {"mode": "NULLABLE", "name": "checknumber", "type": "STRING"}
]
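
For comparison, here is a sketch of the same schema expressed with the google-cloud-bigquery Python client, which can be used to test the load outside NiFi. The table id my_project.my_dataset.checks and the filename check.json are placeholders, not values from my flow:

    # Sketch: load the same row with the BigQuery Python client to verify
    # that the schema and the data agree outside of NiFi.
    from google.cloud import bigquery

    client = bigquery.Client()

    schema = [
        bigquery.SchemaField("Revenue_Label", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("StoreName", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("Revenue_Id", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("Alteration_Flag", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("Order_Mode_Label", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("checkClosetime", "TIMESTAMP", mode="NULLABLE"),
        bigquery.SchemaField(
            "Alterations", "RECORD", mode="REPEATED",
            fields=[
                bigquery.SchemaField("Alteration_Product_Code", "STRING", mode="NULLABLE"),
                bigquery.SchemaField("Alteration_Product_Net_Amount", "FLOAT", mode="NULLABLE"),
                bigquery.SchemaField("Altered_Product_Code", "STRING", mode="NULLABLE"),
                bigquery.SchemaField("Alteration_Product_Amount", "FLOAT", mode="NULLABLE"),
                bigquery.SchemaField("Altered_Product_Name", "STRING", mode="NULLABLE"),
                bigquery.SchemaField("Alteration_Product_Name", "STRING", mode="NULLABLE"),
                bigquery.SchemaField("Alteration_Product_Qty", "FLOAT", mode="NULLABLE"),
            ],
        ),
        bigquery.SchemaField("StoreId", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("dob", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("Order_Mode_Id", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("checknumber", "STRING", mode="NULLABLE"),
    ]

    job_config = bigquery.LoadJobConfig(
        schema=schema,
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    )
    # check.json holds one JSON object per line (newline-delimited JSON)
    with open("check.json", "rb") as f:
        job = client.load_table_from_file(
            f, "my_project.my_dataset.checks", job_config=job_config
        )
    job.result()  # raises an exception if the schema and data disagree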

What I have tried:

  1. Removed all whitespace and carriage-return characters from the JSON flowfile.
  2. Moved the array to the start and to the end of the JSON file, and matched the BigQuery table definition to this.
  3. Changed the field names to not include underscores.
  4. Built the insert one column at a time; it only fails once the array is present.
  5. Used the schema definition generated directly from Google Cloud BigQuery with the utility "bq show --schema --format=prettyjson".
  6. Defined the mode of each of the array items as REQUIRED instead of NULLABLE.
  7. Removed all whitespace and carriage-return characters from the schema definition in the properties of the PutBigQueryBatch processor in NiFi.
All of these resulted in the same error as above.

Note: elsewhere in my NiFi flow I successfully use the PutBigQueryBatch processor to update a table. The difference is that the JSON there does not contain an array.

Does anyone have any ideas on how to fix the issue?

At a minimum, the Alterations array is not closed. – daggett

Hi daggett, thank you for your reply. I am a bit confused by what you mean by the "Alterations" array not being closed. Is it the schema definition or the JSON file that needs to be closed? Are you able to provide an example of what it should be? Thank you, Tim. – Tim Manger

@TimManger Did you ever find a solution to this issue? I have run into a similar one. I can load the schema JSON manually to generate the table in the GCP console, but if I try to do it via PutBigQueryBatch it fails on that same RECORD part of the schema. And when the table is created manually, all records can be written to it without any issue. – Aklys

Hi Aklys, I had to create a workaround. I created a topic and subscription in GCP, then used the PublishGCPubSub processor in NiFi to pass the JSON flowfile to Pub/Sub. From there I wrote a Cloud Function that was triggered when the JSON file arrived on the topic and inserted it into the BigQuery table (a sketch of such a function follows). – Tim Manger
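
A minimal sketch of the Cloud Function half of that workaround, assuming a classic (1st gen) Pub/Sub-triggered background function; the table id is a placeholder:

    # Hypothetical Pub/Sub-triggered Cloud Function: decode the message
    # payload and stream the row into BigQuery.
    import base64
    import json

    from google.cloud import bigquery

    client = bigquery.Client()
    TABLE = "my_project.my_dataset.checks"  # placeholder table id

    def pubsub_to_bigquery(event, context):
        """Triggered by a Pub/Sub message carrying one JSON check record."""
        row = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        errors = client.insert_rows_json(TABLE, [row])  # streaming insert
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")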

2 Answers


From the documentation, your original definition looks correct. Stray whitespace or tab characters can sometimes cause an error, so try a more consistently formatted version.

Pretty:

{ 
    "Revenue_Label":"Dining Room",
    "StoreName":"STORE A",
    "Revenue_Id":"1",
    "Alteration_Flag":"False",
    "Order_Mode_Label":"DriveThru",
    "checkClosetime":"2019-10-24T13:43:19+13:00",
    "Alterations":[ 
        { 
            "Alteration_Product_Code":"211136",
            "Alteration_Product_Net_Amount":0.0,
            "Altered_Product_Code":"211135",
            "Alteration_Product_Amount":0.0,
            "Altered_Product_Name":"Burger",
            "Alteration_Product_Name":"Add Sauce",
            "Alteration_Product_Qty":1.0
        }
    ],
    "StoreId":"1234",
    "dob":"20191024",
    "Order_Mode_Id":"3",
    "checknumber":"54321"
}

Compact:

{"Revenue_Label":"Dining Room","StoreName":"STORE A","Revenue_Id":"1","Alteration_Flag":"False","Order_Mode_Label":"DriveThru","checkClosetime":"2019-10-24T13:43:19+13:00","Alterations":[{"Alteration_Product_Code":"211136","Alteration_Product_Net_Amount":0.0,"Altered_Product_Code":"211135","Alteration_Product_Amount":0.0,"Altered_Product_Name":"Burger","Alteration_Product_Name":"Add Sauce","Alteration_Product_Qty":1.0}],"StoreId":"1234","dob":"20191024","Order_Mode_Id":"3","checknumber":"54321"}

Based on your post, I can see that you are defining the REPEATED field as Altertions in your schema definition; however, in the JSON you are using to upload your data, this field appears as Alterations.
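
One way to catch name mismatches like this before the load is to diff the JSON keys against the schema field names. A minimal Python sketch, where schema.json and check.json are placeholder filenames for the schema definition and a sample record:

    # Hypothetical helper: compare top-level JSON keys against schema field
    # names to surface typos such as Altertions vs Alterations.
    import json

    with open("schema.json") as f:   # the PutBigQueryBatch schema definition
        schema = json.load(f)
    with open("check.json") as f:    # one sample record
        record = json.load(f)

    schema_names = {field["name"] for field in schema}
    record_keys = set(record)

    print("in record but not schema:", record_keys - schema_names)
    print("in schema but not record:", schema_names - record_keys)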