
I am trying to execute a hive job in google dataproc using following gcloud command:

gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --properties=[bucket1=abcd]

gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --params=[bucket1=abcd]

But neither of the two commands above passes the 'bucket1' value through so that it can be assigned to the 'x' variable.

The hive script is as follows:

set x=${bucket1};
set x;
drop table T1;
create external table T1( column1 bigint, column2 float, column3 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 'gs://${hiveconf:x}/output/prod';

But the variable 'x' never receives the 'bucket1' value that I passed in the gcloud command.

How do I do it? Please suggest.


1 Answer


Both examples should work with minor tweaks.

  • In gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --properties bucket1=abcd, you can access the variable as ${bucket1}

  • In gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --params bucket1=abcd, you can access variable as ${hivevar:bucket1}
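To illustrate the difference, a minimal hive.sql sketch (assuming the job was submitted with --properties bucket1=abcd and --params bucket2=efgh; 'bucket2' is an illustrative second name, not from the question):

```sql
-- Passed via --properties: referenced with a plain ${...}
SELECT '${bucket1}';           -- expands to 'abcd'

-- Passed via --params: exposed as a hivevar, so qualify the namespace
SELECT '${hivevar:bucket2}';   -- expands to 'efgh'
```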

Easy way to test this, is to submit a script like this to dump all variables:

gcloud dataproc jobs submit hive --cluster msm-test-cluster -e "set;" --properties foo=bar --params bar=baz

The output should contain:

| foo=bar    
| hivevar:bar=baz
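Applying this to the script in the question, a sketch assuming the job is submitted with --params bucket1=abcd (table and column names are taken from the question as-is):

```sql
-- --params values live in the hivevar namespace
set x=${hivevar:bucket1};
set x;   -- should print x=abcd
drop table T1;
create external table T1 (column1 bigint, column2 float, column3 int)
  row format delimited fields terminated by ',' lines terminated by '\n'
  stored as textfile
  location 'gs://${hiveconf:x}/output/prod';
```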

Related question: How to set variables in HIVE scripts