1 vote

I am trying to use initialization actions with Dataproc (GCP).

A very short intro: we constantly run processes on Dataproc clusters in GCP, using Spark jobs to transform our data in many different ways.

For the last few days I have been trying to build an initialization action file (.sh) that runs a few Linux commands when the cluster comes up. Everything works well except the last command (the one that is supposed to launch the Spark application and execute the job).

So I am adding all the related information here (my .sh file + the error log), and I would love to hear your suggestions while I keep looking around the internet to figure it out. Thanks.

My .sh File:

#!/bin/bash
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [ "$ROLE" = "Master" ]; then
    echo $ROLE
    sudo mkdir /opt/ads
    sudo mkdir /opt/ads/apps
    sudo gcloud compute scp --recurse 35613250742-compute@brainservice:/opt/ads/apps/ /opt/ads/apps/ --zone=europe-west2-c --internal-ip
    sudo sh SparkBuildPanelBatch.sh /opt/ads/apps/apps/SparkBuildPanelBatch_Latest/DeployPack_2203.txt 20201201 20201210 20201210
fi
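
For context, this is roughly how such a script gets attached to the cluster as an initialization action (a minimal sketch; the bucket, cluster, region and project names below are placeholders, not taken from my setup). The script runs on every node, which is why the dataproc-role metadata check above is needed to restrict the work to the master:

# Upload the init action to Cloud Storage, then reference it at cluster creation.
# Bucket, cluster, region and project names are hypothetical placeholders.
gsutil cp my-init-action.sh gs://my-bucket/my-init-action.sh

gcloud dataproc clusters create my-cluster \
    --region=europe-west2 \
    --project=my-project \
    --initialization-actions=gs://my-bucket/my-init-action.sh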

My error output log:

Master
WARNING: The public SSH key file for gcloud does not exist.
WARNING: The private SSH key file for gcloud does not exist.
WARNING: You do not have an SSH key for gcloud.
WARNING: SSH keygen will be executed to generate a key.
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/google_compute_engine.
Your public key has been saved in /root/.ssh/google_compute_engine.pub.
The key fingerprint is:
SHA256:QEN6xC2DB8+lUmFPpvxilMQmPYMBXSPkaTEgNuvqP8c root@analytics-m
The key's randomart image is:
+---[RSA 2048]----+
|.oo=B@%o+        |
|..o.=/@@.        |
| .  *+@=.        |
|.  . + o         |
| .    o S        |
|.    . .         |
|.   .            |
|.  . E           |
| ...o            |
+----[SHA256]-----+
Updating project ssh metadata...
.............................................................................................................................................................Updated [https://www.googleapis.com/compute/v1/projects/supersal].
..done.
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.4114551798115890446' (ECDSA) to the list of known hosts.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    19  100    19    0     0   3022      0 --:--:-- --:--:-- --:--:--  3166
scp: /opt/ads/apps/Profile360_Latest/serviceLog/Profile360Service.out: Permission denied
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
sh: 0: Can't open SparkBuildPanelBatch.sh

I am not worried about the line 'scp: /opt/ads/apps/Profile360_Latest/serviceLog/Profile360Service.out: Permission denied' because that file is out of use and will be fixed later. What I don't understand is why it can't open the .sh file.

You created /opt/ads/apps/apps locally, not on the remote server. - tripleee
Hi @tripleee, all the commands refer to the 'Master' node as I intended, not to the local server. I can also see that the directories are created correctly and that the folder I need to copy from the other compute instance is copied to the remote server as needed. What am I missing? - Yoav Barzilai
It's not really clear which part exactly is producing the error message but it looks like /opt/ads/apps/Profile360_Latest/serviceLog or perhaps its parent directory already exists on your local server, with permissions which do not allow you to write there. I don't think scp should be able to create a directory it can't write to. Try to reduce this to a minimal reproducible example. - tripleee
Tip: you don't need sudo in init actions; they run as root. - Dagang

1 Answer

0 votes

I figured it out. I just added a 'cd /opt/ads/apps/CurrectFolder' command between the last two lines and it works. Thanks, @tripleee, for your time.
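
For reference, the tail of the init action after that fix would look roughly like this (a sketch assuming SparkBuildPanelBatch.sh lives in the folder the cd points at; the folder name is copied from the answer above):

# Relevant tail of the init action after the fix: change into the folder that
# contains SparkBuildPanelBatch.sh before invoking it, because sh resolves a
# relative script name against the current working directory.
cd /opt/ads/apps/CurrectFolder
sudo sh SparkBuildPanelBatch.sh /opt/ads/apps/apps/SparkBuildPanelBatch_Latest/DeployPack_2203.txt 20201201 20201210 20201210

Calling the script by its absolute path (e.g. sudo sh /opt/ads/apps/CurrectFolder/SparkBuildPanelBatch.sh ...) would work just as well and avoids depending on the working directory.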