I am trying to use initialization actions on Dataproc (GCP).
A very short intro: we constantly run processes on Dataproc clusters in GCP, using Spark jobs to transform our data in many different ways.
For the last few days I have been trying to build an initialization action file (.sh) that runs a few Linux commands when the cluster comes up. Everything works well except the last command (the one that is supposed to run the Spark app engine and execute the job).
I am adding all the related information below (my .sh file and the error log), and I would love to hear your suggestions while I keep searching the internet to figure it out. Thanks.
My .sh File:
#!/bin/bash
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [ $ROLE = "Master" ]; then
echo $ROLE
sudo mkdir /opt/ads
sudo mkdir /opt/ads/apps
sudo gcloud compute scp --recurse 35613250742-compute@brainservice:/opt/ads/apps/ /opt/ads/apps/ --zone=europe-west2-c --internal-ip
sudo sh SparkBuildPanelBatch.sh /opt/ads/apps/apps/SparkBuildPanelBatch_Latest/DeployPack_2203.txt 20201201 20201210 20201210
fi
My error output log:
Master
WARNING: The public SSH key file for gcloud does not exist.
WARNING: The private SSH key file for gcloud does not exist.
WARNING: You do not have an SSH key for gcloud.
WARNING: SSH keygen will be executed to generate a key.
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/google_compute_engine.
Your public key has been saved in /root/.ssh/google_compute_engine.pub.
The key fingerprint is:
SHA256:QEN6xC2DB8+lUmFPpvxilMQmPYMBXSPkaTEgNuvqP8c root@analytics-m
The key's randomart image is:
+---[RSA 2048]----+
|.oo=B@%o+ |
|..o.=/@@. |
| . *+@=. |
|. . + o |
| . o S |
|. . . |
|. . |
|. . E |
| ...o |
+----[SHA256]-----+
Updating project ssh metadata...
.............................................................................................................................................................Updated [https://www.googleapis.com/compute/v1/projects/supersal].
..done.
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.4114551798115890446' (ECDSA) to the list of known hosts.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 19 100 19 0 0 3022 0 --:--:-- --:--:-- --:--:-- 3166
scp: /opt/ads/apps/Profile360_Latest/serviceLog/Profile360Service.out: Permission denied
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
sh: 0: Can't open SparkBuildPanelBatch.sh
I am not worried about this line: 'scp: /opt/ads/apps/Profile360_Latest/serviceLog/Profile360Service.out: Permission denied', because that file is out of use and will be fixed later. What I don't understand is why it can't open the .sh file.
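One way to narrow this down, as a minimal sketch: sh resolves a bare file name such as SparkBuildPanelBatch.sh against the current working directory, so it may help to print that directory and search the copied tree for the script. The helper name locate_script is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical diagnostic helper (not from the original post).
# Prints the current working directory, then reports whether the named
# script exists there; if it does not, searches the given directory tree.
locate_script() {
  name="$1"
  search_root="$2"
  echo "cwd: $(pwd)"
  if [ -f "$name" ]; then
    echo "found in cwd: $(pwd)/$name"
  else
    echo "not in cwd; searching under $search_root"
    find "$search_root" -type f -name "$name" 2>/dev/null
  fi
}
```

Calling locate_script SparkBuildPanelBatch.sh /opt/ads/apps just before the failing line would show which directory the init action runs in and where the copy actually placed the script.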
/opt/ads/apps/apps locally, not on the remote server. - tripleee
/opt/ads/apps/Profile360_Latest/serviceLog or perhaps its parent directory already exists on your local server, with permissions which do not allow you to write there. I don't think scp should be able to create a directory it can't write to. Try to reduce this to a minimal reproducible example. - tripleee
sudo in init actions, it is running as root. - Dagang
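Following the comments above, here is a minimal sketch of how the last step could be made independent of the working directory: call the script by its full path, and check that it exists first. The helper name run_batch is hypothetical. The script location shown in the usage comment is an assumption (that SparkBuildPanelBatch.sh sits next to DeployPack_2203.txt); the original post does not say where the script lives.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post).
# Runs a script given by path, failing early with a clear message if the
# file is missing. The script is started from its own directory so that
# any relative paths it uses internally resolve there.
run_batch() {
  script="$1"
  shift
  if [ ! -f "$script" ]; then
    echo "script not found: $script" >&2
    return 1
  fi
  ( cd "$(dirname "$script")" && sh "./$(basename "$script")" "$@" )
}

# Example call inside the init action. The directory is an ASSUMPTION,
# and no sudo is used because init actions already run as root:
#   APP_DIR=/opt/ads/apps/apps/SparkBuildPanelBatch_Latest
#   run_batch "$APP_DIR/SparkBuildPanelBatch.sh" "$APP_DIR/DeployPack_2203.txt" 20201201 20201210 20201210
```

The existence check turns a missing file into an explicit "script not found" message with the full path, which is easier to act on than "Can't open".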