
I have a PySpark job that I am distributing across a 1-master, 3-worker cluster.

I have some Python print statements which help me debug my code.

from keras.preprocessing import sequence  # assumed import for pad_sequences below

print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

Now, when I run the code on Google Dataproc with the master set to local, the print output appears correctly. However, when I run it with YARN as the master, the print output does not appear in the Google Cloud Console under the Jobs section of the Dataproc UI.

Where can I access these Python print outputs from each of the workers and the master, since they do not appear in the Google Dataproc Console?


2 Answers


If you're using Dataproc, why access the logs via the Spark UI? A better way would be to:

  • Submit the job using gcloud dataproc jobs submit (see the example after this list)

  • Once the job is submitted, you can access Cloud Dataproc job driver output using the Cloud Platform Console, the gcloud command, or Cloud Storage, as explained below.
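For instance, a PySpark job might be submitted like this (the script name, cluster name, and region below are placeholders, not values from the question):

gcloud dataproc jobs submit pyspark my_job.py \
    --cluster=my-cluster \
    --region=us-central1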

The Cloud Platform Console allows you to view a job's real-time driver output: go to your project's Cloud Dataproc Jobs section, then click the Job ID to view its output.
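If you prefer the command line over the Console, the same driver output can also be retrieved with gcloud (the job ID and region below are placeholders):

# Poll the job and print its driver output as it runs
gcloud dataproc jobs wait job-id-1234 --region=us-central1

# Or look up where the driver output is stored in Cloud Storage
gcloud dataproc jobs describe job-id-1234 --region=us-central1 \
    --format='value(driverOutputResourceUri)'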


Reference Documentation


If you really want to access the YARN interface (with the detailed list of all the jobs and their logs), you can do the following:

Just click on your master.
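As a command-line alternative to the YARN web UI, the aggregated container logs (which include each executor's stdout, i.e. the print output from the workers) can also be fetched with the yarn CLI after SSHing into the master node, assuming YARN log aggregation is enabled; the application ID below is a placeholder:

yarn logs -applicationId application_1234567890123_0001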