I have a PySpark job that I am distributing across a 1-master, 3-worker cluster.
I have some Python print statements that help me debug my code:
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
When I run the code on Google Dataproc with the master set to local, the print output appears correctly. However, when I run it with YARN as the master, the print output does not appear in the Google Cloud Console under the Jobs section of the Dataproc UI.
Where can I access the Python print output from the master and from each of the workers when it does not appear in the Google Dataproc console?
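For illustration, here is a minimal sketch of one way to make such debug output attributable to a node once it has been located. It assumes only the Python standard library. The helper name `debug_print` is hypothetical and is not part of the job above. It prefixes each message with the host name of the machine that emitted it and flushes immediately. It does not by itself change where YARN or Dataproc stores the output.

```python
import socket
import sys


def debug_print(*args, stream=None):
    """Print a debug message prefixed with the emitting host's name.

    `debug_print` is a hypothetical helper for illustration only.
    By default it writes to stderr; pass `stream` to redirect it.
    """
    if stream is None:
        stream = sys.stderr
    host = socket.gethostname()
    # flush=True so the message is not held back in a buffer
    print(f"[{host}]", *args, file=stream, flush=True)
    return host
```

It could be used in place of the plain calls, for example `debug_print(len(X_train), 'train sequences')`, so that each line shows which machine produced it.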