0
votes

I am trying to run this following bootstrap script in AWS EMR and it is failing without providing an error message. I have setup EMR cluster for applications: Spark, Hive, Ganglia, and Livy.

!/bin/bash

sudo pip-3.6 install -U \ matplotlib \ pandas \ spark-nlp

I am using EMR version 6. Previously we used just "pip" instead of "pip-3.6" but it still error out hence we decided to try out with "pip-3.6" since we assumed that EMR 6 has python 3.6.

Kindly let me know what could be wrong with this.

Thanks!

1
probably you missed the #? your first line must starts with #!/bin/bash - gccodec
You cannot crash an EMR job without an error message. Find the step logs and the node logs. There will be an error message within. - Dave

1 Answers

0
votes

you can try running your script on a running EMR cluster to ensure it works fine. Emr release notes for emr6 suggests python3 is default . https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-6x.html

and on a quick check it seems it is coming with python3.7 . this is why your script may not be working. so you will need to use pip3 or /usr/bin/pip-3.7 (specify the complete path, it may be possible the alias is not set for pip-3.7)

hence instead try using

$ sudo pip3 install matplotlib ...

or

$ sudo python -m pip3 install matplotlib ...