0
votes

I am trying to use the SageMaker Python SDK with PySpark on EMR (Jupyter) Notebook. When trying to use XGBoostSageMakerEstimator as shown below,

from sagemaker_pyspark.algorithms import XGBoostSageMakerEstimator

xgboost_estimator = XGBoostSageMakerEstimator(
    sagemakerRole=IAMRole(someRoleArn),
    trainingInstanceType='ml.m4.xlarge',
    trainingInstanceCount=1,
    endpointInstanceType='ml.m4.xlarge',
    endpointInitialInstanceCount=1)

I am getting the following error that I have not been able to find the solution to.

Exception ignored in: <bound method JavaWrapper.__del__ of <sagemaker_pyspark.wrapper.ScalaMap object at 0x7fd3d9e96240>>
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 40, in __del__
AttributeError: 'ScalaMap' object has no attribute '_java_obj'
Exception ignored in: <bound method JavaWrapper.__del__ of <sagemaker_pyspark.wrapper.ScalaMap object at 0x7fd3d9e96240>>
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 40, in __del__
AttributeError: 'ScalaMap' object has no attribute '_java_obj'
Exception ignored in: <bound method JavaWrapper.__del__ of <sagemaker_pyspark.wrapper.Option object at 0x7fd3d9e9d3c8>>
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 40, in __del__
AttributeError: 'Option' object has no attribute '_java_obj'
Exception ignored in: <bound method JavaWrapper.__del__ of <sagemaker_pyspark.wrapper.Option object at 0x7fd3d9e9d128>>
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 40, in __del__
AttributeError: 'Option' object has no attribute '_java_obj'
Exception ignored in: <bound method JavaWrapper.__del__ of <sagemaker_pyspark.wrapper.Option object at 0x7fd3d9e9d0f0>>
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 40, in __del__
AttributeError: 'Option' object has no attribute '_java_obj'
Exception ignored in: <bound method JavaWrapper.__del__ of <sagemaker_pyspark.wrapper.Option object at 0x7fd3d9e9d080>>
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 40, in __del__
AttributeError: 'Option' object has no attribute '_java_obj'
Exception ignored in: <bound method JavaWrapper.__del__ of <sagemaker_pyspark.wrapper.Option object at 0x7fd3d9e96ef0>>
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 40, in __del__
AttributeError: 'Option' object has no attribute '_java_obj'

Any help to troubleshoot this would be greatly appreciated.

Using:

  • EMR (emr-5.26.0) Cluster with Spark 2.4.3
  • EMR Notebook attached to the cluster
  • sagemaker_pyspark comes pre-installed with emr-5.26.0
1

1 Answers

1
votes

I encountered the same error. I believe sagemaker_pyspark is incompatible with Spark versions > 2.3.2 (source: https://github.com/aws/sagemaker-spark/commit/4055f1e05be7d5e764f2abc8b3d6fc2c252ae272). I was able to confirm this with someone who contributes to the project.

I ran my code with Spark 2.3.2 and didn't see the exceptions anymore.