3
votes

While running a Spark job I can see that the SSL key password and keystore password are visible in plain text in the event log. Can you please help me hide these passwords from the logs?

Looking at https://issues.apache.org/jira/browse/SPARK-16796, it seems they fixed this by hiding the values from the web UI, but I am not sure how to fix it in the log.
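To illustrate what the fix in SPARK-16796 does, here is a minimal sketch of key-based redaction: properties whose names match a sensitive pattern have their values masked before being displayed. The regex mirrors the default of Spark's later `spark.redaction.regex` setting, but the helper function itself is hypothetical, not Spark's actual code:

```python
import re

# Pattern mirroring the default of spark.redaction.regex in newer
# Spark versions; keys matching it get their values masked.
REDACTION_REGEX = re.compile(r"(?i)secret|password")
REDACTED = "*********(redacted)"

def redact_properties(props):
    """Return a copy of props with values of sensitive keys masked.

    Illustrative only -- this mimics the behavior Spark applies to its
    web UI's Environment tab, not the event log writer itself.
    """
    return {
        key: (REDACTED if REDACTION_REGEX.search(key) else value)
        for key, value in props.items()
    }

props = {
    "spark.ssl.keyStorePassword": "hunter2",
    "spark.ssl.keyStore": "/path/to/keystore.jks",
}
print(redact_properties(props))
```

Note that the question's event log shows the raw `Spark Properties` map being serialized, which is exactly the spot such a filter would need to cover.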

Your help is really appreciated!

"{"Event":"SparkListenerLogStart","Spark Version":"2.1.1"} {"Event":"SparkListenerBlockManagerAdded","Block Manager ID":{"Executor ID":"driver","Host":"xx.xxx.xx.xxx","Port":43556},"Maximum Memory":434031820,"Timestamp":1512750709305} {"Event":"SparkListenerEnvironmentUpdate","JVM Information":{"Java Home":"/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-1.b16.32.amzn1.x86_64/jre","Java Version":"1.8.0_141 (Oracle Corporation)","Scala Version":"version 2.11.8"},"Spark Properties":{"spark.sql.warehouse.dir":"hdfs:///user/spark/warehouse","spark.yarn.dist.files":"file:/etc/spark/conf/hive-site.xml","spark.executor.extraJavaOptions":"-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'","spark.driver.host":"xx.xxx.xx.xxx","spark.serializer.objectStreamReset":"100","spark.history.fs.logDirectory":"hdfs:///var/log/spark/apps","spark.eventLog.enabled":"true","spark.driver.port":"44832","spark.shuffle.service.enabled":"true","spark.rdd.compress":"True","spark.driver.extraLibraryPath":"/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native","spark.ssl.keyStore":"/usr/share/aws/emr/security/conf/keystore.jks","spark.executorEnv.PYTHONPATH":"{{PWD}}/pyspark.zip{{PWD}}/py4j-0.10.4-src.zip","spark.ssl.enabled":"true","spark.yarn.historyServer.address":"ip-xx-xxx-xx-xxx.xxx.com:18080","spark.ssl.trustStore":"/usr/share/aws/emr/security/conf/truststore.jks","spark.app.name":"claim_line_fact_main","spark.scheduler.mode":"FIFO","spark.network.sasl.serverAlwaysEncrypt":"true","spark.ssl.keyPassword":"xxxxxx","spark.ssl.keyStorePassword":"xxxxxx","spark.executor.id":"driver","spark.driver.extraJavaOptions":"-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 
%p'","spark.submit.deployMode":"client","spark.master":"yarn","spark.authenticate.enableSaslEncryption":"true","spark.authenticate":"true","spark.ui.filters":"org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter","spark.executor.extraLibraryPath":"/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native","spark.sql.hive.metastore.sharedPrefixes":"com.amazonaws.services.dynamodbv2","spark.executor.memory":"5120M","spark.driver.extraClassPath":"/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/:/usr/share/aws/emr/emrfs/auxlib/:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/","spark.eventLog.dir":"hdfs:///var/log/spark/apps","spark.ssl.protocol":"TLSv1.2","spark.dynamicAllocation.enabled":"true","spark.executor.extraClassPath":"/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/:/usr/share/aws/emr/emrfs/auxlib/:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/","spark.executor.cores":"4","spark.history.ui.port":"18080","spark.driver.appUIAddress":"http://","spark.yarn.isPython":"true","spark.ssl.trustStorePassword":"xxxxxx","spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS":"ip-xx-xxx-xx-xxx.xxx.com","spark.ssl.enabledAlgorithms":"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA","spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES":"

1
What Spark version do you use? Can you show more of the logs where you see the line? How do you start the application that shows the line in the logs? There has been work to redact sensitive information from logs, but mostly for the History Server and the web UI's Environment tab. – Jacek Laskowski
Spark version is 2.1.1. I'm running it as a spark-submit job. Updated the logs. – Surajit Kundu
@SurajitKundu Were you able to resolve it? I am stuck at the same point. Can you please share how you got through this? – tenderfoot

1 Answer

0
votes

Messages logged at the INFO, WARN, and ERROR levels can only be controlled through the log4j.properties file. If you want to hide a password or any other confidential parameter passed to Spark with -D, remove the --verbose flag from your spark-submit command so the resolved configuration is not printed. This worked for me.
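As a sketch, a submission without --verbose might look like the following; the script name, master, and SSL settings are placeholders, and reading the password from an environment variable (rather than hard-coding it) is my suggestion, not part of the original answer:

```shell
# Hypothetical spark-submit invocation; app script and conf values are
# placeholders. Omitting --verbose keeps spark-submit from dumping the
# fully resolved configuration (including -D properties) to the console.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.ssl.enabled=true \
  --conf spark.ssl.keyStorePassword="${KEYSTORE_PASSWORD}" \
  my_job.py
```

Note this only suppresses spark-submit's own console output. The event log written under spark.eventLog.dir still serializes the Spark properties, so on Spark 2.1.1 the passwords can still appear there; newer Spark releases add a spark.redaction.regex property (default (?i)secret|password) that masks matching keys, which may be the cleaner fix if upgrading is an option.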