0 votes

I am submitting a PySpark/SparkSQL script using spark-submit and I need to pass a runtime variable (a database name) to the script.

spark-submit command:

spark-submit --conf database_parameter=my_database my_pyspark_script.py

PySpark script:

database_parameter = SparkContext.getConf().get("database_parameter")           

DF = sqlContext.sql("SELECT count(*) FROM database_parameter.table_name")

Spark version: 1.5.2
Python version: 2.7.5

The solution I am trying is not working. The error is: AttributeError: type object 'SparkConf' has no attribute 'getConf'.

I am looking for a way to pass a runtime variable when calling the script through spark-submit and to use that variable in the script.


1 Answer

3 votes

You can use the usual sys.argv to read command-line arguments.

args.py

#!/usr/bin/python

import sys

# sys.argv[1] is the first argument after the script name
print sys.argv[1]

Then you spark-submit it:

spark-submit args.py my_database 

This will print:

my_database
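
To tie this back to the original use case, the command-line argument can be substituted into the SQL string at runtime. Below is a minimal sketch, not a drop-in solution: it assumes a HiveContext is available (since the query targets a named database), reuses the table_name placeholder from the question, and the appName is arbitrary.

my_pyspark_script.py

#!/usr/bin/python

import sys

from pyspark import SparkContext
from pyspark.sql import HiveContext

# Database name passed on the command line, e.g.
# spark-submit my_pyspark_script.py my_database
database_parameter = sys.argv[1]

sc = SparkContext(appName="row_count")
sqlContext = HiveContext(sc)

# Substitute the runtime database name into the query string
DF = sqlContext.sql("SELECT count(*) FROM {0}.table_name".format(database_parameter))
DF.show()

If your environment already provides a suitable sqlContext, you can drop sys.argv[1] into it the same way instead of creating a new context.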