I am new to Dataproc clusters and PySpark. While looking for code to load a table from BigQuery into the cluster, I came across the snippet below. I am unable to figure out what I am supposed to change for my use case, and what exactly is being provided as input via the input directory.
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
import subprocess

sc = SparkContext()
spark = SparkSession(sc)

# Bucket and project are read from the cluster's Hadoop configuration
bucket = spark._jsc.hadoopConfiguration().get('fs.gs.system.bucket')
project = spark._jsc.hadoopConfiguration().get('fs.gs.project.id')

# GCS path where the connector stages the table data exported from BigQuery
input_directory = 'gs://{}/hadoop/tmp/bigquery/pyspark_input'.format(bucket)

conf = {
    # Project and bucket used for the temporary export to GCS
    'mapred.bq.project.id': project,
    'mapred.bq.gcs.bucket': bucket,
    'mapred.bq.temp.gcs.path': input_directory,
    # The input table, identified as project / dataset / table
    'mapred.bq.input.project.id': 'dataset_new',
    'mapred.bq.input.dataset.id': 'retail',
    'mapred.bq.input.table.id': 'market',
}
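For reference, here is how I understand the conf dict gets used to actually read the table, based on the Hadoop BigQuery connector examples I have seen. The JsonTextBigQueryInputFormat class names and the gsutil cleanup at the end are my assumptions, not part of the snippet I found:

import json

# The connector first exports the table to input_directory on GCS,
# then reads it back as (LongWritable, JSON string) pairs
table_data = sc.newAPIHadoopRDD(
    'com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'com.google.gson.JsonObject',
    conf=conf)

# Each value is one row of the BigQuery table serialized as JSON
rows = table_data.values().map(lambda line: json.loads(line))
print(rows.take(3))

# Assumption: the otherwise unused "import subprocess" above is meant for
# deleting the staged files in input_directory once the job is done
subprocess.check_call('gsutil rm -r {}'.format(input_directory).split())

If that reading is right, then the input directory is just scratch space for the export, not something I provide data in myself, but I would appreciate confirmation.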