I'm trying to manually create a Spark DataFrame with one column, `DT`, and one row containing the date 2020-01-01:
DT
=======
2020-01-01
However, it fails with a `list index out of range` error. Why?
spark = SparkSession.builder\
.master(f'spark://{IP}:7077')\
.config('spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version', '2')\
.appName('g data')\
.getOrCreate()
spark.conf.set('spark.sql.sources.partitionOverwriteMode', 'dynamic')
dates = spark.createDataFrame([(pd.to_datetime('2020-1-1'))], ['DT'])
Traceback:

in brand_tagging_since_until(spark, since, until)
---> 81 dates = spark.createDataFrame([(pd.to_datetime('2020-1-1'))], ['DT'])

/usr/local/bin/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
    746             rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
    747         else:
--> 748             rdd, schema = self._createFromLocal(map(prepare, data), schema)
    749         jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
    750         jdf = self._jsparkSession.applySchemaToPythonRDD(jrdd.rdd(), schema.json())

/usr/local/bin/spark/python/pyspark/sql/session.py in _createFromLocal(self, data, schema)
    419         if isinstance(schema, (list, tuple)):
    420             for i, name in enumerate(schema):
--> 421                 struct.fields[i].name = name
    422                 struct.names[i] = name
    423         schema = struct
Comments:

Is `DT` the column name in a single-column dataframe, or a value in the row? – Nick Becker

`DT` is the column name; the row value is 2020-01-01. – ca9163d9