This was supposed to be a simple test: move the first row of my DataFrame into a new DataFrame.
First issue: df.first() returns a Row, not a DataFrame. Next problem: when I try spark.createDataFrame(df.first()), it tells me it cannot infer the schema.
And spark.createDataFrame(df.first(), df.schema) does not work either.
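For reference, here is roughly what those attempts looked like (df standing in for my actual DataFrame):

row = df.first()                       # returns a pyspark.sql.Row, not a DataFrame
spark.createDataFrame(row)             # fails: can not infer the schema
spark.createDataFrame(row, df.schema)  # fails with the TypeError shown further down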
So, for the original schema below (the printSchema() output on my DataFrame):
root
 |-- entity_name: string (nullable = true)
 |-- field_name: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- data_row: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- data_schema: array (nullable = true)
 |    |-- element: string (containsNull = true)
I defined the schema in code thus:
from pyspark.sql.types import StructType, StructField, StringType, ArrayType

xyz_schema = StructType([
    StructField('entity_name', StringType(), True),
    StructField('field_name', ArrayType(StringType(), True), True),
    StructField('data_row', ArrayType(StringType(), True), True),
    StructField('data_schema', ArrayType(StringType(), True), True)
])
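As a quick sanity check (I believe StructType supports ==), the hand-written schema can be compared against the inferred one, modulo nullability flags:

assert xyz_schema == xyz.schema  # should hold if the fields and types line up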
print(xyz.first())
xyz_1stRow = spark.createDataFrame(xyz.first(), xyz_schema)
The above does not work! I get the following error:
"TypeError: StructType can not accept object 'parquet/assignment/v1' in type <class 'str'>"
This is what the print shows me:
Row(entity_name='parquet/assignment/v1', field_name=['Contract_ItemNumber', 'UPC', 'DC_ID', 'AssignDate', 'AssignID', 'AssignmentQuantity', 'ContractNumber', 'MaterialNumber', 'OrderReason', 'RequirementCategory', 'MSKU'], data_row=['\n350,192660436296,2001,10/1/2019,84009248020191000,5,840092480,1862291010,711,V1\n\t\t\t\t\t', '\n180,191454773838,2001,10/1/2019,84009248020191000,6,840092480,1791301010,711,V1\n\t\t\t\t\t'], data_schema=['StringType', 'StringType', 'StringType', None, 'StringType', 'IntegerType', 'StringType', 'StringType', 'StringType', 'StringType', 'StringType'])
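From that error, my guess is that when createDataFrame is handed a bare Row, it iterates over the Row's fields and treats each string as its own row, which would explain how a plain str ends up where a StructType is expected. If that's right, wrapping the Row in a list might be the fix; a sketch of what I'm about to try (untested):

# guess: createDataFrame wants an iterable of rows, so wrap the single Row in a list
xyz_1stRow = spark.createDataFrame([xyz.first()], xyz_schema)

# or skip collecting entirely and keep everything as a DataFrame
xyz_1stRow = xyz.limit(1)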
What am I doing wrong? Why does StringType not accept a string?
I'm working in PySpark (current version) on Azure Databricks. I'd prefer to stay in PySpark: not R, not Scala, and not converting to pandas and risking my data getting corrupted moving between all these languages.