I am trying to save a DataFrame 'df2' to a text file using the code below:

code: df2.write.format('text').mode('overwrite').save('/tmp/hive/save_text')

Error:

org.apache.spark.sql.AnalysisException: Text data source does not support int data type.;

Py4JJavaError Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     62 try:
---> 63 return f(*a, **kw)
     64 except py4j.protocol.Py4JJavaError as e:

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
    329 else:

Py4JJavaError: An error occurred while calling o1239.save. : org.apache.spark.sql.AnalysisException: Text data source does not support int data type.;

**Ask:** Please suggest how to write data from a DataFrame to a text file.

1 Answer

Note that write.format('text') requires the DataFrame to have exactly one column, and that column must be of string type; otherwise it throws an error like the one in the question. So you need to convert all columns into a single string column.
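One way to do that conversion is sketched below. It assumes every column can be cast to string; the helper name to_single_text_column and the output column name 'value' are my own choices for illustration, not anything Spark requires.

```python
def to_single_text_column(df, sep=","):
    """Return a DataFrame with one string column built from all columns of df.

    Hypothetical helper for illustration; `df` is assumed to be a PySpark DataFrame.
    """
    # Imported inside the function so that defining the helper does not
    # itself require a running Spark installation.
    from pyspark.sql import functions as F

    # Cast each column to string, then join the values with the separator.
    parts = [F.col(c).cast("string") for c in df.columns]
    return df.select(F.concat_ws(sep, *parts).alias("value"))


# Usage, assuming df2 is the DataFrame from the question:
# to_single_text_column(df2).write.format("text").mode("overwrite").save("/tmp/hive/save_text")
```

Be aware that concat_ws skips null values, so a row containing a null will produce fewer fields on its line. If that matters for your data, replace nulls first (for example with df.na.fill) or use the csv writer instead.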

Alternatively, you can use write.format('csv'), or convert the DataFrame to an RDD and save it as a text file.
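For the csv route, a minimal sketch could look like this. The function name save_as_csv and the path in the usage line are hypothetical; the 'sep' and 'header' options are standard options of Spark's csv writer.

```python
def save_as_csv(df, path, sep=",", header=False):
    """Write df as delimited text files under `path`, replacing existing output.

    Hypothetical helper for illustration; `df` is assumed to be a PySpark DataFrame.
    """
    (df.write.format("csv")
       .option("sep", sep)        # field delimiter
       .option("header", header)  # whether to write column names as the first line
       .mode("overwrite")
       .save(path))


# Usage, assuming df2 is the DataFrame from the question (the path is only an example):
# save_as_csv(df2, "/tmp/hive/save_csv")
```

The csv writer accepts int and other simple column types directly, so no manual casting is needed, and it takes care of quoting values that contain the delimiter.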

For example, say your DataFrame contains two columns, id (int) and name (string), and you want each line of the output file to read id,name. The code would be:

df2.rdd.map(lambda x : str(x[0]) + "," + x[1]).saveAsTextFile('/tmp/hive/save_text')
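The one-liner above assumes exactly two columns and a non-null name (a None there would raise a TypeError on the string concatenation). A slightly more general sketch, with a hypothetical helper row_to_line that handles any number of columns and writes None as an empty field:

```python
def row_to_line(row, sep=","):
    """Join all values of a row into one delimited line.

    Hypothetical helper for illustration; works on a PySpark Row or any tuple.
    None values are written as empty strings (an assumption, adjust as needed).
    """
    return sep.join("" if value is None else str(value) for value in row)


# Usage, assuming df2 is the DataFrame from the question:
# df2.rdd.map(row_to_line).saveAsTextFile('/tmp/hive/save_text')
```

Unlike mode('overwrite') on the DataFrame writer, saveAsTextFile has no overwrite option and fails if the output directory already exists, so remove the directory first or pick a new path.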