
I am running a .py file through the command:

/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/bin/spark2-submit --jars /home/jsonnt200/geomesa-hbase-spark-runtime_2.11-1.3.5.1cc.jar,/ccri/hbase-site.zip geomesa_klondike_enrichment2.py

This results in the following error:

Traceback (most recent call last):
  File "/home/jsonnt200/geomesa_klondike_enrichment2.py", line 6306, in <module>
    df2_500m.write.option('header', 'true').csv('/user/jsonnt200/klondike_201708_1m_500meter_testEQ_union4')
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658/lib/spark2/python/pyspark/sql/readwriter.py", line 711, in csv
    self._jwrite.csv(path)
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658/lib/spark2/python/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'Illegal pattern component: XXX'

The biggest concern is that if I submit this same .py file through IPython, it runs correctly. Any ideas on what the issue could be? Unfortunately, I have to use spark2-submit for tunnelling purposes.


1 Answer


You are using Spark 2.2.0, right? I encountered the same issue when trying to read a CSV file. The problem, I think, is the timestampFormat option. Its default value is yyyy-MM-dd'T'HH:mm:ss.SSSXXX (see the pyspark.sql documentation).

When I change it to e.g. timestampFormat="yyyy-MM-dd", my code works. This issue is also mentioned in this post. Hope it helps :).