5
votes

I have a dataframe that I am trying to save as a JSON file using pyspark 1.4, but it doesn't seem to be working. When I give it the path to the directory, it returns an error stating that it already exists. My assumption, based on the documentation, was that it would save a JSON file at the path you give it.

df.write.json("C:\Users\username")

Specifying a directory with a name doesn't produce any file and gives an error of "java.io.IOException: Mkdirs failed to create file:/C:Users/username/test/_temporary/....etc". It does, however, create a directory named test which contains several sub-directories with blank crc files.

df.write.json("C:\Users\username\test")

And adding a file extension of JSON produces the same error:

df.write.json("C:\Users\username\test.JSON")
I think you need to give it a complete file name, not just the directory. – Brobin
Yes, I verified the permissions on that directory and used getpass.getuser() from Python to verify that I was logged in as that user via the console. – Jared
Try an alternate approach such as df.toJSON().saveAsTextFile(path) – urug
I too faced such a problem when using Windows, so I switched to Linux where the same code worked perfectly. – Kavindu Dodanduwa
Thanks for giving it a try. I figured it had something to do with Windows, ughhh... – Jared

3 Answers

4
votes

Could you not just use

df.toJSON()

as shown here? If not, then first convert it into a pandas DataFrame and then write it to JSON.

pandas_df = df.toPandas()
# use a raw string so backslashes in the Windows path are not treated as escapes
pandas_df.to_json(r"C:\Users\username\test.JSON")
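
If you would rather stay in Spark, df.toJSON() returns an RDD of JSON strings (one per row) that can be written out with saveAsTextFile, as urug suggests in the comments. A minimal sketch (the output path is hypothetical, must not already exist, and may still hit the same Windows/Hadoop path issues):

# toJSON() yields one JSON string per row; saveAsTextFile writes them
# as part files into a new directory (hypothetical path shown)
json_rdd = df.toJSON()
json_rdd.saveAsTextFile(r"C:\Users\username\test_json_output")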
3
votes

When working with large data, converting a pyspark dataframe to pandas is not advisable. You can use the command below to save a JSON file in the output directory. Here df is a pyspark.sql.dataframe.DataFrame. A part file will be generated inside the output directory by the cluster.

df.coalesce(1).write.format('json').save('/your_path/output_directory')
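
To sanity-check the result, Spark can read the whole output directory back, part files and all. A quick sketch, assuming a Spark 1.x SQLContext named sqlContext:

# read.json accepts the directory path and picks up every part file inside it
check_df = sqlContext.read.json('/your_path/output_directory')
check_df.show()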
0
votes

I would avoid using write.json since it's causing problems on Windows. Using Python's file writing should skip creating the temp directories that are giving you issues.

with open("C:\\Users\\username\\test.json", "w+") as output_file:
    output_file.write(df.toJSON())