5
votes

I'm new to Databricks and need help writing a pandas DataFrame to the Databricks local file system.

I searched Google but could not find a similar case. I also tried the help guide provided by Databricks (attached), but that did not work either. I then tried the variations below. Most of the commands run without errors, but the file is not written to the directory (I expect wrtdftodbfs.txt to be created).

  1. df.to_csv("/dbfs/FileStore/NJ/wrtdftodbfs.txt")

Result: throws the below error

FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/NJ/wrtdftodbfs.txt'

  2. df.to_csv("\\dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

  3. df.to_csv("dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

  4. df.to_csv(path ="\\dbfs\\FileStore\\NJ\\",file="wrtdftodbfs.txt")

Result: TypeError: to_csv() got an unexpected keyword argument 'path'

  5. df.to_csv("dbfs:\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

  6. df.to_csv("dbfs:\\dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

The directory exists and files created manually show up in it, but pandas to_csv neither writes the file nor raises an error.

dbutils.fs.put("/dbfs/FileStore/NJ/tst.txt","Testing file creation and existence")

dbutils.fs.ls("dbfs/FileStore/NJ")

Out[186]: [FileInfo(path='dbfs:/dbfs/FileStore/NJ/tst.txt', name='tst.txt', size=35)]
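One detail in the listing above may be relevant: dbutils.fs treats its argument as a DBFS path, so the put to "/dbfs/FileStore/NJ/tst.txt" ended up at dbfs:/dbfs/FileStore/NJ/tst.txt, that is, inside a top-level DBFS folder named dbfs. pandas, by contrast, goes through the driver's local /dbfs mount, where dbfs:/X appears as /dbfs/X. The following is only a sketch of that mapping, assuming the usual /dbfs mount is available on the cluster; the helper name dbfs_uri_to_local_path is hypothetical and not part of any Databricks API.

```python
def dbfs_uri_to_local_path(uri: str) -> str:
    """Translate a 'dbfs:/...' URI into the path seen through the /dbfs mount.

    Hypothetical helper for illustration; assumes the standard /dbfs mount.
    """
    prefix = "dbfs:"
    if not uri.startswith(prefix):
        raise ValueError("expected a URI starting with 'dbfs:': " + uri)
    # Drop the scheme, then re-root whatever remains under /dbfs.
    remainder = uri[len(prefix):].lstrip("/")
    return "/dbfs/" + remainder


# The file created by dbutils.fs.put in the question:
print(dbfs_uri_to_local_path("dbfs:/dbfs/FileStore/NJ/tst.txt"))
# The location the first to_csv attempt was aiming at:
print(dbfs_uri_to_local_path("dbfs:/FileStore/NJ/wrtdftodbfs.txt"))
```

Under this mapping, the manually created folder would be visible to pandas as /dbfs/dbfs/FileStore/NJ, while /dbfs/FileStore/NJ might not exist at all, which would be consistent with the FileNotFoundError in attempt 1.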

Appreciate your time and pardon me if the enclosed details are not clear enough.

2
Try converting it to a Spark DataFrame and then saving it as a CSV; pandas most likely doesn't have access to the FileStore. - Umar.H
Is it a Spark DataFrame or pandas? The code at the top talks about Spark, but everything else looks like pandas. If it involves pandas, you need to make the file using df.to_csv and then use dbutils.fs.put() to put the file you made into the FileStore, following here. If it involves Spark, see here. - Wayne
Have you tried: with open("/dbfs/FileStore/NJ/wrtdftodbfs.txt", "w") as f: df.to_csv(f)? - PMende
Thanks for the response, Mende. I did try that, but no luck: it runs fine, but the file does not make it into the directory. - Shaan Proms
Thanks so much, Wayne. The second link you shared worked: I converted the pandas DataFrame to Spark. I'm not sure whether the Databricks FileStore only accepts writes through Spark commands. - Shaan Proms
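For the pandas route Wayne describes (write the file with df.to_csv, then place it in the FileStore with dbutils), a rough sketch might look like the one below. It swaps dbutils.fs.put for dbutils.fs.cp, since the file already exists on the driver and only needs copying. The function name to_csv_via_driver_tmp is hypothetical, and dbutils is passed in explicitly because it only exists inside a Databricks notebook.

```python
import os
import tempfile


def to_csv_via_driver_tmp(df, dbfs_target, dbutils, **to_csv_kwargs):
    """Write df to a driver-local temp file, then copy that file into DBFS.

    Hypothetical helper: df is a pandas DataFrame, dbfs_target is a URI such
    as 'dbfs:/FileStore/NJ/wrtdftodbfs.txt', dbutils is the notebook's handle.
    """
    tmp_dir = tempfile.mkdtemp()
    local_path = os.path.join(tmp_dir, os.path.basename(dbfs_target))
    # Plain pandas write to the driver's local disk.
    df.to_csv(local_path, **to_csv_kwargs)
    # The 'file:' scheme tells dbutils the source is on the driver, not in DBFS.
    dbutils.fs.cp("file:" + local_path, dbfs_target)
    return local_path
```

In a notebook this would be called as to_csv_via_driver_tmp(df, "dbfs:/FileStore/NJ/wrtdftodbfs.txt", dbutils, index=False).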

2 Answers

2
votes

Try this in your Databricks notebook:

import pandas as pd
from io import StringIO

data = """
CODE,L,PS
5d8A,N,P60490
5d8b,H,P80377
5d8C,O,P60491
"""

df = pd.read_csv(StringIO(data), sep=',')
#print(df)
df.to_csv('/dbfs/FileStore/NJ/file1.txt')

pandas_df = pd.read_csv("/dbfs/FileStore/NJ/file1.txt", header='infer') 
print(pandas_df)
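Note that to_csv does not create missing folders, so the write above would raise the FileNotFoundError from the question if /dbfs/FileStore/NJ does not exist yet. A small sketch that creates the parent directory first, assuming the /dbfs mount accepts ordinary os calls on your cluster; the helper name to_csv_with_parent_dir is hypothetical, and df can be any pandas DataFrame:

```python
import os


def to_csv_with_parent_dir(df, path, **to_csv_kwargs):
    """Create the parent folder of path if needed, then call df.to_csv(path).

    Hypothetical helper for illustration; extra keyword arguments are passed
    through to to_csv unchanged.
    """
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True makes this a no-op when the folder is already there.
        os.makedirs(parent, exist_ok=True)
    df.to_csv(path, **to_csv_kwargs)
    return path
```

With the DataFrame from this answer, the call would be to_csv_with_parent_dir(df, '/dbfs/FileStore/NJ/file1.txt').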
1
votes

This worked out for me:

outname = 'pre-processed.csv'
outdir = '/dbfs/FileStore/'
dfPandas.to_csv(outdir+outname, index=False, encoding="utf-8")

To download the file:

https://community.cloud.databricks.com/files/pre-processed.csv?o=189989883924552#

(You need to substitute your own workspace home URL; mine is:

https://community.cloud.databricks.com/?o=189989883924552#)
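The pattern in the two URLs above is that a file under /dbfs/FileStore/ is served under /files/ on the workspace host, with the same o= value as the home URL. A sketch of building such a link, assuming that pattern holds for your workspace; the function name filestore_download_url and its parameters are hypothetical:

```python
def filestore_download_url(local_path, workspace_url, org_id=None):
    """Build a browser download link for a file saved under /dbfs/FileStore/.

    Hypothetical helper: workspace_url is the host part of your home URL and
    org_id is the value after 'o=' in it, if there is one.
    """
    prefix = "/dbfs/FileStore/"
    if not local_path.startswith(prefix):
        raise ValueError("path is not under " + prefix + ": " + local_path)
    # Everything below FileStore maps onto the /files/ endpoint.
    relative = local_path[len(prefix):]
    url = workspace_url.rstrip("/") + "/files/" + relative
    if org_id:
        url += "?o=" + org_id
    return url


print(filestore_download_url(
    "/dbfs/FileStore/pre-processed.csv",
    "https://community.cloud.databricks.com",
    "189989883924552",
))
```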

[image: DBFS file explorer]