
I am using a Databricks notebook to read a file from Azure Blob Storage and write it back to the same location. But when I write the file, a lot of files with different names are created, like this:

[screenshot: the output folder containing multiple part-* files]

I am not sure why these files are created in the location I specified. Also, a new folder named "new_location" was created after I performed the write operation:

[screenshot: the new "new_location" folder in the container]
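From what I have read, Spark's save(path) creates a directory at path and writes one part-* file per partition inside it, which might explain both the new folder and the multiple files. A quick check of how many part files to expect, using the df from the code below:

# each partition of the dataframe becomes one part-* file on write
print(df.rdd.getNumPartitions())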

What I want is that, after reading the file from Azure Blob Storage, I write it back to the same location with the same name as the original. But I am unable to do so; please help me out, as I am new to PySpark. I have already mounted the storage, and I am now reading the CSV file stored in an Azure Blob Storage container. The overwritten file is created with the name "part-00000-tid-84371752119947096-333f1e37-6fdc-40d0-97f5-78cee0b108cf-31-1-c000.csv".

Code:

df = spark.read.csv("/mnt/ndemo/nsalman/addresses.csv", inferSchema=True)
df = df.toDF("firstName", "lastName", "street", "town", "city", "code")
df.show()
file_location_new = "/mnt/ndemo/nsalman/new_location"
# intended to write the dataframe as a single file to blob storage
df.write.format("csv") \
  .mode("overwrite").option("header", "true").save(file_location_new)
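If I understand correctly, a workaround along these lines might give me a single file with the original name: coalesce to one partition, write to a temporary directory, then move the lone part file over the original CSV. This is just a sketch assuming Databricks' dbutils is available; the temporary path is made up for illustration.

# sketch: produce a single CSV with the original name (assumes dbutils on Databricks)
tmp_dir = "/mnt/ndemo/nsalman/_tmp_addresses"  # hypothetical scratch directory
# one partition -> exactly one part-* file
df.coalesce(1).write.format("csv") \
  .mode("overwrite").option("header", "true").save(tmp_dir)
# locate the single part file and move it over the original CSV
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, "/mnt/ndemo/nsalman/addresses.csv")
dbutils.fs.rm(tmp_dir, True)  # clean up the scratch directory

Writing to a temporary directory first would also avoid overwriting the source file while Spark may still be lazily reading it. Is this the right approach, or is there a cleaner way?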