I am trying to achieve the same functionality as this SO post, Spark dataframe save in single file on hdfs location, except that my file is located in Azure Data Lake Gen2 and I am using PySpark in a Databricks notebook.
Below is the code snippet I am using to rename the file:
from py4j.java_gateway import java_import
java_import(spark._jvm, 'org.apache.hadoop.fs.Path')

destpath = "abfss://" + container + "@" + storageacct + ".dfs.core.windows.net/"
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
# Find the part-* file that Spark wrote out
file = fs.globStatus(spark._jvm.Path(destpath + 'part*'))[0].getPath().getName()
# Rename the file
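For completeness, once I have the name of the part file, the rename itself would just be a call to fs.rename on the same Hadoop FileSystem object (the target name final_data.csv below is only a placeholder):

# Build source and destination paths, then rename via the Hadoop FileSystem API
srcpath = spark._jvm.Path(destpath + file)
dstpath = spark._jvm.Path(destpath + 'final_data.csv')  # placeholder target name
fs.rename(srcpath, dstpath)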
I receive an IndexError: list index out of range on this line:
file = fs.globStatus(spark._jvm.Path(destpath + 'part*'))[0].getPath().getName()
The part* file does exist in the folder.
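For example, the folder contents can be listed from the notebook with the Databricks dbutils utility, and the part file shows up there:

# Sanity check: list the destination folder; the output includes the part-* file
display(dbutils.fs.ls(destpath))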
Is this the right approach to rename a file that Databricks (PySpark) writes to Azure Data Lake Gen2? If not, how else can I accomplish this?