
I am reading a SAS file from Azure Blob storage, converting it to CSV, and uploading the CSV back to Azure Blob. For small files (a few MBs) this works successfully with the following Spark Scala code.

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SQLContext
    import com.github.saurfang.sas.spark._

    // sc is the existing SparkContext
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.sasFile("wasbs://container@storageaccount/input.sas7bdat")
    df.write.format("csv").save("wasbs://container@storageaccount/output.csv")

But for large files (GBs) it fails with an AnalysisException: wasbs://container@storageaccount/output.csv already exists. I have tried overwrite as well, but no luck. Any help would be appreciated.


1 Answer


Actually, you cannot overwrite an existing file on HDFS (or WASB) by default, even for small files in the MB range.
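By default, DataFrameWriter uses SaveMode.ErrorIfExists, so a second save to the same path throws. A minimal illustration (the /tmp path is hypothetical):

    // Default save mode is ErrorIfExists: writing to an existing path fails
    df.write.format("csv").save("/tmp/out")   // first write succeeds
    df.write.format("csv").save("/tmp/out")   // throws AnalysisException: path already exists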

Please try the code below to overwrite, and check your Spark version first, because the method differs slightly between Spark versions.

    df.write.format("csv").mode("overwrite").save("wasbs://container@storageaccount/output.csv")
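Equivalently, you can pass the SaveMode enum instead of the string, which is type-safe; a short sketch reusing the same hypothetical path:

    import org.apache.spark.sql.SaveMode

    // SaveMode.Overwrite replaces any existing data at the target path
    df.write.format("csv").mode(SaveMode.Overwrite).save("wasbs://container@storageaccount/output.csv")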

I am not sure whether this is the same overwrite mode you said you had already tried.

Alternatively, you can first delete the existing files before doing the write operation.

    // Build a FileSystem handle for the target URI (HDFS or WASB)
    val hadoopConf = new org.apache.hadoop.conf.Configuration()
    val hdfs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("<hdfs://<namenodehost>/ or wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>>"), hadoopConf)
    // Recursively delete the output path (filepath); ignore the error if it does not exist
    try { hdfs.delete(new org.apache.hadoop.fs.Path(filepath), true) } catch { case _: Throwable => }
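Putting the two steps together for your case, a minimal end-to-end sketch (reusing the hypothetical wasbs path from the question; sc and df are assumed to exist as above):

    // Hypothetical output path from the question
    val outputPath = "wasbs://container@storageaccount/output.csv"

    // Delete the existing output (if any) using the cluster's Hadoop configuration
    val fs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI(outputPath), sc.hadoopConfiguration)
    try { fs.delete(new org.apache.hadoop.fs.Path(outputPath), true) } catch { case _: Throwable => }

    // Now the plain save no longer hits the "already exists" check
    df.write.format("csv").save(outputPath)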

There is also a Spark user mailing list thread that discusses a similar issue: http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-make-Spark-1-0-saveAsTextFile-to-overwrite-existing-file-td6696.html