I'm getting a ClassCastException when trying to traverse the directories in a mounted Databricks volume.

java.lang.ClassCastException: com.databricks.backend.daemon.dbutils.FileInfo cannot be cast to com.databricks.service.FileInfo
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.mycompany.functions.UnifiedFrameworkFunctions$.getAllFiles(UnifiedFrameworkFunctions.scala:287)

where the getAllFiles function looks like this:

    import com.databricks.service.{DBUtils, FileInfo}
    ...
    def getAllFiles(path: String): Seq[String] = {
        val files = DBUtils.fs.ls(path)
        if (files.isEmpty)
            List()
        else
            files.map(file => { // line where exception is raised
                val path: String = file.path
                if (DBUtils.fs.dbfs.getFileStatus(new org.apache.hadoop.fs.Path(path)).isDirectory) getAllFiles(path)
                else List(path)
            }).reduce(_ ++ _)
    }
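
For reference, this is how I call it (the mount path below is just an example, not my real one):

    val allFiles = getAllFiles("dbfs:/mnt/my-volume") // example mount path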

Locally it runs fine with Databricks Connect, but when the source code is packaged as a jar and run on a Databricks cluster, the above exception is raised.

Since the Databricks documentation suggests using com.databricks.service.DBUtils, and DBUtils.fs.ls(path) is declared to return FileInfo from that same service package - is this a bug, or should the API be used in some other way?

I'm using Databricks Connect & Runtime version 8.1.
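
In case it matters, one workaround I'm considering is to list files through the Hadoop FileSystem API instead of DBUtils, so that no com.databricks FileInfo type appears in my code at all. This is only a sketch (the function name and root path are illustrative, it assumes the cluster's Hadoop configuration can resolve the mounted path, and I haven't verified it behaves the same under Databricks Connect):

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    // Recursively collect all file paths under `root` using the Hadoop FileSystem API,
    // avoiding com.databricks.service.FileInfo entirely.
    def getAllFilesViaHadoop(root: String): Seq[String] = {
        val spark = SparkSession.builder.getOrCreate()
        val fs: FileSystem = new Path(root).getFileSystem(spark.sparkContext.hadoopConfiguration)

        def walk(p: Path): Seq[String] =
            fs.listStatus(p).toSeq.flatMap { status =>
                if (status.isDirectory) walk(status.getPath) // recurse into subdirectories
                else Seq(status.getPath.toString)            // keep plain files
            }

        walk(new Path(root))
    }

Would something like getAllFilesViaHadoop("dbfs:/mnt/my-volume") be the recommended way here, or is there a fix so that DBUtils.fs.ls works as documented?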