I'm getting a ClassCastException when trying to traverse the directories in a mounted Databricks volume.
java.lang.ClassCastException: com.databricks.backend.daemon.dbutils.FileInfo cannot be cast to com.databricks.service.FileInfo
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at com.mycompany.functions.UnifiedFrameworkFunctions$.getAllFiles(UnifiedFrameworkFunctions.scala:287)
where the getAllFiles function looks like:
import com.databricks.service.{DBUtils, FileInfo}
...
def getAllFiles(path: String): Seq[String] = {
  val files = DBUtils.fs.ls(path)
  if (files.isEmpty)
    List()
  else
    files.map(file => { // line where the exception is raised
      val path: String = file.path
      if (DBUtils.fs.dbfs.getFileStatus(new org.apache.hadoop.fs.Path(path)).isDirectory) getAllFiles(path)
      else List(path)
    }).reduce(_ ++ _)
}
Locally it runs fine with Databricks Connect, but when the source code is packaged as a jar and run on a Databricks cluster, the above exception is raised.
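One workaround I'm considering (a sketch only, not confirmed against the Databricks API): keep the recursion generic over a listing function, so the concrete FileInfo class (service vs. daemon) never leaks into the traversal. The Entry case class and the ls parameter below are my own hypothetical stand-ins for the (path, isDir) pair that both FileInfo variants expose.

```scala
// Hypothetical neutral record holding the two fields the traversal needs.
final case class Entry(path: String, isDir: Boolean)

// Recursion is parameterized over `ls`, so the same code can be wired to
// either DBUtils flavor (or any other listing source) without casting.
def getAllFiles(path: String, ls: String => Seq[Entry]): Seq[String] =
  ls(path).flatMap { e =>
    if (e.isDir) getAllFiles(e.path, ls) // descend into directories
    else Seq(e.path)                     // keep plain files
  }

// On the cluster this would be wired up roughly like (untested assumption):
//   getAllFiles("/mnt/...", p => dbutils.fs.ls(p).map(f => Entry(f.path, f.isDir)))
```

The point is that the map/flatMap then operates only on my own Entry type, so no cross-classloader cast of FileInfo can occur inside the traversal.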
Since the Databricks documentation suggests using com.databricks.service.DBUtils, and DBUtils.fs.ls(path) returns FileInfo from that same service package, is this a bug, or should the API be used in some other way?
I'm using Databricks Connect and Runtime version 8.1.