
I have a two-node Spark standalone cluster and I'm trying to read back some Parquet files I just saved, but I'm getting a file-not-found exception.

Checking the location, it looks like all the Parquet files were created on only one of the nodes in my standalone cluster.

The problem is that when reading the Parquet files back, Spark says it cannot find the xasdad.part file.

The only way I can manage to load them is by scaling the standalone Spark cluster down to one node.

My question is: how can I load my Parquet files while running more than one node in my standalone cluster?

You're probably writing to a file system that's not visible across the network. Make sure that if you manually create a file at the write path, you can see it everywhere. – Tim
@TimP I did not manually create the files; Spark saved the files, but only on one node. – Adetiloye Philip Kehinde
So try to manually create a file in the same directory, then ssh to every node in your cluster and make sure it's visible. – Tim

1 Answer


You have to put your files in a shared directory that is accessible to all Spark nodes at the same path. Otherwise, use Spark with Hadoop HDFS, a distributed file system, as sketched below.
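
As a minimal sketch (the HDFS location hdfs://namenode:8020/data/mydata.parquet and the NFS mount /mnt/shared are placeholders; substitute your own cluster's paths), writing and reading Parquet against storage every node can reach looks like this:

```scala
import org.apache.spark.sql.SparkSession

object ParquetOnSharedStorage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-shared-storage")
      .getOrCreate()

    // Write to HDFS so every node in the standalone cluster sees the same files.
    val df = spark.range(0, 1000).toDF("id")
    df.write.mode("overwrite").parquet("hdfs://namenode:8020/data/mydata.parquet")

    // Reading back works from any node, because the path resolves to the same
    // distributed file system everywhere.
    val readBack = spark.read.parquet("hdfs://namenode:8020/data/mydata.parquet")
    readBack.show(5)

    // A shared mount (e.g. NFS) mounted at the same path on every node also works:
    // df.write.mode("overwrite").parquet("/mnt/shared/mydata.parquet")

    spark.stop()
  }
}
```

If you write to a plain local path instead, each executor writes its own part files to its own local disk, which is exactly why only one node ends up holding readable data.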