I have a 3-node cluster with Hadoop and Spark installed. I would like to load data from an RDBMS into a DataFrame and write that data as Parquet to HDFS. The "dfs.replication" value is 1.
When I try this with the following command, I see that all the HDFS blocks end up on the node where I ran spark-shell.
scala> xfact.write.parquet("hdfs://sparknode01.localdomain:9000/xfact")
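For context, here is a minimal sketch of how xfact is built and written, assuming a Spark 2.x spark-shell (where spark is the pre-created SparkSession) and a generic JDBC source; the JDBC URL, table name, and credentials below are placeholders, not my real connection details:

// Read the source table into a DataFrame over JDBC.
// The JDBC driver jar must be on the spark-shell classpath (e.g. passed via --jars).
val xfact = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")  // placeholder JDBC URL
  .option("dbtable", "xfact")                           // placeholder table name
  .option("user", "dbuser")                             // placeholder credentials
  .option("password", "dbpassword")
  .load()

// Write the DataFrame to HDFS as Parquet; with dfs.replication = 1,
// each block has a single replica placed by the HDFS block placement policy.
xfact.write.parquet("hdfs://sparknode01.localdomain:9000/xfact")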
Is this the intended behaviour, or should the blocks be distributed across the cluster?
Thanks