When running Spark jobs on an HDInsight cluster that read data from Azure Data Lake Store (ADLS), I see that the locality level of my tasks always seems to be PROCESS_LOCAL. However, I don't quite understand how such data locality can be achieved in a cloud environment.
Is Azure actually moving my code close to the data, as can be done with regular HDFS, or is the locality level simply reported as PROCESS_LOCAL while the data is in reality being loaded over the network?
In other words, is Azure somehow provisioning the HDInsight worker nodes in proximity to the data I have in ADLS, or what else explains the locality level I see in the Spark UI?