I am using HDInsight on Azure to research how well learning-to-rank methods (machine learning for ranking) scale on Hadoop. I have managed to run my implementation of a learning-to-rank algorithm on an HDInsight cluster and measured how long it took to complete.
Now I want to run the same code repeatedly with different numbers of cores, to see how the running time scales as a function of the number of cores. From other questions on this forum I understood that HDInsight does not allow changing the number of cores of an existing cluster. Would it instead be possible to delete the current cluster and then create a new cluster that uses the exact same container on my Azure Storage account? I tried to do this by simply giving the new cluster the same name as the previous one (since the container created for a new cluster is automatically named after the cluster at creation time), but that doesn't work: the container created for the new cluster gets "-1" appended to the cluster name. The data file that I am trying to process is around 15 GB in size, so it would be a real pain if I had to upload it again to the container of every cluster that I create.
Any help on how I can run my algorithms on HDInsight with varying numbers of cores without having to re-upload my input data for each point of measurement would be very much appreciated!
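For context, this is roughly how I intend to summarise the measurements once I have them. It is only a sketch: the function name `scaling_table` and the input format (a mapping from core count to running time in seconds) are my own assumptions for illustration, and it contains no real results.

```python
def scaling_table(timings):
    """Summarise scaling behaviour from measured running times.

    timings: dict mapping number of cores -> running time in seconds.
    (Hypothetical input format; the measurements themselves come from
    separate cluster runs.)

    Returns a list of (cores, seconds, speedup, efficiency) tuples,
    sorted by core count. Speedup and efficiency are relative to the
    smallest cluster that was measured.
    """
    if not timings:
        raise ValueError("need at least one measurement")

    base_cores = min(timings)          # smallest cluster is the baseline
    base_time = timings[base_cores]

    rows = []
    for cores in sorted(timings):
        seconds = timings[cores]
        speedup = base_time / seconds
        # Efficiency of 1.0 means the speedup matches the growth in cores.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, seconds, speedup, efficiency))
    return rows
```

Each entry in `timings` would be one cluster that I have to create, which is why being able to reuse the same container between clusters matters so much to me.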
Kind Regards,
Niek Tax