0
votes

Getting straight to it: Why does my R code run fine on my local CPU (under one minute), but tens of times slower on Azure Machine Learning, using one R script block (over 18 minutes)?

I assume that it has to do with the resources allocated to the experiment, but how can I be sure? Can I obtain details about the resources allocated to the R script block from somewhere hidden in the Azure ML Studio machinery?

Thank you, Flo

Later Edit: As often happens, I did finally find some information, which still doesn't solve my issue. According to https://msdn.microsoft.com/library/en-us/Dn905952.aspx#Technical%20Notes "User-specified R code is run by a 64-bit R interpreter that runs in Azure using an A8 virtual machine with 56 GB of RAM."

This is more than my local machine has, yet the R code is still much slower in Azure ML Studio.

1
Can you post your code along with the AML experiment layout? - Hong Ooi
If you are using the free tier, everything runs on 1 core and you are stuck in a queue. If you are a paying customer you have full access to the A8 instance, but Azure ML runs R code on a single core. And if the data center is very busy, you might not be guaranteed the full use of the instance. - phiver
I can/should not post the experiment layout, the code is too large for posting (with several other files sourced in the main script). - Flo-Mel
@phiver I have full access to the A8 instance, but the code is consistently slower (run once a day for several weeks) than on my machine. The only explanation I have, according to your comment, is the single core on Azure ML (locally I have a 4-core machine with 12 GB RAM) and the probably very busy data center. - Flo-Mel
Publish your Azure ML experiment as a private experiment and share the link here. Then we can look at the code. - Haritha Thilakarathne

1 Answer

0
votes

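Consider using rbenchmark or other benchmarking tools to get an idea of the runtime and complexity of your code, and run the same benchmark locally and on Azure ML so the numbers are comparable. In general, for loops in R tend to be slower than vectorized code.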
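As a rough illustration of what such a benchmark could look like, here is a minimal sketch. The two functions are made-up stand-ins (they are not from the question's code), and the base R fallback is only there in case rbenchmark is not installed on the machine running the script.

```r
# Hypothetical workload: sum of squares, once with a loop, once vectorized.
loop_sum_sq <- function(x) {
  total <- 0
  for (v in x) total <- total + v^2
  total
}
vec_sum_sq <- function(x) sum(x^2)

x <- as.numeric(1:100000)

# Both versions must agree before their timings are worth comparing.
stopifnot(isTRUE(all.equal(loop_sum_sq(x), vec_sum_sq(x))))

if (requireNamespace("rbenchmark", quietly = TRUE)) {
  # rbenchmark repeats each expression and reports elapsed and relative time.
  timings <- rbenchmark::benchmark(
    loop = loop_sum_sq(x),
    vectorized = vec_sum_sq(x),
    replications = 20,
    columns = c("test", "replications", "elapsed", "relative")
  )
} else {
  # Base R fallback: time 20 repetitions of each version with system.time().
  timings <- data.frame(
    test = c("loop", "vectorized"),
    replications = 20,
    elapsed = c(
      system.time(for (i in 1:20) loop_sum_sq(x))[["elapsed"]],
      system.time(for (i in 1:20) vec_sum_sq(x))[["elapsed"]]
    )
  )
}

print(timings)
```

Running the identical script on both machines and comparing the elapsed column should show whether the slowdown is uniform (pointing at the hardware or scheduling) or concentrated in particular parts of the code.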

It's quite possible that the server has fewer resources available (RAM, CPU) or that you have to wait in a queue before you get served. Without seeing the code, it's hard to comment further on this issue.
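To check what the R session actually sees, something along these lines could be run both locally and inside the R script block. This is only a sketch: session_resources is a hypothetical helper name, it assumes that printed output ends up somewhere you can read it (for example the module's output log), and it reports what the operating system exposes to R, which is not necessarily what the experiment is allowed to use. Base R has no portable way to query total RAM, so memory is left out.

```r
# Hypothetical helper: collect basic facts about the machine behind this R session.
session_resources <- function() {
  info <- Sys.info()
  list(
    r_version      = R.version.string,
    platform       = R.version$platform,
    os             = unname(info[["sysname"]]),
    # detectCores() may return NA on platforms where the count is unknown.
    logical_cores  = parallel::detectCores(logical = TRUE),
    physical_cores = parallel::detectCores(logical = FALSE)
  )
}

res <- session_resources()
str(res)

# A small fixed CPU-bound task gives a crude single-core speed comparison
# between machines; the workload itself is arbitrary.
set.seed(1)
cpu_probe <- system.time(for (i in 1:20) sort(runif(1e5)))[["elapsed"]]
cat("CPU probe elapsed seconds:", cpu_probe, "\n")
```

If the core count looks fine but the CPU probe is much slower than on your machine, the bottleneck is more likely per-core speed or contention than the amount of RAM.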