
We are trying to create an Azure ML web service that receives a CSV data file, does some processing, and returns two similar files. The Python support recently added to the Azure ML platform was very helpful: we were able to port our code successfully, run it in experiment mode, and publish the web service.

Using the "batch processing" API, we are now able to direct a file from blob-storage to the service and get the desired output. However, run-time for small files (a few KB) is significantly slower than on a local machine, and more importantly, the process seems to never return for slightly larger input data files (40MB). Processing time on my local machine for the same file is under 1 minute.

My question is whether you can see anything we are doing wrong, or whether there is a way to speed this up. Here is the DAG representation of the experiment:

[Image: the DAG representation of the experiment]

Is this the way the experiment should be set up?

Throwing this question to the internal team to see if I can get a good reply for you. - Dan Ciborowski - MSFT
We understand the privacy concerns around your process, but we will need a bit more information to help. Can we get some kind of idea of what you're doing? Providing a generic base case, with some actual Python code and file formats, will be needed if we don't have direct access to your workspace. If you are willing to post your workspace ID, we can look at some internal logs, which will help us debug. - Dan Ciborowski - MSFT
Thank you for your reply. I believe this is the workspace ID: 6ad62803867e4060a11b0d32ba74e744 - Hezi Resheff

1 Answer


It looks like the problem was with the processing of a timestamp column in the input table. The successful workaround was to explicitly force the column to be processed as string values, using the "Metadata Editor" block. The final model now looks like this:

[Image: the final model]
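For anyone who prefers to apply the same fix inside an Execute Python Script module rather than with the Metadata Editor block, here is a minimal sketch of the equivalent conversion with pandas. The column name "timestamp" is a placeholder; substitute the actual column name from your input table:

```python
import pandas as pd

# Entry point of the Execute Python Script module; the signature follows
# the module's default template.
def azureml_main(dataframe1=None, dataframe2=None):
    # Casting to str keeps downstream modules from re-parsing the values
    # as datetimes, which is what the Metadata Editor workaround achieves.
    # "timestamp" is a placeholder for the actual column name.
    dataframe1["timestamp"] = dataframe1["timestamp"].astype(str)
    return dataframe1,
```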