
I keep getting connection timeouts when I try to create an HTTP dataset in Azure Data Factory that is based on an Azure Function (HTTP trigger):

  The cloud service request timed out. Please retry. Activity ID:9d70efcd-c842-4484-9313-4872208a2a9e

However, if I call the function from anywhere else, e.g. from apitester.com, I get the desired response.

The source code of the function HttpTriggerPython32 is as follows:

import os, sys, json
import pytz
from datetime import datetime

# Read the request payload from the file path provided via the 'req' binding.
postreqdata = json.loads(open(os.environ['req']).read())

responseData = {
    'timestamp' : datetime.now(pytz.timezone("Europe/Zurich")).strftime("%Y-%m-%d %H:%M:%S"),
    'python_version' : sys.version
}

# Write the response as JSON to the file path provided via the 'res' binding.
with open(os.environ['res'], 'w') as response:
    response.write(json.dumps(responseData))
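The function body can be sanity-checked locally, outside Azure. This is a minimal sketch: the temp files are hypothetical stand-ins for the file paths Azure Functions exposes via the `req`/`res` environment variables, and local time is used instead of `pytz.timezone("Europe/Zurich")` to keep it dependency-free:

```python
import json, os, sys, tempfile
from datetime import datetime

# Hypothetical stand-ins for the req/res file bindings (assumption: this
# mimics the experimental Python worker's binding model).
req_path = os.path.join(tempfile.mkdtemp(), "req")
res_path = req_path + "_res"
with open(req_path, "w") as f:
    json.dump({"group_name": "Azure POC"}, f)
os.environ["req"] = req_path
os.environ["res"] = res_path

# Same shape as the function body; local time instead of pytz for the sketch.
postreqdata = json.loads(open(os.environ["req"]).read())
responseData = {
    "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    "python_version": sys.version,
}
with open(os.environ["res"], "w") as f:
    json.dump(responseData, f)

print(open(res_path).read())
```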

I've successfully added other third-party HTTP services as datasets in the data factory. I've also managed to call the function using a Web activity. The error only occurs when I try to use the function as a dataset source.
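For reference, the working Web activity looked roughly like this. This is a sketch only: the activity name and the placeholder URL/key are illustrative, not the actual values from the pipeline:

```json
{
    "name": "CallHttpTriggerPython32",
    "type": "WebActivity",
    "typeProperties": {
        "url": "https://<function-app>.azurewebsites.net/api/HttpTriggerPython32?code=<function key>",
        "method": "POST",
        "headers": { "Content-Type": "application/json" },
        "body": { "group_name": "Azure POC" }
    }
}
```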

To add the function as a data factory dataset, I created a new dataset with the following parameters:

{
    "name": "HttpFile1",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AF_srfscadapa_af1",
            "type": "LinkedServiceReference"
        },
        "type": "HttpFile",
        "typeProperties": {
            "format": {
                "type": "JsonFormat",
                "filePattern": "setOfObjects"
            },
            "relativeUrl": "HttpTriggerPython32?code=L5WVNJh8loDv3mZFcD/AKatNRoYfYoHlDbEBk4AEGrbDA39ddAXsyg==",
            "requestMethod": "Post",
            "requestBody": "{\n    \"group_name\": \"Azure POC\"\n}"
        }
    }
}
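One thing worth noting: the function key in `relativeUrl` contains `/` and `==` characters. Depending on how the HTTP connector composes the final URL, these may need percent-encoding (an assumption, not something confirmed in this thread). A quick way to produce the encoded form:

```python
from urllib.parse import quote

# The function key from the dataset definition (contains '/' and '==').
key = "L5WVNJh8loDv3mZFcD/AKatNRoYfYoHlDbEBk4AEGrbDA39ddAXsyg=="
relative_url = "HttpTriggerPython32?code=" + quote(key, safe="")
print(relative_url)
```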

The linked service "AF_srfscadapa_af1" is configured as follows:

Have you tried different timeout values for your HTTP activity? docs.microsoft.com/en-us/azure/data-factory/… - Connor McMahon

I am not super familiar with Azure Data Factory (I am on the Functions team), but is it possible that your linked service is having issues because the base URL returns a 404? Maybe try making your base URL srf-scadapa-fa2-windows.azurewebsites.net and editing your relative URL accordingly? - Connor McMahon

@ConnorMcMahon Thanks, I managed to solve it using the timeout setting parameter in the copy activity. - itscdo

2 Answers

1 vote

Azure Functions (along with pretty much all serverless platforms) has to deal with the cold-start latency problem, where the first request to a function app after a period of no use takes longer. This is because the platform needs to instantiate an instance of your application before servicing the request. This can add a non-trivial amount of time to a request, which could have pushed your latency over the default timeout used by Azure Data Factory.

According to the Azure Data Factory documentation, copy activities with HTTP as a source have a timeout field. Setting this to a higher value may resolve the issue.
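As a sketch, the copy activity's source section would look something like the following (the property name is from the HTTP connector documentation; the five-minute value is an arbitrary example, not a recommendation):

```json
"source": {
    "type": "HttpSource",
    "httpRequestTimeout": "00:05:00"
}
```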

0 votes

Manually defining all the columns of the source and increasing the timeout in the copy activity solved the problem.

I was using the Data Factory GUI, where it's not possible to set a timeout for previewing a source or importing a schema, so in this particular case the "Import schema" function simply does not work.
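For illustration, manually defining the columns means adding a structure section to the dataset definition. The column names below are assumptions based on the function's response shown in the question, not taken from the actual pipeline:

```json
"structure": [
    { "name": "timestamp", "type": "String" },
    { "name": "python_version", "type": "String" }
]
```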