0 votes

I am having the following issue with the connection between Azure Functions and Azure Data Lake Gen2:

Running the function locally, everything works fine. It connects to the Data Lake, gets the input file, processes some logic and then uploads the modified file to a new location within the Data Lake. See the integration overview below:

[Integration overview diagram]

__init__.py

import logging
import xml.etree.ElementTree as ET

import azure.functions as func

def main(req: func.HttpRequest, inputBlob: func.InputStream, outputBlob: func.Out[bytes]) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    try:
        data = inputBlob.read()
        xml = ET.fromstring(data.decode('utf-8'))

        # for loop here to perform some logic.

        outputBlob.set(ET.tostring(xml))

        return func.HttpResponse(
            "This HTTP triggered function executed successfully.",
            status_code=200
        )
    except ValueError as ex:
        return func.HttpResponse(
             "Unknown error has occured tih message: " + str(ex),
             status_code=400
        )

function.json

{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "type": "blob",
      "direction": "in",
      "name": "inputBlob",
      "path": "https://.xml",
      "connection": "APP SETTING NAME"
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "outputBlob",
      "path": "https://.xml",
      "connection": "APP SETTING NAME"
    }
  ]
}
  • I am using bindings to make the connection to the inputBlob and outputBlob.
  • I registered an APPLICATION SETTING to establish the connection (format: DefaultEndpointsProtocol=https;AccountName=####;AccountKey=####;EndpointSuffix=core.windows.net), similar to the entry in local.settings.json (see the sketch below).
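
For completeness, the relevant part of local.settings.json looks roughly like the sketch below; the setting name MyDataLakeConnection is just a placeholder standing in for the real APP SETTING NAME that the bindings' connection property points to:

{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=####;AccountKey=####;EndpointSuffix=core.windows.net",
    "MyDataLakeConnection": "DefaultEndpointsProtocol=https;AccountName=####;AccountKey=####;EndpointSuffix=core.windows.net"
  }
}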

Running the trigger I keep getting the following error:

<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

I found this Stack Overflow post saying output bindings don't work with Data Lakes, but then I am confused: why does the local run work?

Does anyone recognize this problem and if so have a way forward for me?

Kind regards,

Mark

Can you please share your input bindings? Please make sure the binding connection string is different from the storage bindings. – sakulachi8
"type": "blob", "direction": "out", "name": "outputBlob", "path": "https://..../....xml "connection": "AzureWebJobsdevelopmentdcstorage" @sakulachi8. The connection is the same for the inputBlobRickSancheez
It depends on what you want. If the output blob binding is used, the Data Lake package's objects cannot handle blobs that have been processed using the blob type. If you just want ordinary blob-type objects, the output binding can be used. – Bowman Zhu
Is there any firewall on your Data Lake? – Bowman Zhu
And please show your function.json in your question. :) The output binding is create-if-not-exists, so the problem may come from the input binding. – Bowman Zhu

2 Answers

0 votes

Running the trigger I keep getting the following error:

<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

Yes, of course. This is because the path of your input binding is wrong.

The format should be like this:

"path": "containername/foldername1/foldername2/.../filename"

This is the official doc for the input binding:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-input?tabs=python

And this is for the output binding:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-output?tabs=python

I found this Stack Overflow post saying output bindings don't work with Data Lakes, but then I am confused: why does the local run work?

This is a problem stemming from the Azure Functions output binding; I can roughly describe it. The package that the function's output binding is based on is not a Data Lake package, so you need to manually use the Data Lake service in the function body to get and write the data.

The consequence of using an Azure Functions binding to process the data is that you will no longer be able to use the Data Lake package to process those files.

(This limitation was still there at the time I wrote that answer.)

-1 votes

Problem solved!

I did not use the output binding, but put the logic in the function body as per the documentation: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage/azure-storage-file-datalake/samples
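
Roughly, the function body now looks like the sketch below, using the azure-storage-file-datalake package instead of the blob bindings; the setting name DATALAKE_CONNECTION_STRING, the file system name and the file paths are placeholders:

import logging
import os
import xml.etree.ElementTree as ET

import azure.functions as func
from azure.storage.filedatalake import DataLakeServiceClient

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    # Connect to the Data Lake using a connection string stored in an app setting.
    service_client = DataLakeServiceClient.from_connection_string(
        os.environ["DATALAKE_CONNECTION_STRING"])
    file_system_client = service_client.get_file_system_client(file_system="my-container")

    # Download the input file from the Data Lake.
    input_client = file_system_client.get_file_client("input-folder/input.xml")
    data = input_client.download_file().readall()
    xml = ET.fromstring(data.decode('utf-8'))

    # for loop here to perform some logic.

    # Upload the modified file to a new location within the Data Lake.
    output_client = file_system_client.get_file_client("output-folder/output.xml")
    output_client.upload_data(ET.tostring(xml), overwrite=True)

    return func.HttpResponse(
        "This HTTP triggered function executed successfully.",
        status_code=200
    )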

I kept getting the 404 message, but with some research I found an error labelled ClientClosedRequest. It only showed up in the Azure iOS app...

From there I came to the conclusion that the XML file was too large and caused a timeout.

Thanks for the guidance @Bowman Zhu, it made me do a bit more digging.