I'm trying to train a Form Recognizer using the browser API console (https://eastus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api/operations/TrainCustomModel/console). I've uploaded traning images to a container and created an SAS. The browser API console generate following HTTP request:
POST https://eastus.api.cognitive.microsoft.com/formrecognizer/v1.0-preview/custom/train?source=https://pythonimages.blob.core.windows.net/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2020-01-22T00:23:33Z&st=2020-01-21T16:23:33Z&spr=https&sig=••••••••••••••••••••••••••••••••&prefix=images HTTP/1.1
Host: eastus.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: ••••••••••••••••••••••••••••••••
{
"source": "string",
"sourceFilter": {
"prefix": "string",
"includeSubFolders": true
}
}
However, the answer I get back is
Transfer-Encoding: chunked
x-envoy-upstream-service-time: 4
apim-request-id: 5ad37aa2-e251-4b61-98ae-023930b47d27
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
Date: Tue, 21 Jan 2020 16:25:03 GMT
Content-Type: application/json; charset=utf-8
{
"error": {
"code": "1004",
"message": "Dataset path must be relative to local input mount path '/input' if local data is referenced."
}
}
I don't understand why it seems to be looking for data locally. I've experimented with the SAS, e.g. including the container name (images) in the blob http address rather than as a query parameter, but no success so far.
I've also tried the Python/REST path (described here: https://docs.microsoft.com/en-gb/azure/cognitive-services/form-recognizer/quickstarts/python-train-extract-v1), which results in a different error:
Response status code: 408
Response body: {'error': {'code': '1011', 'innerError': {'requestId': 'e7f9ef9f-97bc-4b6a-86f3-0b29c9591c87'}, 'message': 'The operation exceeded allowed time limit and was canceled. The common reasons are that the data source is too large or contains unsupported content. Please check that your request conforms to service limits and retry with redacted data source.'}}
For completeness, the code I use is as follows (key/signature *ed out:)
########### Python Form Recognizer Train #############
from requests import post as http_post
# Endpoint URL
base_url = r"https://markusformsrecognizer.cognitiveservices.azure.com/" + "/formrecognizer/v1.0-preview/custom"
source = r"https://pythonimages.blob.core.windows.net/images?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2020-01-22T15:37:26Z&st=2020-01-22T07:37:26Z&spr=https&sig=*********************************"
headers = {
# Request headers
'Content-Type': 'application/json',
'Ocp-Apim-Subscription-Key': '*********************************'
}
url = base_url + "/train"
body = {"source": source}
try:
resp = http_post(url = url, json = body, headers = headers)
print("Response status code: %d" % resp.status_code)
print("Response body: %s" % resp.json())
except Exception as e:
print(str(e))