0
votes

I'm trying to train a Form Recognizer using the browser API console (https://eastus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api/operations/TrainCustomModel/console). I've uploaded traning images to a container and created an SAS. The browser API console generate following HTTP request:

POST https://eastus.api.cognitive.microsoft.com/formrecognizer/v1.0-preview/custom/train?source=https://pythonimages.blob.core.windows.net/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2020-01-22T00:23:33Z&st=2020-01-21T16:23:33Z&spr=https&sig=••••••••••••••••••••••••••••••••&prefix=images HTTP/1.1
Host: eastus.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: ••••••••••••••••••••••••••••••••

{
  "source": "string",
  "sourceFilter": {
    "prefix": "string",
    "includeSubFolders": true
  }
}

However, the answer I get back is

Transfer-Encoding: chunked
x-envoy-upstream-service-time: 4
apim-request-id: 5ad37aa2-e251-4b61-98ae-023930b47d27
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
Date: Tue, 21 Jan 2020 16:25:03 GMT
Content-Type: application/json; charset=utf-8

{
  "error": {
    "code": "1004",
    "message": "Dataset path must be relative to local input mount path '/input' if local data is referenced."
  }
}

I don't understand why it seems to be looking for data locally. I've experimented with the SAS, e.g. including the container name (images) in the blob http address rather than as a query parameter, but no success so far.

I've also tried the Python/REST path (described here: https://docs.microsoft.com/en-gb/azure/cognitive-services/form-recognizer/quickstarts/python-train-extract-v1), which results in a different error:

Response status code: 408
Response body: {'error': {'code': '1011', 'innerError': {'requestId': 'e7f9ef9f-97bc-4b6a-86f3-0b29c9591c87'}, 'message': 'The operation exceeded allowed time limit and was canceled. The common reasons are that the data source is too large or contains unsupported content. Please check that your request conforms to service limits and retry with redacted data source.'}}

For completeness, the code I use is as follows (key/signature *ed out:)

########### Python Form Recognizer Train #############
from requests import post as http_post

# Endpoint URL
base_url = r"https://markusformsrecognizer.cognitiveservices.azure.com/" + "/formrecognizer/v1.0-preview/custom"
source = r"https://pythonimages.blob.core.windows.net/images?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2020-01-22T15:37:26Z&st=2020-01-22T07:37:26Z&spr=https&sig=*********************************"
headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '*********************************'
}
url = base_url + "/train" 
body = {"source": source}
try:
    resp = http_post(url = url, json = body, headers = headers)
    print("Response status code: %d" % resp.status_code)
    print("Response body: %s" % resp.json())
except Exception as e:
    print(str(e))
7
For error 408 Please share your training documents to check. - Ram-msft

7 Answers

1
votes

@Markus

Did you get a simple answer to this? My request works for the V1 endpoint but NOT V2. Generating a SAS URI for the 'Source' does not work. I keep getting a 400 Bad Request: Error 1004. "Parameter 'Source' is not a valid path."

FYI: I have been testing with this: https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync/console

and Postman.

0
votes

For error code 1004 Please follow the below to get the Source path containing the training documents and pass as value to the source key.

{
  "source": "string",
  "sourceFilter": {
    "prefix": "string",
    "includeSubFolders": true
  }
}

Replace with the Azure Blob storage container's shared access signature (SAS) URL. To retrieve the SAS URL, open the Microsoft Azure Storage Explorer, right-click your container, and select Get shared access signature. Make sure the Read and List permissions are checked, and click Create. Then copy the value in the URL section. It should have the form: https://.blob.core.windows.net/container name?SAS value.

0
votes

Please use the new Form Recognizer v2.0 release it is an async API and enables training on large data sets and analyzing large documents. https://aka.ms/form-recognizer/api quick start - https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/python-train-extract

0
votes

To get started with Form Recognizer please login to the Azure Portal using this link to create a Form Recognizer resource (for v2.0 (preview) please use West US 2 or West Europe regions).

0
votes

try removing the string value from prefix property.

{ "source": "string", "sourceFilter": { "prefix": "", "includeSubFolders": true } }

0
votes

The Python Quick Start code for version 2.0 seems to be working, at least I don’t get any errors anymore. I’m now feeling slightly silly that I didn’t try this earlier. The API (web-browser) console, linked from the Quick Start page of the Form Recognizer seems automatically assume I want to use version 1.0 and there’s no way to change that (or perhaps I’ve just overseen something). Hence I assumed I’d been allocated a v1.0 trial and therefore that’s what I used when I tried the Python Quick Start the first time around.

0
votes

Instead of using just the SAS URI in the "source" of Request parameter on the API POST call, use the complete string of the container followed by the SAS URI token.

For ex:

https://.blob.core.windows.net//