0
votes

In my index, I have a field called id. During my enrichment pipeline I compute a value at /document/documentId, which I'm trying to map to the id field with an output field mapping. However, the mapping does not seem to take effect: id always ends up as a long value that looks like a hash. All my other output field mappings work as expected.

Portion of the Index:

{
    'name': 'id',
    'type': 'Edm.String',
    'facetable': false,
    'filterable': true,
    'key': true,
    'retrievable': true,
    'searchable': true,
    'sortable': true,
    'analyzer': null,
    'indexAnalyzer': null,
    'searchAnalyzer': null,
    'synonymMaps': [],
    'fields': []
}

Portion of the Indexer:

'outputFieldMappings': [
    {
        'sourceFieldName': '/document/documentId',
        'targetFieldName': 'id'
    }
]

Expected Value: 4b160942-050f-42b3-bbbb-f4531eb4ad7c

Actual Value: aHR0cHM6Ly9zdGRvY3VtZW50c2Rldi5ibG9iLmNvcmUud2luZG93cy5uZXQvMDNiZTBmMzEtNGMyZC00NDRjLTkzOTQtODJkZDY2YTc4MjNmL29yaWdpbmFscy80YjE2MDk0Mi0wNTBmLTQyYjMtYmJiYi1mNDUzMWViNGFkN2MucGRm0

Any thoughts on how to fix this would be much appreciated!


2 Answers

1
votes

TL;DR: You can't use output field mappings for the document key. The key can only be mapped from a source field.

According to Microsoft, it's not possible to set the document key using an output field mapping. Apparently there is an issue with deleting documents, so the key has to come straight from the source document.

I ended up using a mapping function in the fieldMappings.

"fieldMappings": [
    {
        "sourceFieldName": "metadata_storage_name",
        "targetFieldName": "filename"
    },
    {
        "sourceFieldName": "metadata_storage_name",
        "targetFieldName": "id",
        "mappingFunction": {
            "name": "extractTokenAtPosition",
            "parameters": {
                "delimiter": ".",
                "position": 0
            }
        }
    }
]

Since my file name is something like 4b160942-050f-42b3-bbbb-f4531eb4ad7c.pdf, this ends up mapping correctly to my id.
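If you want to sanity-check what that mapping function will produce before running the indexer, here is a rough local approximation in Python. It is only a sketch: `extract_token_at_position` and `is_valid_key` are hypothetical helpers, not part of any Azure SDK, and the key check assumes the rule that document keys may contain only letters, digits, underscores, dashes, and equal signs.

```python
import re


def extract_token_at_position(value: str, delimiter: str, position: int) -> str:
    """Approximate extractTokenAtPosition: split on the delimiter, pick one token."""
    tokens = value.split(delimiter)
    return tokens[position]


def is_valid_key(candidate: str) -> bool:
    """Assumed key rule: only letters, digits, underscore, dash, equal sign."""
    return re.fullmatch(r"[A-Za-z0-9_\-=]+", candidate) is not None


file_name = "4b160942-050f-42b3-bbbb-f4531eb4ad7c.pdf"
doc_id = extract_token_at_position(file_name, ".", 0)
print(doc_id, is_valid_key(doc_id))
```

A file name with extra dots (say report.v2.pdf) would yield only the part before the first dot, so this approach relies on the names being a plain GUID plus an extension.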

0
votes

You can use a regular field mapping rather than an output field mapping. If you created your indexer in the Azure portal, your key (which is "id", since key is true in the index definition above) was probably base64-encoded, because that option is checked by default. You will need to base64-decode it to get your original value, or you can store a second, unencoded copy of the original value (the key itself will still need to be encoded). Here's how to do the latter; this can replace your output field mapping:

"fieldMappings": [
  {
    "sourceFieldName": "documentId",
    "targetFieldName": "documentId"
  },
  {
    "sourceFieldName": "documentId",
    "targetFieldName": "id",
    "mappingFunction": {
      "name": "base64Encode"
    }
  }
]

Note that you will also need to add a documentId field to your index, since you are storing the value in its original format as well.

{
    'name': 'documentId',
    'type': 'Edm.String',
    'facetable': false,
    'filterable': true,
    'key': false,
    'retrievable': true,
    'searchable': true,
    'sortable': true,
    'analyzer': null,
    'indexAnalyzer': null,
    'searchAnalyzer': null,
    'synonymMaps': [],
    'fields': []
}

Alternatively, you could just base64-encode the id value when storing it and decode it when retrieving it. A base64-encoded value is safe to use as an Azure Cognitive Search document key. See https://docs.microsoft.com/azure/search/search-indexer-field-mappings for more info.
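If you go the decode-on-retrieval route, something like the following Python sketch may help on the client side. It assumes the indexer's base64Encode used its URL-safe token variant, where the "=" padding is stripped and replaced by a single trailing digit giving the number of padding characters (that would explain the trailing 0 on the Actual Value in the question). The function names are my own, not from an SDK, so verify against your indexer's settings.

```python
import base64


def decode_url_token(token: str) -> str:
    """Decode a key assumed to be URL-safe base64 with a trailing padding-count digit."""
    if not token:
        return ""
    pad_count = int(token[-1])           # last character = number of '=' removed
    body = token[:-1] + "=" * pad_count  # restore standard padding
    return base64.urlsafe_b64decode(body).decode("utf-8")


def encode_url_token(value: str) -> str:
    """Inverse of decode_url_token, under the same assumption."""
    encoded = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
    stripped = encoded.rstrip("=")
    return stripped + str(len(encoded) - len(stripped))


key = encode_url_token("4b160942-050f-42b3-bbbb-f4531eb4ad7c")
print(key, decode_url_token(key))
```

Applied to the Actual Value in the question, this decoding gives the blob's full storage URL ending in originals/4b160942-050f-42b3-bbbb-f4531eb4ad7c.pdf, which suggests the key was populated from the blob path rather than from /document/documentId.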