1
votes

I have a EDM.string field that I use to store Key Value pairs separated by '||' and commas. For example:

{
    "CustomField": "1234|||student, 5678||blue, 999||metallica, etc..."
}

I need to perform a query to extract a key-value combination. For example:

search=5678 blue&searchFields=CustomField&searchMode=all&queryType=full

Using Regular Expression, I was expecting that the following should work:

search=/5678.*blue/&queryType=full&searchMode=all

I am using the default analyzer, so it seems that it's ignoring the '||'. I've tried using Regular Expressions but with no success. Is it possible to query by the key-value pair, without storing it on a EDM.Collection(string) ? I would like to avoid a new reindex process. Thanks in advance.

Update

Using Collections and a new dataset:

{
    "@odata.context": "https://[service].search.windows.net/indexes('[index]')/$metadata#docs",
    "@odata.count": 3,
    "value": [
        {
            "@search.score": 0.45867884,
            "uniqueid": "5",
            "Name": null,
            "Kvp": [
                "1234||sepultura",
                "999||programmer",
                "876||no education"
            ],
            "Kvp2": "1234||sepultura, 999 programmer, 876||no education"
        },
        {
            "@search.score": 0.38223237,
            "uniqueid": "1",
            "Name": null,
            "Kvp": [
                "1234||metallica",
                "999||horse education",
                "876||high school"
            ],
            "Kvp2": "1234||metallica, 999 horse education, 876||high school"
        },
        {
            "@search.score": 0.38223237,
            "uniqueid": "3",
            "Name": null,
            "Kvp": [
                "1234||john mayer",
                "999||kid education",
                "876||university"
            ],
            "Kvp2": "1234||john mayer, 999 kid education, 876||university"
        }
    ]
}

My search query looks like:

Kvp: education&$count=true&queryType=full&searchMode=all

The problem is that I would like to avoid uniqueid 5 to be retrieved. Although it has "education" as a value for one of the tags, it's not the 999 key.

Also tried:

Kvp: 999||education&$count=true&queryType=full&searchMode=all

Kvp: /.*999.*/ AND /.*education.*/&$count=true&queryType=full&searchMode=all

Kvp: /999.*education/&$count=true&queryType=full&searchMode=all
2
I don't see a way to solve your search scenario without updating the index and reindexing the content. I think the key is to create the appropriate tokens that allow for searching a unique key/value pair. This could be accomplished using a custom analyzer. Or just use a Collection...Mr. Kraus
After days trying, I realized I will need to change the index once again. I'm trying with Collections, but many useful functions are not allowed e.g. match, indexof which makes super hard to solve this problem.Thiago Custodio
@Mr.Kraus I've updated the question with another sample data.Thiago Custodio

2 Answers

1
votes

Use a phrase search by surrounding your query with quotes: Kvp:"999||education"

The analyzer does remove the | character, so this is effectively equivalent to Kvp:"999 education". The thing to understand is how analysis works. What you are indexing here, when you index: "1234|||student, 5678||blue, 999||metallica", is six terms:

  • 1234
  • student
  • 5678
  • blue
  • 999
  • metallica

The and query doesn't work because it looks for the matches anywhere in that list, in the field, thus matching id 5. Order or adjacency are not considered as they would be for a phrase query.

The regex query doesn't work because it must match everything within a single term. Kvp:999.*education won't work because "999" and "education" are analyzed into separate terms, so there are no single terms that match that regex.


Another option, by the way, would be to change the analyzer. If you used a whitespace analyzer, for instance, it would change the indexed terms to:

  • 1234||student,
  • 5678||blue,
  • 999||metallica,

Which could be a solution for you, but would make it impossible to search efficiently for just "metallica".

0
votes

I don't believe Regex would be the most efficient way to do this as we really would not be doing a full text search in your case. If you are looking to retrieve key/value combinations, would it make more sense to put all of the various key/values in a searchable collection? That way you could easily just do a search for "5678||blue". Although if you went this way, you really don't need the pipes (||).