I have a field that holds comma-separated values: one record has JSON,AngularJS, another has AngularJS,JSON, and a third has only JSON,HTML.

I have been trying to query Solr using fq=field:(JSONAngularJS*), but it returns only the record where JSON comes before AngularJS.

How can I query Solr so that it returns both records containing JSON and AngularJS, regardless of their order?

Attaching the Solr analysis for the field (screenshot: Analysis for the field).

Query formed: http://localhost:8983/solr/my_core/select?fq=field:(JSON%20AND%20AngularJS)&q=*:*

1 Answer

Use a field type that is tokenized on commas, so that each entry in your list becomes a separate token. You can do this with the Simplified Regular Expression Pattern Tokenizer (solr.SimplePatternTokenizerFactory):

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.SimplePatternTokenizerFactory" pattern="[^,]+"/>
  </analyzer>
</fieldType>

Then query the index for documents that have both tokens present: fq=field:(JSON AND AngularJS).
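
To see why tokenizing on commas makes the order irrelevant, here is a rough Python simulation of the idea. It is not Solr itself: it assumes that, for a pattern as simple as `[^,]+`, Python's `re` module splits the value the same way the tokenizer would. The helper names `tokenize` and `matches_all` are made up for this sketch.

```python
import re

# Same pattern as in the fieldType above. Assumption: for this simple
# pattern, Python's re behaves like the Solr tokenizer.
TOKEN_PATTERN = re.compile(r"[^,]+")


def tokenize(value):
    """Split a comma-separated value into a set of tokens."""
    return set(TOKEN_PATTERN.findall(value))


def matches_all(value, required):
    """Mimic field:(A AND B): every required token must be present."""
    return set(required) <= tokenize(value)


if __name__ == "__main__":
    # The three values from the question.
    for value in ["JSON,AngularJS", "AngularJS,JSON", "JSON,HTML"]:
        print(value, matches_all(value, ["JSON", "AngularJS"]))
```

Because the tokens are compared as a set, their position in the original string no longer matters. Note that the field type above has no lowercase filter, so matching should be case-sensitive, and an entry such as "Microsoft Visual Basic" stays a single token including its spaces.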

(After update of question)

First, your field seems to be a string field rather than a TextField. String fields are not analyzed, so the whole value is indexed as a single term.

(Screenshot: example definition from the UI)

To add a field type with the correct definition through the Schema API (the field then has to use this type, and the documents have to be reindexed):

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type" : {
     "name":"comma-separated-list",
     "class":"solr.TextField",
     "positionIncrementGap":"100",
     "analyzer" : {
        "tokenizer":{
           "class":"solr.SimplePatternTokenizerFactory", "pattern": "[^,]+" }
     }
  }
}' http://localhost:8983/solr/collectionname/schema
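
If you would rather build the request in a script, the same payload can be assembled as a Python dict, which avoids hand-written brace and comma mistakes. This is only a sketch: the function names are invented here, and the `replace-field` command assumes the field already exists as a string field and should stay stored and indexed, which may not match your schema.

```python
import json
import urllib.request


def field_type_command():
    """Schema API command defining the comma-tokenized field type."""
    return {
        "add-field-type": {
            "name": "comma-separated-list",
            "class": "solr.TextField",
            "positionIncrementGap": "100",
            "analyzer": {
                "tokenizer": {
                    "class": "solr.SimplePatternTokenizerFactory",
                    "pattern": "[^,]+",
                }
            },
        }
    }


def replace_field_command(field_name):
    """Schema API command switching an existing field to the new type.

    Assumption: the field already exists and should remain stored and
    indexed; adjust the attributes to match the real schema.
    """
    return {
        "replace-field": {
            "name": field_name,
            "type": "comma-separated-list",
            "stored": True,
            "indexed": True,
        }
    }


def post_command(schema_url, command):
    """POST one command to the Schema API and return the decoded reply."""
    request = urllib.request.Request(
        schema_url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the payloads; nothing is sent unless post_command is called.
    print(json.dumps(field_type_command(), indent=2))
    print(json.dumps(replace_field_command("langs"), indent=2))
```

Calling `post_command("http://localhost:8983/solr/collectionname/schema", field_type_command())` would send the same request as the curl command above.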

After adding a set of example documents:

[
      {
        "langs":"JSON,AngularJS,Microsoft Visual Basic",
        "id":"foo",
        "address":"None",
        "_version_":1606953238273196032},
      {
        "langs":"JSON,AngularJS",
        "id":"foo2",
        "address":"None",
        "_version_":1606953238277390336},
      {
        "langs":"JSON,Microsoft Visual Basic",
        "id":"foo3",
        "address":"None",
        "_version_":1606953238278438912},
      {
        "langs":"AngularJS,JSON",
        "id":"foo4",
        "address":"None",
        "_version_":1606953238278438913}]

Then querying with fq=langs:(JSON AND AngularJS)&q=*:* returns:

[
  {
    "langs":"JSON,AngularJS,Microsoft Visual Basic",
    "id":"foo",
    "address":"None",
    "_version_":1606953238273196032},
  {
    "langs":"JSON,AngularJS",
    "id":"foo2",
    "address":"None",
    "_version_":1606953238277390336},
  {
    "langs":"AngularJS,JSON",
    "id":"foo4",
    "address":"None",
    "_version_":1606953238278438913}]

The document that doesn't contain AngularJS (foo3) has been left out.
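
For completeness, here is a small Python sketch of how the filter query could be built and URL-encoded from a list of required values. The helper names and the core URL are placeholders, `wt=json` is added on the assumption that a JSON response is wanted, and values containing spaces are not handled.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(base_url, field, required_tokens):
    """Build a /select URL whose fq requires every token in the field."""
    # Assumption: tokens contain no spaces or query syntax characters.
    filter_query = "{}:({})".format(field, " AND ".join(required_tokens))
    params = urllib.parse.urlencode(
        {"q": "*:*", "fq": filter_query, "wt": "json"})
    return "{}/select?{}".format(base_url.rstrip("/"), params)


def fetch_docs(url):
    """Run the query and return the list of matching documents."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["response"]["docs"]


if __name__ == "__main__":
    # Hypothetical core URL; only the URL is printed, no request is sent.
    print(build_select_url(
        "http://localhost:8983/solr/collectionname", "langs",
        ["JSON", "AngularJS"]))
```

Letting `urlencode` handle the escaping avoids having to write `%20` and similar sequences by hand in the query string.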