0 votes

I am creating the mappings for an index I will be using in a project. Given the domain of the data, I'd like most of the fields to be searchable through case-insensitive term queries. I worked out a custom analyzer (like the one suggested here: Elasticsearch Map case insensitive to not_analyzed documents), but when I try to index a document, the process hangs for 60 seconds until a timeout occurs and the whole operation fails. I see the same behavior when I test in Sense.

Here is the index definition:

PUT /emails
{
   "mappings": {
      "email": {
         "properties": {
            "createdOn": {
               "type": "date",
               "store": true,
               "format": "strict_date_optional_time||epoch_millis"
            },
            "data": {
               "type": "object",
               "dynamic": "true"
            },
            "from": {
               "type": "string",
               "store": true
            },
            "id": {
               "type": "string",
               "store": true
            },
            "sentOn": {
               "type": "date",
               "store": true,
               "format": "strict_date_optional_time||epoch_millis"
            },
            "sesId": {
               "type": "string",
               "store": true
            },
            "subject": {
               "type": "string",
               "store": true,
               "analyzer": "standard"
            },
            "templates": {
               "properties": {
                  "html": {
                     "type": "string",
                     "store": true
                  },
                  "plainText": {
                     "type": "string",
                     "store": true
                  }
               }
            },
            "to": {
               "type": "string",
               "store": true
            },
            "type": {
               "type": "string",
               "store": true
            }
         }
      },
      "event": {
         "_parent": {
            "type": "email"
         },
         "properties": {
            "id": {
               "type": "string",
               "store": true
            },
            "origin": {
               "type": "string",
               "store": true
            },
            "time": {
               "type": "date",
               "store": true,
               "format": "strict_date_optional_time||epoch_millis"
            },
            "type": {
               "type": "string",
               "store": true
            },
            "userAgent": {
               "type": "string",
               "store": true
            }
         }
      }
   },
   "settings": {
      "number_of_shards": "5",
      "number_of_replicas": "0",
      "analysis": {
         "analyzer": {
            "default": {
               "tokenizer": "keyword",
               "filter": [
                  "lowercase"
               ],
               "type": "custom"
            }
         }
      }
   }
}
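
With this setup the idea is that every string field is indexed as a single lowercased token, so a term query whose value is already lowercase should match regardless of the original casing (term queries are not analyzed at search time, so the value has to be lowercased by the caller). A rough example of the kind of query I mean (the field and value are just the ones from the sample document below):

GET /emails/email/_search
{
    "query": {
        "term": {
            "from": "email-address-1"
        }
    }
}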

As you can see in the index definition, I register the analyzer under the name "default" (if I give it any other name and try to set it as the default analyzer for each of the two types, I get a "Root mapping definition has unsupported parameters: [analyzer : my_analyzer]" error).
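
(For reference, that error comes from putting "analyzer" at the type root, which the mapping does not accept; a named custom analyzer is normally referenced on individual string fields instead. A rough sketch of that per-field variant, where the index name emails_v2 and the analyzer name lowercase_keyword are just placeholders:

PUT /emails_v2
{
   "settings": {
      "analysis": {
         "analyzer": {
            "lowercase_keyword": {
               "type": "custom",
               "tokenizer": "keyword",
               "filter": [
                  "lowercase"
               ]
            }
         }
      }
   },
   "mappings": {
      "email": {
         "properties": {
            "from": {
               "type": "string",
               "store": true,
               "analyzer": "lowercase_keyword"
            }
         }
      }
   }
})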

And this is how I am trying to add a document to the index:

POST /emails/email/1
{
    "from": "email-address-1",
    "to": "email-address-2",
    "subject": "Hello world",
    "data":{
        "status": "SENT"
    }
}

I really can't understand why this timeout is happening. I also tried using NEST via a C# console application. Same behavior.

Thanks.

PS: for testing I am using both Elasticsearch 2.3 hosted on AWS and Elasticsearch 2.3 running in a local Docker container.

1
Do you have enough nodes in the cluster to make having 5 replicas worthwhile at this stage in development? – Russ Cam
It's a go-to-production cluster that temporarily consists of a single node. At the moment it's as empty as a newly created cluster can be. – Kralizek
I think that should be 5 shards and 1 replica. While you're developing, you can set replicas to 0 and then update that before moving to production. – Russ Cam
I noticed the index definition above is not the one giving me problems. I've updated the question. – Kralizek
Changing your index to 5 shards and 1 replica solves the problem. The issue with 5 replicas and 1 primary shard on one node is that there are not enough active copies to meet the default quorum write consistency of 4 (n/2 + 1), since the replicas will all be unassigned on the single node. You'll see an UnavailableShardsException in the logs with an error message for this. – Russ Cam

1 Answer

1 vote

The problem is that you have 1 node and an index with 1 primary shard and 5 replica shards.

Since replicas of a primary will not be assigned on the same node as the primary, the 5 replicas will all be unassigned. This is an issue when indexing a document: by default, the write consistency for an index operation is quorum, and a quorum of 6 (1 primary + 5 replicas) is 4 (n/2 + 1). This means the document needs to be written to the primary and 3 replicas of the same shard for the operation to succeed. With the replicas unassigned, this can never be satisfied, so the request waits until it times out. You'll see an UnavailableShardsException in the logs with an error message for this.
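
If you want to confirm this, the shard allocation for the index on the single node should show the replica copies as UNASSIGNED, for example via the cat shards API:

GET _cat/shards/emails?v

As a temporary workaround (not a real fix, since the replicas would still never be assigned), you could also lower the write consistency for the individual index request:

POST /emails/email/1?consistency=one
{
    "from": "email-address-1",
    "to": "email-address-2",
    "subject": "Hello world",
    "data": {
        "status": "SENT"
    }
}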

Changing your index to 5 shards and 1 replica will solve the problem.
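
Note that number_of_shards is fixed at index creation time in 2.x, so changing the shard count means recreating the index (or reindexing), while number_of_replicas can be updated dynamically, for example:

PUT /emails/_settings
{
   "index": {
      "number_of_replicas": 1
   }
}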