3
votes

I am using Elasticsearch.NET and NEST to service calls to a .NET webservice. The ElasticClient is a singleton, so its connection pooling and failover state should persist between service calls.

The ElasticClient is using a SniffingConnectionPool with two clustered nodes. One of the two nodes is dead and not responding to HTTP traffic at all.

My understanding is that the SniffingConnectionPool should detect this and exclude the dead connection from the nodes that it uses - that is, all Elasticsearch traffic should be sent to the sole live node.

Sadly, the ElasticClient persists in attempting to use the dead node, adding long timeout delays to the webservice response. Has anybody had any success using the Elasticsearch.NET failover pooling? Does anyone have any ideas what I am doing wrong, or what else I could try?

I have already tried switching to a StaticConnectionPool, but the problem persists.

I have tried futzing with the connection settings, but can't get a useful enough understanding from the documentation at http://nest.azurewebsites.net/elasticsearch-net/cluster-failover.html.

Here's the code that I use to create the client with its connection pool:

    private static IElasticClient _MakeClient(string[] injectedUris, string defaultIndex)
    {
        var settings = new ConnectionSettings(_GetIConnectionPool(injectedUris), defaultIndex);
        var connection = new ConnectionWithBackoffStrategy(
            new HttpConnection(settings));
        var client = new ElasticClient(settings, connection);
        return client;
    }

    private static IConnectionPool _GetIConnectionPool(string[] injectedUris)
    {
        var uris = injectedUris ?? ConfigurationManager.AppSettings["elasticSearchUrls"].Split(',');
        return new SniffingConnectionPool(uris.Select(uri => new Uri(uri)));
    }

I'm using Nest 1.0.0-beta1.

1
Yikes, this part is heavily tested and I've also ported real world applications and switched off nodes to confirm it works as expected. Are you able to capture Fiddler traffic and send me the captured requests? Would love to get to the bottom here. - Martijn Laarman

1 Answer

1
votes

ConnectionSettings has methods to opt in to connection sniffing, such as SniffOnConnectionFault(true) and SniffOnStartup(true).
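As a minimal sketch of what opting in might look like, assuming the same ConnectionSettings constructor used in the question: the class name ElasticClientFactory and the method name MakeSniffingClient are hypothetical, and the question's custom ConnectionWithBackoffStrategy wrapper is left out to keep the example small. I have not verified this against 1.0.0-beta1 specifically.

```csharp
using System;
using System.Linq;
using Elasticsearch.Net.ConnectionPool;
using Nest;

// Hypothetical helper, not part of NEST: builds a client with sniffing enabled.
public static class ElasticClientFactory
{
    public static IElasticClient MakeSniffingClient(string[] uris, string defaultIndex)
    {
        // Same pool type as in the question, seeded with the known node URIs.
        var pool = new SniffingConnectionPool(uris.Select(u => new Uri(u)));

        var settings = new ConnectionSettings(pool, defaultIndex)
            // Ask the cluster for its node list before the first request.
            .SniffOnStartup(true)
            // Ask again whenever a request to a node fails.
            .SniffOnConnectionFault(true);

        // Keep the returned client as a singleton so the pool state is
        // shared between service calls, as the question already does.
        return new ElasticClient(settings);
    }
}
```

Whether this alone removes the timeout delay depends on how quickly the sniff request to the dead node fails, so treat it as a starting point for experimentation.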

If you opt in to these, then Elasticsearch.NET will call http://localhost:9200/_nodes/_all/clear?timeout=50

I believe this is to prove that the node is alive, but there is no information about the clear action in the Elasticsearch docs.

The Python implementation appears to imply it's just a call to something that will return quickly, although it's a bit 'scary' seeing clear in a request to 'all nodes'. The comment in its source reads:

"use small timeout for the sniffing request, should be a fast api call"

https://github.com/elasticsearch/elasticsearch-py/blob/master/elasticsearch/transport.py#L175