
I've noticed strange behaviour of the ISearchResponse.HitsMetadata.Total property in the NEST library. Whenever I delete a document asynchronously and immediately retrieve the remaining documents from Elasticsearch, the HitsMetadata.Total field on the ISearchResponse object almost never gets updated correctly: it usually reports the total number of documents as it was before the deletion. The behaviour returns to normal if I pause the request for at least 700 milliseconds, as if NEST (or maybe Elasticsearch itself) needed more time to update the property's state. I'm new to NEST and Elasticsearch, so it's possible I'm doing something wrong or don't fully understand the workings of the library, but I've spent quite a lot of time on this problem and can't get around it. As a result, the pagination metadata I send to the client gets computed incorrectly. I'm using NEST 6.6.0 and Elasticsearch 6.6.2.

The DELETE action:

[HttpDelete("errors/{index}/{logeventId}")]
public async Task<IActionResult> DeleteErrorLog([FromRoute] string index, [FromRoute] string logeventId)
{
    if (string.IsNullOrEmpty(index))
    {
        return BadRequest();
    }

    if (string.IsNullOrEmpty(logeventId)) 
    {
        return BadRequest();
    }

    var getResponse = await _client.GetAsync<Logevent>(new GetRequest(index, typeof(Logevent), logeventId));

    if(!getResponse.Found)
    {
        return NotFound();
    }

    var deleteResponse = await _client.DeleteAsync(new DeleteRequest(index, typeof(Logevent), logeventId));

    if (!deleteResponse.IsValid)
    {
        throw new Exception($"Deleting document id {logeventId} failed");
    }

    return NoContent();

}

The GET action:

[HttpGet("errors/{index}", Name = "GetErrors")]
public async Task<IActionResult> GetErrorLogs([FromRoute] string index, 
    [FromQuery]int pageNumber = 1, [FromQuery] int pageSize = 5)
{
    if (string.IsNullOrEmpty(index))
    {
        return BadRequest();
    }

    if(pageSize > MAX_PAGE_SIZE || pageSize < 1)
    {
        pageSize = 5;
    }

    if(pageNumber < 1)
    {
        pageNumber = 1;
    }

    var from = (pageNumber - 1) * pageSize;

    ISearchResponse<Logevent> searchResponse = await GetSearchResponse(index, from, pageSize);

    if (searchResponse.Hits.Count == 0)
    {
        return NotFound();
    }

    int totalPages = GetTotalPages(searchResponse, pageSize);

    var previousPageLink = pageNumber > 1 ? 
        CreateGetLogsForIndexResourceUri(ResourceUriType.PreviousPage, pageNumber, pageSize, "GetErrors") : null;

    var nextPageLink = pageNumber < totalPages ? 
        CreateGetLogsForIndexResourceUri(ResourceUriType.NextPage, pageNumber, pageSize, "GetErrors") : null;

    /* HERE, WHEN EXECUTED IMMEDIATELY AFTER THE DELETE (UP TO ~700
       MILLISECONDS LATER), THE totalCount FIELD GETS MISCALCULATED:
       IT RETURNS THE VALUE PRECEDING THE DELETION OF A DOCUMENT
    */
    var totalCount = searchResponse.HitsMetadata.Total;
    var count = searchResponse.Hits.Count;

    var paginationMetadata = new
    {
        totalCount = searchResponse.HitsMetadata.Total,
        totalPages,
        pageSize,
        currentPage = pageNumber,
        previousPageLink,
        nextPageLink
    };

    Response.Headers.Add("X-Pagination", Newtonsoft.Json.JsonConvert.SerializeObject(paginationMetadata));

    var logeventsDtos = Mapper.Map<IEnumerable<LogeventDto>>(searchResponse.Hits);

    return Ok(logeventsDtos);
}

The GetSearchResponse method:

private async Task<ISearchResponse<Logevent>> GetSearchResponse(string index, int from, int pageSize)
{
    return await _client.SearchAsync<Logevent>(s =>
             s.Index(index).From(from).Size(pageSize).Query(q => q.MatchAll()));

} 

The code on the client side initiating the server-side actions:

async deleteLogevent(item){
    this.deleteDialog = false;
    let logeventId = item.logeventId;
    let level = this.defaultSelected.name;
    let index = 'logstash'.concat('-', this.defaultSelected.value, '-', this.date);

    LogsService.deleteLogevent(level, index, logeventId).then(response => {
      if(response.status == 204){
        let logeventIndex = this.logs.findIndex(element => {return element.logeventId === item.logeventId});
        this.logs.splice(logeventIndex, 1);
        LogsService.getLogs(level, index, this.pageNumber).then(reloadResponse => {
          this.logs.splice(0);
          reloadResponse.data.forEach(element => {
            this.logs.push(element);
          });
          this.setPaginationMetadata(reloadResponse.headers["x-pagination"]);
        })
      }
    }).catch(error => {

    })
}

1 Answer


This is normal and expected behaviour for Elasticsearch. Changes from operations like indexing, updating, and deleting are not reflected in search responses until a refresh has occurred. Mike McCandless' blog post on how Lucene handles deleted documents is a few years old now but still relevant, and the Near Real-Time Search section of the Elasticsearch Definitive Guide is also a good resource.

Here's an example that demonstrates the behaviour:

private static void Main()
{
    var defaultIndex = "refresh_example";
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));

    var settings = new ConnectionSettings(pool)
        .DefaultIndex(defaultIndex)
        .DefaultTypeName("_doc");

    var client = new ElasticClient(settings);

    if (client.IndexExists(defaultIndex).Exists)
        client.DeleteIndex(defaultIndex);

    client.CreateIndex(defaultIndex, c => c
        .Mappings(m => m
            .Map<Document>(mm => mm
                .AutoMap()
            )
        )
    );

    var indexResponse = client.IndexDocument(new Document 
    {
        Id = 1,
        Name = "foo"
    });

    // hit count is likely to be 0 here because no refresh interval has occurred
    var searchResponse = client.Search<Document>();
    Console.WriteLine($"search hit count after index no refresh: {searchResponse.Hits.Count}");

    // a get for the exact document will return it however.
    var getResponse = client.Get<Document>(1);
    Console.WriteLine($"get document with id 1, name is: {getResponse.Source.Name}");

    // use refresh API to refresh the index
    var refreshResponse = client.Refresh(defaultIndex);

    // now the hit count is 1
    searchResponse = client.Search<Document>();
    Console.WriteLine($"search hit count after refresh: {searchResponse.Hits.Count}");

    // index another document, and refresh at the same time
    indexResponse = client.Index(new Document
    {
        Id = 2,
        Name = "bar"
    }, i => i.Refresh(Refresh.WaitFor));

    // now the hit count is 2
    searchResponse = client.Search<Document>();
    Console.WriteLine($"search hit count after index with refresh: {searchResponse.Hits.Count}");

    // now delete document with id 1
    var deleteResponse = client.Delete<Document>(1);
    Console.WriteLine($"document with id 1 deleted");

    // hit count is still 2
    searchResponse = client.Search<Document>();
    Console.WriteLine($"search hit count before refresh: {searchResponse.Hits.Count}");

    // refresh
    refreshResponse = client.Refresh(defaultIndex);

    // hit count is 1
    searchResponse = client.Search<Document>();
    Console.WriteLine($"search hit count after refresh: {searchResponse.Hits.Count}");
}

public class Document 
{
    public int Id { get; set; }

    public string Name { get; set; }
}

Here's what's written to the console:

search hit count after index no refresh: 0
get document with id 1, name is: foo
search hit count after refresh: 1
search hit count after index with refresh: 2
document with id 1 deleted
search hit count before refresh: 2
search hit count after refresh: 1

You might be thinking, "why don't I just refresh on every operation?". The reason not to is performance; when you call the refresh API or specify refresh as part of an operation, a new segment is written and opened, which uses system resources, needs to be committed to disk and likely later merged with other segments. Calling refresh constantly is going to create a lot of segments. It is useful however to call refresh in tests to make assertions.

It's best to write your application to accommodate this near real-time nature of deletes and searches. For pagination, a similar scenario exists with other datastores too.
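That said, if the delete-then-reload flow in the question really needs the follow-up search to see the deletion immediately, a middle ground is to make just that one delete wait until the change is searchable, rather than refreshing on every operation. A sketch against the question's DeleteErrorLog action (assuming the same `_client`, `index` and `Logevent` type from the question):

```csharp
// Wait for the next scheduled refresh before the delete call returns, so a
// search issued right after the 204 response will see the deletion. Unlike
// Refresh.True, WaitFor does not force an extra refresh; it blocks until the
// next one happens (by default within index.refresh_interval, i.e. 1 second).
var deleteResponse = await _client.DeleteAsync(
    new DeleteRequest(index, typeof(Logevent), logeventId)
    {
        Refresh = Refresh.WaitFor
    });
```

This trades a little latency on the delete for read-your-writes on the subsequent search, and avoids creating the extra segments that forcing a refresh on every operation would.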