I recently started using Elasticsearch with its .NET client, NEST, and I have a lot of questions.

I'm currently stuck trying to highlight search results in an attachment field, using the elasticsearch-mapper-attachments plugin. Indexing works, the mapping seems correct, and the Base64 encoding and decoding work too.

When I search by keyword, ES finds the right documents (the ones that contain the keyword), but the highlight result shows either nothing or the encoded content instead of the decoded text.

According to another post about a similar feature, the solution is to set Store = true and TermVector = TermVectorOption.WithPositionsOffsets on the field.

So I tried to configure that in my C# class with:

[ElasticProperty(Name = "attach", Type = FieldType.Attachment, Store=true, TermVector = TermVectorOption.WithPositionsOffsets)] 
public string attach { get; set; } 

The query is the following (no highlight result comes back):

{ 
"fields" : ["name","attach"], 
  "query" : { 
    "query_string" : { 
      "query" : "settings" 
    } 
  }, 
  "highlight" : { 
    "fields" : { 
      "attach" : {} 
    } 
  } 
} 

It seems that when the mapping for a type is created from the class, the attachment settings are not applied: when I check localhost:9200/myindex/mytype/_mapping?pretty, the attachment field has neither the store setting nor the term vector setting that Store=true and TermVector = TermVectorOption.WithPositionsOffsets should produce.

Does anyone have an idea what is going wrong? Thanks.

A reply has been given on the NEST GitHub repository: github.com/elasticsearch/elasticsearch-net/issues/972 (Yuan)

1 Answer

I wasn't able to get this to work solely from the response to the GitHub issue, although it did point me in the right direction. After some trial and error, here's what I came up with:

The Doc class

public class Doc
{
    public string File { get; set; }
    // As an example for including additional fields:
    public string Title { get; set; } 
}

The attachment will automatically be created with all of the internal fields, so you don't necessarily need to create another class for the attachment. I think it would be possible to do something similar to the accepted answer, but you would have to manually add all of the properties when the index is created.

Index creation and storage of a PDF file

var index = "my-application";
var node = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(node, defaultIndex: index);
var client = new ElasticClient(settings);

// Create the index, indicating that the contents of the internal "file" field 
// and the internal "title" field should be stored with position offsets to 
// allow highlighting.
client.CreateIndex(index, c => c
    .AddMapping<Doc>(m => m
        .Properties(ps => ps
            .Attachment(a => a
                .Name(o => o.File)
                .FileField(t => t
                    .Name("file")
                    .TermVector(TermVectorOption.WithPositionsOffsets)
                    .Store()
                )
                .TitleField(t => t
                    .Name("title")
                    .TermVector(TermVectorOption.WithPositionsOffsets)
                    .Store()
                )
            )
        )
        .Properties(ps => ps
            .String(s => s
                .Name(o => o.Title)
            )
        )
    )
);

string path = @"path\to\sample1.pdf";

var doc = new Doc()
{
    Title = "Anything you want",
    File = Convert.ToBase64String(System.IO.File.ReadAllBytes(path))
};

client.Index(doc);
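
For reference, the mapping this is aiming for should look roughly like the fragment below, which you can compare against the output of the _mapping endpoint. This is a sketch based on the highlighting example in the elasticsearch-mapper-attachments README, not output captured from a live cluster. The type name "doc" is an assumption about how the Doc class is named in the index, and some plugin versions name the body sub-field differently, so treat the exact keys as illustrative. The string mapping for the Title property is omitted here.

```json
{
  "doc": {
    "properties": {
      "file": {
        "type": "attachment",
        "fields": {
          "file": {
            "term_vector": "with_positions_offsets",
            "store": true
          },
          "title": {
            "term_vector": "with_positions_offsets",
            "store": true
          }
        }
      }
    }
  }
}
```

If store and term_vector are missing from the sub-fields in your own _mapping output, the highlighter has no stored text to work from, which matches the symptom described in the question.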

Search

var queryString = "something in your pdf";
var searchResults = client.Search<Doc>(s => s
    .Fields("file", "title")
    .Query(quer => quer.QueryString(x => x.Query(queryString)))
    .Highlight(x => x
        .OnFields(y => y
            .OnField(f => f.File)
            .PreTags("<strong>")
            .PostTags("</strong>")
        )
    )
);
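
To compare with the raw query in the question, the NEST call above should correspond approximately to the following request body. This is a hand-written sketch rather than the exact JSON NEST serializes, and it assumes the highlighted field resolves to "file"; the tags are set per field here, which Elasticsearch accepts as an override of the global highlight settings.

```json
{
  "fields": ["file", "title"],
  "query": {
    "query_string": {
      "query": "something in your pdf"
    }
  },
  "highlight": {
    "fields": {
      "file": {
        "pre_tags": ["<strong>"],
        "post_tags": ["</strong>"]
      }
    }
  }
}
```

Sending this body directly to the _search endpoint of the index is a quick way to check whether a missing highlight comes from the mapping or from the client code.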