
Description of problem

I want to run the attachment processor and the remove processor on an array of attachments. I know that the foreach processor is required for this.

The foreach processor lets the attachment processor and the remove processor run on the individual elements of the array (https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment-with-arrays.html).

I can't find any good NEST (C#) examples for indexing an array of attachments and removing the content field. Can someone provide a NEST (C#) example for my use case?

UPDATE: Thanks to Russ Cam, it's now possible to index an array of attachments and remove the base64-encoded file content with the following pipeline:

_client.PutPipeline("attachments", p => p
    .Description("Document attachments pipeline")
    .Processors(pp => pp
        .Foreach<ApplicationDto>(fe => fe
            .Field(f => f.Attachments)
            .Processor(fep => fep
                .Attachment<Attachment>(a => a
                    .Field("_ingest._value._content")
                    .TargetField("_ingest._value.attachment")
                )
            )
        )
        .Foreach<ApplicationDto>(fe => fe
            .Field(f => f.Attachments)
            .Processor(fep => fep
                .Remove<Attachment>(r => r
                    .Field("_ingest._value._content")
                )
            )
        )
    )
);

1 Answer


Your code is missing the ForeachProcessor; the NEST implementation for this is pretty much a direct translation of the Elasticsearch JSON example. Using the Attachment type available in NEST also makes this a little easier, because the attachment object that the data is extracted into deserializes into it.

void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var defaultIndex = "default-index";
    var connectionSettings = new ConnectionSettings(pool)
        .DefaultIndex(defaultIndex);

    var client = new ElasticClient(connectionSettings);

    if (client.IndexExists(defaultIndex).Exists)
        client.DeleteIndex(defaultIndex);

    // Description belongs on the pipeline itself, not inside the processors list
    client.PutPipeline("attachments", p => p
        .Description("Document attachment pipeline")
        .Processors(pp => pp
            .Foreach<Document>(fe => fe
                .Field(f => f.Attachments)
                .Processor(fep => fep
                    .Attachment<Attachment>(a => a
                        .Field("_ingest._value.data")
                        .TargetField("_ingest._value.attachment")
                    )
                )
            )
        )
    );

    var indexResponse = client.Index(new Document
        {
            Attachments = new List<DocumentAttachment>
            {
                new DocumentAttachment { Data = "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=" },
                new DocumentAttachment { Data = "VGhpcyBpcyBhIHRlc3QK" }
            }
        },
        i => i.Pipeline("attachments")
    );

    var getResponse = client.Get<Document>(indexResponse.Id);
}

public class Document
{
    public List<DocumentAttachment> Attachments { get; set; }
}

public class DocumentAttachment
{
    public string Data { get; set; }

    public Attachment Attachment { get; set; }
}

Retrieving the indexed document returns

{
  "_index" : "default-index",
  "_type" : "document",
  "_id" : "AVrOVuC1vjcwkxZzCHYS",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "attachments" : [
      {
        "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=",
        "attachment" : {
          "content_type" : "text/plain; charset=ISO-8859-1",
          "language" : "en",
          "content" : "this is\njust some text",
          "content_length" : 24
        }
      },
      {
        "data" : "VGhpcyBpcyBhIHRlc3QK",
        "attachment" : {
          "content_type" : "text/plain; charset=ISO-8859-1",
          "language" : "en",
          "content" : "This is a test",
          "content_length" : 16
        }
      }
    ]
  }
}

To remove the data field from _source as well, chain a RemoveProcessor onto the pipeline, wrapped in its own foreach so that it runs on each element of the array.
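As a rough sketch of that chaining, the pipeline above could be extended with a second foreach that removes the base64 data once the attachment has been extracted. This assumes the same NEST 5.x-style API and the same Document and DocumentAttachment classes used in the answer; the AttachmentPipeline class and its Create method are hypothetical names introduced only for illustration, and newer NEST versions may expose these calls differently.

```csharp
using System.Collections.Generic;
using Nest;

public class Document
{
    public List<DocumentAttachment> Attachments { get; set; }
}

public class DocumentAttachment
{
    // Base64 encoded file content; removed from _source by the pipeline below
    public string Data { get; set; }

    // Populated by the attachment processor
    public Attachment Attachment { get; set; }
}

// Hypothetical helper, not part of NEST
public static class AttachmentPipeline
{
    public static bool Create(IElasticClient client)
    {
        var response = client.PutPipeline("attachments", p => p
            .Description("Document attachment pipeline")
            .Processors(pp => pp
                // First pass: extract each attachment into the "attachment" field
                .Foreach<Document>(fe => fe
                    .Field(f => f.Attachments)
                    .Processor(fep => fep
                        .Attachment<Attachment>(a => a
                            .Field("_ingest._value.data")
                            .TargetField("_ingest._value.attachment")
                        )
                    )
                )
                // Second pass: drop the base64 "data" field from each element
                .Foreach<Document>(fe => fe
                    .Field(f => f.Attachments)
                    .Processor(fep => fep
                        .Remove<Attachment>(r => r
                            .Field("_ingest._value.data")
                        )
                    )
                )
            )
        );

        // IsValid reports whether Elasticsearch accepted the pipeline
        return response.IsValid;
    }
}
```

Documents indexed through this pipeline should then come back with only the attachment object on each array element, so DocumentAttachment.Data would be null after deserialization.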