0
votes

My blob storage has blobs like new/1.json and new/2.json.

I have an index called new-index, an indexer called new-indexer, and a data source called new-datasource. My data source body looks like this:

{
    "name" : "new-datasource",
    "type" : "azureblob",
    "credentials" : { "connectionString" : "MyStorageConnStrning" },
    "container" : { "name" : "mycontaner", "query" : "new" }
}

"query" : "new" means that when the indexer runs, it takes all the blobs from the virtual directory new in blob storage.

An indexer run has a start time and an end time, and I know that the indexer does incremental indexing based on the LastModified property of each blob (document).

My question is: if a new blob such as new/3.json is created in the virtual directory new between the start time and the end time of an indexer run, will that blob also be indexed by this run, or does another run need to occur for it to be indexed?

2 Answers

1
votes

My question is: if a new blob such as new/3.json is created in the virtual directory new between the start time and the end time of an indexer run, will that blob also be indexed by this run, or does another run need to occur for it to be indexed?

The answer is a little more involved than what Joey said. Because the indexer enumerates blobs in pages, a blob created during the run, even one with a newer timestamp, may or may not be picked up, depending on which page it lands in.

The only guarantees the indexer provides are:

  • The indexer will index, in the same run, every blob whose LastModified timestamp is earlier than the indexer start time.
  • Incremental changes will eventually be indexed because of the data change detection policy. This means they may or may not be indexed in the same run.

It's not advisable to make any assumptions past the high watermark. Exactly when the new blob gets indexed is technically undefined behavior.
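To make the paging point concrete, here is a toy sketch. It assumes the indexer lists blobs in name order, one page at a time, and resumes after the last name it saw. That is my simplification for illustration, not a description of how Azure Search is actually implemented, and `run_indexer`, `page_size`, and `arrivals` are hypothetical names.

```python
def run_indexer(store, page_size, arrivals):
    """Toy model of one indexer run that enumerates blobs page by page.

    store:    set of blob names present when the run starts (mutated mid-run).
    arrivals: dict mapping a page number to blob names created just
              before that page is read (hypothetical, for illustration).
    """
    indexed = []
    cursor = ""   # last blob name enumerated so far (assumed name-ordered listing)
    page_no = 0
    while True:
        # Blobs created while the run is in progress.
        store.update(arrivals.get(page_no, []))
        # Next page: names that sort after the cursor.
        page = sorted(name for name in store if name > cursor)[:page_size]
        if not page:
            break
        indexed.extend(page)
        cursor = page[-1]
        page_no += 1
    return indexed


# Case A: new/3.json is created mid-run and sorts after the cursor,
# so it lands on a page that has not been read yet.
run_a = run_indexer({"new/1.json", "new/2.json"}, page_size=1,
                    arrivals={1: ["new/3.json"]})

# Case B: new/0.json is created mid-run but sorts before the cursor,
# so this run never sees it; a later run has to pick it up.
run_b = run_indexer({"new/1.json", "new/2.json"}, page_size=1,
                    arrivals={1: ["new/0.json"]})

print(run_a)
print(run_b)
```

In this model the same "created during the run" event has two different outcomes, which is why I would not rely on either one.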

Check out this article for more details. I hope this helps.

0
votes

Will this blob also get indexed by this indexer run, or does another run need to occur for it to get indexed?

In short, yes. It will get indexed by this indexer because of dataChangeDetectionPolicy.

When using Azure Blob data sources, Azure Search automatically uses a high watermark change detection policy based on a blob's last-modified timestamp. A high watermark policy supports incremental change detection by picking up only those items that contain new or revised content.
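Roughly, a high watermark policy can be pictured like the sketch below. This is only a minimal model under my own assumptions (a strict "newer than the watermark" comparison and an in-memory dict of blobs); `incremental_run` is a hypothetical helper, not part of any Azure SDK.

```python
from datetime import datetime, timezone


def incremental_run(blobs, watermark):
    """Return (names to index, new watermark) for one simulated run.

    blobs:     dict of blob name -> last-modified timestamp.
    watermark: highest timestamp already processed, or None on the first run.
    """
    changed = sorted(
        (modified, name)
        for name, modified in blobs.items()
        if watermark is None or modified > watermark  # assumed strict comparison
    )
    names = [name for _, name in changed]
    new_watermark = changed[-1][0] if changed else watermark
    return names, new_watermark


def at(minute):
    # Helper that builds illustrative timestamps.
    return datetime(2020, 1, 1, 12, minute, tzinfo=timezone.utc)


blobs = {"new/1.json": at(1), "new/2.json": at(2)}
first, wm = incremental_run(blobs, None)     # first run sees everything

blobs["new/3.json"] = at(3)                  # blob created later
second, wm = incremental_run(blobs, wm)      # only the new blob is picked up

third, wm = incremental_run(blobs, wm)       # nothing changed, nothing to do
print(first, second, third)
```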

For more details, you can refer to this article.