I'm trying to import a rather large (~200M docs) DocumentDB collection into Azure Search, but the indexer times out after ~24 hours. When the indexer restarts, it starts again from the beginning rather than from where it left off, so I can't get more than ~40M docs into the search index. The data source has a high-water mark set like this:
var source = new DataSource();
source.Name = DataSourceName;
source.Type = DataSourceType.DocumentDb;
source.Credentials = new DataSourceCredentials(myEnvDef.ConnectionString);
source.Container = new DataContainer(myEnvDef.CollectionName, QueryString);
source.DataChangeDetectionPolicy = new HighWaterMarkChangeDetectionPolicy("_ts");
serviceClient.DataSources.Create(source);
The high-water mark appears to work correctly when testing on a small database.
Should the high-water mark be respected when the indexer fails like this? If not, how can I index such a large data set?