We have a very large number of blobs in Azure that we would like to add to an Azure Search index. These blobs come in a variety of formats (PDF, DOC, RTF, etc.), but none of them have a recognizable file extension (the names look like filename.00001).
Because of this, Azure Search balks during indexing, as it appears to use only the file extension for file format detection. We get the following error, and since all of our files have these "invalid" extensions, it would occur for every blob regardless of any indexing error threshold we set:
Import configuration failed, error creating Indexer: "Error with data source: Document 'https://XXXXXXX.blob.core.windows.net/folder/filename.00001' has unsupported content type 'unsupported'. To index only the blob metadata and ignore its content, set the 'dataToExtract' indexer configuration property to 'storageMetadata'. See https://aka.ms/azsearchblobdatatoextract. To ignore this error and continue indexing blobs with unsupported content types, set the 'failOnUnsupportedContentType' switch in indexer configuration to false. For more information, see https://aka.ms/blob-indexer-parameters-for-extraction. Please adjust your data source definition in order to proceed."
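For reference, here is roughly how the two settings named in the error would be expressed in an indexer definition. This is only a sketch: it assumes the settings live under parameters.configuration in the indexer body, and the helper function plus the indexer, data source, and index names are placeholders of ours, not anything from the service.

```python
import json


def build_indexer_definition(name, data_source_name, target_index_name,
                             metadata_only=False, fail_on_unsupported=True):
    """Build an indexer definition body using the two settings from the error.

    Hypothetical helper: all names passed in are placeholders.
    """
    configuration = {
        # False asks the indexer to skip blobs with unsupported content
        # types instead of failing on them.
        "failOnUnsupportedContentType": fail_on_unsupported,
    }
    if metadata_only:
        # Index only the blob's storage metadata and ignore its content.
        configuration["dataToExtract"] = "storageMetadata"

    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": target_index_name,
        "parameters": {"configuration": configuration},
    }


if __name__ == "__main__":
    definition = build_indexer_definition(
        "blob-indexer", "blob-datasource", "documents-index",
        metadata_only=True, fail_on_unsupported=False,
    )
    print(json.dumps(definition, indent=2))
```

As far as we can tell, neither setting solves our problem: skipping unsupported blobs would skip all of them, and metadata-only indexing would drop the document content we actually want to search.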
Is there any way to have Azure Search either detect the file format from the file content, or at least use metadata on the blob to determine it?