Azure Search Index PDF document landscape text

Question

I have a collection of PDF documents in blob storage that I have added as a data source to my Azure Search instance. When I index these documents, any text that is rotated (i.e., landscape-formatted) is not indexed. These rotated pages are NOT images; they contain text. If I rotate the text and regenerate the PDF, I can search on the rotated text.
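
One way to check whether a PDF's landscape pages are marked as rotated at the page level is to scan the raw file for `/Rotate` entries in the page dictionaries. This is a minimal heuristic sketch, not a PDF parser. It assumes the page dictionaries are not inside compressed object streams, which holds for PDF 1.3 because object streams arrived in PDF 1.5. It will miss text that is rotated through the text matrix rather than through the page's `/Rotate` key. The function names are hypothetical.

```python
import re
from typing import List

# Heuristic: matches a /Rotate entry such as "/Rotate 90" in the raw bytes.
# Only finds entries that are not hidden inside compressed object streams.
ROTATE_RE = re.compile(rb"/Rotate\s+(-?\d+)")


def rotate_values(pdf_bytes: bytes) -> List[int]:
    """Return every /Rotate value found in the raw PDF bytes."""
    return [int(m.group(1)) for m in ROTATE_RE.finditer(pdf_bytes)]


def has_rotated_pages(pdf_bytes: bytes) -> bool:
    """True if any /Rotate value is not a multiple of 360, i.e. a page is turned."""
    return any(v % 360 != 0 for v in rotate_values(pdf_bytes))
```

For example, `has_rotated_pages(open("report.pdf", "rb").read())` (hypothetical file name) returns `True` if at least one page dictionary declares a non-zero rotation.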

Is this behavior by design? Is there a way to get the rotated text to be searchable?

One other oddity: the original PDF is v1.3, and when I regenerated it (in Docuprinter) it was saved as v1.4. With this version I can search both the rotated and the non-rotated text.
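
To confirm which version a given file declares, you can read its header, which begins with `%PDF-` followed by the version number. This is a minimal sketch. It searches only the first 1024 bytes, on the assumption that the header sits at or near the start of the file. It also ignores the `/Version` entry in the document catalog, which PDF 1.4 and later can use to override the header. The function name is hypothetical.

```python
import re
from typing import Optional

# The header looks like "%PDF-1.3" at the start of the file.
HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_header_version(pdf_bytes: bytes) -> Optional[str]:
    """Return the version declared in the PDF header, or None if no header is found.

    Searches only the first 1024 bytes and ignores any /Version override
    in the document catalog (possible from PDF 1.4 on).
    """
    match = HEADER_RE.search(pdf_bytes[:1024])
    return match.group(1).decode("ascii") if match else None
```

Running it on the original and the regenerated file would show whether they differ in the declared version (`"1.3"` vs `"1.4"`).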

Thanks!

Luis Cabrera · Accepted Answer · 2018-12-11T21:38:51

This behavior is not by design; it's an issue we need to resolve in the document cracking stage. If you want to track the resolution of issues like this, please create a UserVoice request: https://feedback.azure.com/forums/263029-azure-search

Thanks! Luis Cabrera, Principal PM (Azure Search)
