I have the following use case for building a Data Lake (e.g. in Azure):
My organization deals with companies that go into bankruptcy. Once a company goes bankrupt, it must hand over all of its data to us, including structured data (e.g. CSVs) as well as semi-structured and unstructured data (e.g. JSON, PDFs, Word documents, images, .txt files etc.). A data lake would help here because the data volumes can be large and unpredictable, and Azure Data Lake seems like a relatively low-cost and scalable storage solution.
However, apart from storing all of that data, we also need to give business users a tool that lets them search through it. I can imagine two search types:
- searching for specific files (using a full or partial file name as the search criterion)
- searching through all text-based files (Word documents, .txt files and PDFs) and identifying those whose content meets the search criteria (e.g. contains a specific phrase)
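
To make the two search types concrete, here is a minimal in-memory sketch of the behaviour I'm after. It is not tied to any Azure service: the `Document` class, the function names and the sample paths are all hypothetical, and it assumes the text has already been extracted from the PDFs and Word files (that step is not shown).

```python
from dataclasses import dataclass


@dataclass
class Document:
    path: str  # hypothetical path of the file inside the lake
    text: str  # text assumed to be already extracted from the file


def search_by_name(docs, fragment):
    """Return paths whose file name (last path segment) contains the fragment."""
    needle = fragment.casefold()
    return [d.path for d in docs if needle in d.path.rsplit("/", 1)[-1].casefold()]


def search_by_phrase(docs, phrase):
    """Return paths whose text contains the phrase, ignoring case and extra whitespace."""
    needle = " ".join(phrase.casefold().split())
    return [d.path for d in docs if needle in " ".join(d.text.casefold().split())]


# Made-up sample data, for illustration only.
docs = [
    Document("acme/invoices/2019_invoice_001.pdf", "Invoice for services rendered to Acme Ltd."),
    Document("acme/contracts/lease.docx", "This lease   agreement is made between the parties."),
    Document("acme/notes/readme.txt", "Contact the liquidator about the lease agreement."),
]

by_name = search_by_name(docs, "invoice")
by_phrase = search_by_phrase(docs, "Lease Agreement")
```

With this sample data, `by_name` contains only the invoice PDF and `by_phrase` contains the lease document and the readme. A linear scan like this obviously would not scale to a whole lake, which is why I'm looking for an indexed, ready-made tool.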
Are there any out-of-the-box tools that can use Azure Data Lake as a data source and would let users perform such searches?