I have a requirement to process some big data and planning to deploy Databricks cluster & a storage technology. Currently evaluating Data Lake Gen2 which supports both object and file storage. The storage account (blob, file, table, queue) also has similar capabilities which can handle both file based and object based storage requirements. I am bit puzzled to go for an option because of these similarities. Can someone clarify the following questions please?
- Except HDFS support, what else is a significant feature that I should use Data Lake Gen2 against Storage Account?
- Storage Account v2 with Hierarchical namespace enabled == Data Lake Gen2. If so, can I use File System to create file shares and mount them in my VM as like Storage acc's File system?
- For accessing data from Databricks, which one of these two will be better for big data workloads. I can see Storage account can also be mounted as DBFS which can still leverage the distributed processing.