0
votes

I need to parse a large number of small files and load them into a database in a flattened structure. I would prefer to use ADF v2 and SQL Database for this. The file-parsing logic already exists as a Python script, and I want to orchestrate it in ADF. I can see that ADF v2 offers a Python Notebook connector to Azure Databricks. Can I run a plain Python script in Azure Databricks through ADF? If so, will the script run only on the Databricks cluster's driver and therefore not use the cluster's full capacity? I am also considering calling Azure Functions. Please advise which option is more appropriate in this case.

1
Hi, does my answer help you? - Jay Gong

1 Answer

1
votes

Here are some ideas for your reference.

First, you mention Notebook and Databricks, which suggests that ADF's own Copy activity and Data Flow can't meet your needs. As far as I know, ADF supports only simple flattening. If you haven't tried it yet, please try that first.

Second, if your requirements do go beyond ADF's features, why not leave ADF out? Notebook and Databricks don't have to be used with ADF, so why pay the extra cost? With Notebook, you have to install packages yourself, such as pysql or pyodbc. With Azure Databricks, you can mount Azure Blob Storage and access the files as a file system. In addition, I suppose you don't need many workers in the cluster, so just set the maximum to 2.
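To make the "parse, flatten, load" step concrete, here is a minimal sketch of what the script could look like. It assumes the small files contain JSON (the question does not say), and the table name `readings` and its columns are hypothetical. The loader uses only standard DB-API calls (`cursor()`, `executemany()`, `commit()`) with `?` placeholders, which pyodbc also supports; an in-memory SQLite database stands in for SQL Database so the example runs anywhere.

```python
import json
import sqlite3  # stand-in for the demo only; swap in a pyodbc connection for SQL Database


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, e.g. {"a": {"b": 1}} -> {"a_b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def load_rows(conn, table, columns, rows):
    """Insert flattened rows through a DB-API connection that uses '?' placeholders.

    `table` and `columns` are interpolated into the SQL text, so they must come
    from trusted code, never from file contents.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    cur = conn.cursor()
    cur.executemany(sql, [tuple(row.get(c) for c in columns) for row in rows])
    conn.commit()
    return len(rows)


# Demo: hypothetical table and sample record, in-memory SQLite instead of SQL Database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (device_id TEXT, metrics_temp REAL, metrics_humidity REAL)"
)
raw = '{"device_id": "d1", "metrics": {"temp": 21.5, "humidity": 40}}'
rows = [flatten(json.loads(raw))]
inserted = load_rows(
    conn, "readings", ["device_id", "metrics_temp", "metrics_humidity"], rows
)
result = conn.execute(
    "SELECT device_id, metrics_temp, metrics_humidity FROM readings"
).fetchall()
```

Batching all rows from many small files into one `executemany` call is usually cheaper than one round trip per file, but that is a tuning choice, not a requirement.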


I think Databricks is better suited to being run and managed as a job.

An Azure Function could also be an option. You could create a blob trigger and load the files into one container. Of course, you will have to learn the basics of Azure Functions if you are not familiar with them. However, an Azure Function could be more economical.
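As a rough illustration of the per-file handler a blob trigger would call, the sketch below keeps the trigger binding itself out of the picture so that it runs without the Azure SDK. It assumes JSON files, and `handle_blob`, `FakeBlob`, and the `source_file` field are hypothetical names. The `blob` argument only needs a `name` attribute and a `read()` method, which is what `azure.functions.InputStream` provides in a real function.

```python
import json


def handle_blob(blob, load):
    """Process one uploaded file: read it, parse it, and hand the record to `load`.

    `blob` is duck-typed: anything with `.name` and `.read()` returning bytes works.
    `load` is a callback that writes the record to the database.
    """
    record = json.loads(blob.read().decode("utf-8"))
    record["source_file"] = blob.name  # remember which small file the row came from
    load(record)
    return record


class FakeBlob:
    """Test double standing in for the blob object the trigger would pass in."""

    def __init__(self, name, data):
        self.name = name
        self._data = data

    def read(self):
        return self._data


# Demo: collect records in a list instead of writing to a database.
loaded = []
handle_blob(FakeBlob("incoming/a.json", b'{"id": 1}'), loaded.append)
```

Because each upload triggers one short invocation, this model fits many small files well, which is part of why it can be cheaper than keeping a cluster running.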