
I'm new to Azure Data Lake and big data in general, so I apologize if my question seems stupid.

I've been looking at ADL and ADLA to build a cold-path data store. I have one Azure Stream Analytics query that outputs to Power BI for real-time visualizations, and another query that stores the data as CSV files in the data lake.
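
The stream job writes into a date-based folder hierarchy, roughly like this (the paths here are just illustrative, my actual output path pattern differs):

    /streaming/2017/01/15/data_0.csv
    /streaming/2017/01/16/data_0.csv
    /streaming/2017/01/16/data_1.csv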

I've created a VS project in which I create a database, a schema, and tables corresponding to the CSV files, plus a script that extracts the data from the CSV files and copies it into the tables to give my data some structure.
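
Roughly, my scripts look like the following (all database, schema, table and column names here are placeholders, and the exact options may differ from what I actually use):

    // Placeholder DDL: create the U-SQL database, schema and table.
    CREATE DATABASE IF NOT EXISTS SensorDb;
    USE DATABASE SensorDb;
    CREATE SCHEMA IF NOT EXISTS Telemetry;

    CREATE TABLE IF NOT EXISTS Telemetry.Readings
    (
        DeviceId string,
        EventTime DateTime,
        Value double,
        INDEX idx_Readings CLUSTERED (DeviceId, EventTime)
        DISTRIBUTED BY HASH (DeviceId)
    );

    // Pull one of the CSV files into the table.
    @readings =
        EXTRACT DeviceId string,
                EventTime DateTime,
                Value double
        FROM "/streaming/2017/01/15/data_0.csv"
        USING Extractors.Csv(skipFirstNRows : 1);

    INSERT INTO Telemetry.Readings
    SELECT DeviceId, EventTime, Value
    FROM @readings;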

My question is: if data keeps accumulating in CSV files, where the folder structure defines when the data arrived, how do I update my tables with the new data? Do I drop the tables and start over? I don't believe that's a viable solution.

I have scripts that I have to run to create the database and schema, extract the data, and populate the tables. Surely I can't run all of those scripts every time new data arrives. What I'm imagining is a separate load script I could re-run on its own, something along the lines of the sketch below, using a file set with virtual date columns, but I'm not sure that's the right approach, which is really what I'm asking. (Again, names and paths are made up.)
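
    USE DATABASE SensorDb;

    // Load only one day's folder instead of re-reading everything.
    DECLARE @loadDate DateTime = new DateTime(2017, 1, 16);

    // The date parts of the folder path become virtual columns on the rowset.
    @newReadings =
        EXTRACT DeviceId string,
                EventTime DateTime,
                Value double,
                date DateTime,      // filled from {date:yyyy}/{date:MM}/{date:dd}
                fileName string     // filled from the file name
        FROM "/streaming/{date:yyyy}/{date:MM}/{date:dd}/{fileName}.csv"
        USING Extractors.Csv(skipFirstNRows : 1);

    // Append the new day's rows to the existing table.
    INSERT INTO Telemetry.Readings
    SELECT DeviceId, EventTime, Value
    FROM @newReadings
    WHERE date == @loadDate;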

Note: I want to point out that the databases and tables are within ADLA U-SQL Databases.


1 Answer