1
votes

I am working on copying data from a source Oracle database to a Target SQL data warehouse using the Data factory.

When using the copy function in data factory, we are asked to specify the destination location and a table to copy the data to. There are multiple tables that needs to be copied, and therefore making a table for each in the destination is time consuming.

How can I setup data factory to copy data from the source to a destination, where it will automatically create a table at the destination, without having to explicitly create them manually?

TIA

1

1 Answers

0
votes

Came across the same issue last year, used pipeline.parameters() for dynamic naming and a Data Factory stored procedure activity before the copy activity to first create the empty table from a template before copying https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-stored-procedure.


CREATE PROCEDURE create_sql_table_proc @WindowStartYear NVARCHAR(30), @WindowStartMonth NVARCHAR(30), @WindowStartDay NVARCHAR(30)
AS

BEGIN


declare @strsqlcreatetable as [NVARCHAR](255)
declare @strsqldroptable as [NVARCHAR](255)
declare @tablename as [NVARCHAR](255)
declare @strsqlsetpk as [NVARCHAR](255)

select @tablename = 'TABLE_NAME_' + @WindowStartYear + @WindowStartMonth + @WindowStartDay

select @strsqldroptable = 'DROP TABLE IF EXISTS ' +  @tablename

select @strsqlcreatetable = 'SELECT * INTO ' + @tablename + ' FROM  OUTPUT_TEMPLATE'

select @strsqlsetpk = 'ALTER TABLE ' + @tablename + ' ADD PRIMARY KEY (CustID)'

exec (@strsqldroptable)
exec (@strsqlcreatetable)
exec (@strsqlsetpk)


END

Since have started pushing the table to SQL from our Pyspark scripts running on a cluster, where it is not necessary to first create the empty table https://medium.com/@radek.strnad/tips-for-using-jdbc-in-apache-spark-sql-396ea7b2e3d3.