0
votes

I am using Microsoft Azure Data Factory .NET SDK in order to automate dataset creation for a large number of tables.

A method within my .NET console application provides me the ability to create input and output datasets, based on a specified table name:

createInputDataSet(string table_Name, DataFactoryManagementClient client) {
    client.Datasets.CreateOrUpdate(resourceGroupName, dataFactoryName,
        new DatasetCreateOrUpdateParameters()
        {
            Dataset = new Dataset()
            {
                Properties = new DatasetProperties()
                {
                    Structure = new List<DataElement>()
                    {
                        //TODO: Autogenerate columns and types
                        new DataElement() {Name = "name", Type = "String" },
                        new DataElement() {Name = "date", Type = "Datetime" }
                    }
            }...

Currently, dataset creation is accomplished through a stored procedure on either source SQL Server or target SQL Data Warehouse. The stored procedure specifies a table name and then looks into INFORMATION_SCHEMA in order to generate valid columns and types for each ADF dataset. We then manually copy the result into portal.azure.com.

We have over 600 datasets, so need to utilize the .NET SDK for automated copy to ADF.

How does one create datasets automatically, while taking into account that each dataset's structure (i.e. columns and types) will differ?

1

1 Answers

0
votes

The only way I've been able to accomplish this is by writing a stored procedure to generate column names and types on both source and target. Such stored procedure should call INFORMATION_SCHEMA and INFORMATION_SCHEMA.COLUMNS in order to generate each column and type for the inputted table.

Once the procedure adequately outputs two columns (name, type) programmatically call the procedure and save as follow:

List<DataElement> InputParams = new List<DataElement>();

SqlConnection connect = new SqlConnection(<connection_string>);
SqlCommand cmd = new SqlCommand("pUtil_GenDFAutomate", connect);

cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(new SqlParameter("@TableName", <table_name>));
using (var reader = cmd.ExecuteReader())
{
    if (reader.HasRows)
    {
        while (reader.Read())
        {
            var name = reader.GetString(0);
            var type = reader.GetString(1);

            InputParams.Add(new DataElement
            {
                Name = name,
                Type = type
            });
        }
        reader.Close();
    }
}

Then, upon creation of your input/output dataset, simply use the variable InputParams as follow:

new DatasetCreateOrUpdateParameters()
{
    Dataset = new Dataset()
    {
        Properties = new DatasetProperties()
        {
            Structure = InputParams 
//Etc.