4
votes

Cannot find an answer via google, msdn (and other microsoft) documentation, or SO.

In Azure Data Factory you can get data from a dataset by using copy activity in a pipeline. The pipeline definition includes a query. All the queries I have seen in documentation are simple, single table queries with no joins. In this case, a dataset is defined as a table in the database with "TableName"= "mytable". Additionally, one could retrieve data from a stored procedure, presumably allowing more complex sql.

Is there a way to define a more complex query in a pipeline that includes joins and/or transformation logic that alters the data from or pipeline from a query rather than stored procedure. I know that you can specify fields in a dataset, but don't know how to get around the "tablename" property.

If there is a way, what would that method be?

input is on-premises sql server. output is azure sql database.

UPDATED for clarity.

2

2 Answers

8
votes

Yes, the sqlReaderQuery can be much more complex than what is provided in the examples, and it doesn't have to only use the Table Name in the Dataset.

In one of my pipelines, I have a Dataset with the TableName "dbo.tbl_Build", but my sqlReaderQuery looks at several tables in that database. Here's a heavily truncated example:

with BuildErrorNodes as (select infoNode.BuildId, ...) as MessageValue from dbo.tbl_BuildInformation2 as infoNode inner join dbo.tbl_BuildInformationType as infoType on (infoNode.PartitionId = infoType), BuildInfo as ...

It's a bit confusing to list a single table name in the Dataset, then use multiple tables in the query, but it works just fine.

0
votes

There's a way to move data from on-premise SQL to Azure SQL using Data Factory. You can use Copy Activity, check this code sample for your case specifically GitHub link to the ADF Activity source. Basically you need create Copy Activity which will have TypeProperties with SqlSource and SqlSink sets look like this:

 <!-- language: lang-json -->
"typeProperties": {
      "source": {
        "type": "SqlSource",
        "SqlReaderQuery": "select * from [Source]"
      },
      "sink": {
        "type": "SqlSink",
        "WriteBatchSize": 1000000,
        "WriteBatchTimeout": "00:05:00"
      }
    },

Also do mention - you can use not only selects from tables or views, but also [Table-Valued-Functions] will work as well.