1
votes

There is a table in Azure SQL server and that table has one field call request which is of xml data type. We are reading the table in Azure Data Factory, so when we have created dataset in Azure data factory it is coming as XML but while using that dataset as a source in data flow it is coming as a string.

Can anyone tell us how to parse XML column data in Azure data factory. Thanks in advance.

We want to fetch the information from that XML data column and dump it to another table in Azure SQL server after performing aggregation on it.

Here is the sample data of that table.

Here is the sample data of that table

1
Are the source and sink the same database or different databases on different servers? - wBob
Please try data flow derived column with expressions. - Leon Yue
@wBob Yes, we are using same database as a source and sink. - The Joker

1 Answers

2
votes

Azure Data Factory (ADF) does not have great support for XML but Azure SQL DB does. As your source and target are the same you could create a stored procedure which shreds the XML. Then use ADF's Stored Proc activity to call it if you need to schedule the activity.

Here is a simple example of shredding XML in Azure SQL DB:

------------------------------------------------------------------------------------------------
-- Setup START
------------------------------------------------------------------------------------------------

DROP TABLE IF EXISTS dbo.students 
DROP TABLE IF EXISTS dbo.yourTarget
GO

CREATE TABLE dbo.students (
    studentId       INT NOT NULL PRIMARY KEY,
    studentName     VARCHAR(20) NOT NULL,
    request         XML NOT NULL
    )
GO


CREATE TABLE dbo.yourTarget (
    studentId       INT NOT NULL,
    customerno      INT NOT NULL,
    operation       VARCHAR(20) NOT NULL,
    email           VARCHAR(100) NOT NULL
    )
GO

-- Setup END
------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------
-- Test data START
------------------------------------------------------------------------------------------------

INSERT INTO dbo.students ( studentId, studentName, request )
VALUES 
    ( 1, 'xxx', '<Customers><row><CUSTOMERNO>12</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>[email protected]</EMAIL></row></Customers>' ),
    ( 2, 'yyy', '<Customers><row><CUSTOMERNO>13</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>[email protected]</EMAIL></row></Customers>' ),
    ( 3, 'zzz', '<Customers><row><CUSTOMERNO>14</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>[email protected]</EMAIL></row></Customers>' ),
    ( 4, 'xyz', '<Customers><row><CUSTOMERNO>100</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>[email protected]</EMAIL></row></Customers>' )
GO

-- Test data END
------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------
-- Shred XML START
------------------------------------------------------------------------------------------------

INSERT INTO dbo.yourTarget ( studentId, customerno, operation, email )
SELECT
    s.studentId,
    c.c.value( '(CUSTOMERNO/text())[1]', 'INT' ) customerno,
    c.c.value( '(OPERATION/text())[1]', 'VARCHAR(20)' ) operation,
    c.c.value( '(EMAIL/text())[1]', 'VARCHAR(100)' ) email  
FROM dbo.students s
    CROSS APPLY s.request.nodes('Customers/row') c(c)

-- Shred XML END
------------------------------------------------------------------------------------------------

GO


-- Results
SELECT *
FROM dbo.yourTarget

Wrap up your logic into a stored proc for scheduling. If your databases were on different servers you could still use the T-SQL for shredding the XML in a dataset so you are presenting columns to ADF not XML.

My results:

My results