1
votes

I have CSV files in a container in Data Lake and I am trying to transform the attribute column(which is in json) in the files into multiple columnns. When I try using Data flows to do that, It recognizes the attribute column as string and I am unable to change it to array so that I can either flatten or create derived columns out of it. Am i doing something wrong here CSV File Format:

ID Name Attribute
123 Test {"Referrer URL":null,"Query Parameters":"topics"}
456 Test2 {"Referrer URL":"www.google.com","Query Parameters":"WebTopics"}

After following guidance of Mark and Joseph: Issues faced while using parsing function and removing spaces from keys in json objects in those columns

  1. wouldn't the replace function also remove spaces from the values associated with those keys. That could be an issue if we get sentence type data in those fields.
  2. I am still running into issues when I follow all the steps mentioned by Joseph. It may be associated with how I set up the dataset. If I use data preview to inspect the data, I see the the source does not properly fill up the columns. In the example, it just stops after Refferer URL being null and does not show the other values as well as the closing brackets of the json object in the column. adding pictures of datalake connection setup.

after following all steps

Connection settings enter image description here Actual values in Excel

2
Hi@Hammad Hassan Kandly let me know if you need more information.Joseph Xu
Hi @josephXu thanks for your help. I am still running into issues that I have added in my original question.Hammad Hassan Khan
Hi@Hammad Hassan Khan, I've updated my answer.Joseph Xu

2 Answers

2
votes

Because your JSON string is actually a string field inside of a delimited text file, you first need to parse it using the Parse transformation. Now that "Attribute" is a JSON data type, you can then work with the structures and arrays inside of it.

https://docs.microsoft.com/en-us/azure/data-factory/data-flow-parse

Choose "Attribute" as your Expression property and give a name for the Column property. ADF will create a new hierarchical column based on that new name with the properties being the columns that you identify in the output property.

2
votes

Update:
Hi@Hammad Hassan Khan, due to your Attribute column contains "," character, so I edit it as follows. And use Pipe (|) as Column delimiter in the source:
enter image description here


Attached to the answer given by @Mark Kromer MSFT. Yes we can use Parse transformation in mapping data flow to achieve that. But the Parse activity does not support JSON objects whose keys contain space characters. Therefore, we need to replace space characters.

I created a simple test for this. And the result is as follows:
enter image description here

  1. The data source is as follows:
    enter image description here

  2. In DerivedColumn1 activity. Select Attribute column, enter expression replace(Attribute,' ','' ) to replace spaces in keys. Because in the next Parse1 activity, it does not support keys contain space characters.
    enter image description here The data preview is as follows:
    enter image description here

  3. In Parse1 activity, select Attribute column, enter Attribute as Expression, enter (ReferrerURL as string,QueryParameters as string) as Output column type. enter image description here
    The data preview is as follows:
    enter image description here