
I would like to use Azure Data Factory with an Azure Data Lake Analytics U-SQL activity, but without success.

This is my pipeline JSON:

{
"name": "UsageStatistivsPipeline",
"properties": {
    "description": "Standardize JSON data into CSV, with friendly column names & consistent output for all event types. Creates one output (standardized) file per day.",
    "activities": [{
            "name": "UsageStatisticsActivity",
            "type": "DataLakeAnalyticsU-SQL",
            "linkedServiceName": {
                "referenceName": "DataLakeAnalytics",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "scriptLinkedService": {
                    "referenceName": "BlobStorage",
                    "type": "LinkedServiceReference"
                },
                "scriptPath": "adla-scripts/usage-statistics-adla-script.json",
                "degreeOfParallelism": 30,
                "priority": 100,
                "parameters": {
                    "sourcefile": "wasb://nameofblob.blob.core.windows.net/$$Text.Format('{0:yyyy}/{0:MM}/{0:dd}/0_647de4764587459ea9e0ce6a73e9ace7_2.json', SliceStart)",
                    "destinationfile": "$$Text.Format('wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/DailyResult.csv', SliceStart)"
                }
            },
            "inputs": [{
                    "type": "DatasetReference",
                    "referenceName": "DirectionsData"
                }
            ],
            "outputs": [{
                    "type": "DatasetReference",
                    "referenceName": "OutputData"
                }
            ],
            "policy": {
                "timeout": "06:00:00",
                "concurrency": 10,
                "executionPriorityOrder": "NewestFirst"
            }
        }
    ],
    "start": "2018-01-08T00:00:00Z",
    "end": "2017-01-09T00:00:00Z",
    "isPaused": false,
    "pipelineMode": "Scheduled"
}
}

I have two parameters, sourcefile and destinationfile, whose values are dynamic (the path is built from the slice date).
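
For context, the U-SQL activity passes these parameters by prepending DECLARE statements to the submitted script. A sketch of the header ADF should generate once the expressions are evaluated (illustrative values, assuming SliceStart = 2018-01-08):

// Hypothetical header generated from the "parameters" block above:
DECLARE @sourcefile string = "wasb://nameofblob.blob.core.windows.net/2018/01/08/0_647de4764587459ea9e0ce6a73e9ace7_2.json";
DECLARE @destinationfile string = "wasb://nameofblob.blob.core.windows.net/2018/01/08/DailyResult.csv";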

Then I have this ADLA (U-SQL) script that gets executed:

REFERENCE ASSEMBLY master.[Newtonsoft.Json];
REFERENCE ASSEMBLY master.[Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;

// Read each line of the source file as a single raw JSON string.
@Data =
    EXTRACT
        jsonstring string
    FROM @sourcefile
    USING Extractors.Tsv(quoting:false);

// Parse each raw line into a JSON tuple.
@CreateJSONTuple =
    SELECT
        JsonFunctions.JsonTuple(jsonstring) AS EventData
    FROM
        @Data;

// Unpack the "records" array into individual record tuples.
@records =
    SELECT
        JsonFunctions.JsonTuple(EventData["records"], "[*].*") AS record
    FROM
        @CreateJSONTuple;

// Pull the properties object and timestamp from the first record.
@properties =
    SELECT
        JsonFunctions.JsonTuple(record["[0].properties"]) AS prop,
        record["[0].time"] AS time
    FROM
        @records;

// Final projection (columns elided in the question).
@result =
    SELECT
        ...
    FROM @properties;


OUTPUT @result
TO @destinationfile
USING Outputters.Csv(outputHeader:false,quoting:true);

Job execution fails, and the error is: Error Detail

EDIT:

It seems that Text.Format is not evaluated, and the expression is passed into the script as a literal string. The Data Lake Analytics job detail then shows this:

DECLARE @sourcefile string = "$$Text.Format('wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/0_647de4764587459ea9e0ce6a73e9ace7_2.json', SliceStart)";

1 Answer


In your code sample, the sourcefile parameter is not defined the same way as destinationfile. The latter appears to be correct, while the former is not.

The whole string should be wrapped inside $$Text.Format() for both:

"paramName" : "$$Text.Format('...{0:pattern}...', param)"

Also consider passing only the formatted date like so:

"sliceStart": "$$Text.Format('{0:yyyy-MM-dd}', SliceStart)"

and then doing the rest in U-SQL:

// Parse the date string passed in from ADF, then build the full path in U-SQL.
DECLARE @sliceStartDate DateTime = DateTime.Parse(@sliceStart);
DECLARE @path string = String.Format("wasb://path/to/file/{0:yyyy}/{0:MM}/{0:dd}/file.csv", @sliceStartDate);
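
The rest of the script can then use the derived path directly; for example, reusing the OUTPUT statement from your question (a sketch, not tested):

// @path is built above from @sliceStartDate.
OUTPUT @result
TO @path
USING Outputters.Csv(outputHeader:false, quoting:true);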

Hope this helps.