0 votes

I have a CSV file copied from Azure Blob storage to Azure Data Lake Store. The pipeline was established successfully and the file was copied.

I'm trying to write a U-SQL sample script from here:

Home -> datalakeanalysis1 -> Sample scripts -> New job

It's showing me the default script:

//Define schema of file, must map all columns
@searchlog =
    EXTRACT UserId          int,
            Start           DateTime,
            Region          string,
            Query           string,
            Duration        int,
            Urls            string,
            ClickedUrls     string
    FROM @"/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

OUTPUT @searchlog
    TO @"/Samples/Output/SearchLog_output.tsv"
    USING Outputters.Tsv();

Note: my file in Data Lake Store is here:

Home -> dls1 -> Data explorer -> rdl1

How can I give the path of my CSV file in the script (my CSV file is stored in Data Lake Store)?

Also, I would like to keep my destination (output) file in Data Lake Store.

How can I modify my script to refer to the Data Lake Store path?
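
From the documentation it looks like the FROM clause accepts either a path relative to my account's default Data Lake Store or a fully qualified adl:// URI, so I guess one of these forms (the store account name below is just my assumption):

// relative path, resolved against the default Data Lake Store of the Analytics account:
FROM @"/Samples/Data/SearchLog.tsv"

// fully qualified URI that names the store account explicitly:
FROM @"adl://mydatalakestore.azuredatalakestore.net/Samples/Data/SearchLog.tsv"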

Edit:

I have changed my script as below:

//Define schema of file, must map all columns
@searchlog =
    EXTRACT ID1         int,
            ID2         int,
            Date        DateTime,
            Rs          string,
            Rs1         string,
            Number      string,
            Direction   string,
            ID3         int
    FROM @"adl://rdl1.azuredatalakestore.net/blob1/vehicle1_09142014_JR.csv"
    USING Extractors.Csv();

OUTPUT @searchlog
    TO @"adl://rdl1.azuredatalakestore.net/blob1/vehicle1_09142014_JR1.csv"
    USING Outputters.Csv();

However, my job fails with the attached error.

Moreover, I'm attaching the CSV file that I want to use in the job: Sample CSV file

Is there anything wrong with the CSV file, or with my script? Please help. Thanks.

Hi, any updates now? Does my answer help you? - Jay Gong

2 Answers

1 vote

I believe that while extracting data from the file you can pass an additional parameter (skipFirstNRows) to ignore the header row:

https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/extractor-parameters-u-sql#skipFirstNRows

@searchlog =
    EXTRACT ID1         int,
            ID2         int,
            Date        DateTime,
            Rs          string,
            Rs1         string,
            Number      string,
            Direction   string,
            ID3         int
    FROM @"adl://rdl1.azuredatalakestore.net/blob1/vehicle1_09142014_JR.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

Modifying the input file may not be possible in every scenario, especially if the input file is being dropped by stakeholders that you cannot control.
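
If whatever consumes the output still expects a header row, the built-in outputters also take an outputHeader parameter, so you can skip the header on read and emit one on write. A minimal sketch based on the script above (outputHeader is my addition, not something the original script used):

@searchlog =
    EXTRACT ID1         int,
            ID2         int,
            Date        DateTime,
            Rs          string,
            Rs1         string,
            Number      string,
            Direction   string,
            ID3         int
    FROM @"adl://rdl1.azuredatalakestore.net/blob1/vehicle1_09142014_JR.csv"
    USING Extractors.Csv(skipFirstNRows: 1);    // ignore the header row in the input

OUTPUT @searchlog
    TO @"adl://rdl1.azuredatalakestore.net/blob1/vehicle1_09142014_JR1.csv"
    USING Outputters.Csv(outputHeader: true);   // write the column names as a header row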

0 votes

I followed your steps and reproduced your issue.

My sample data:

ID1,ID2,Date,Rs,Rs1,Number,Direction,ID3
1,1,9/14/2014 0:00,46.81006,-92.08174,51,S,1
1,2,9/14/2014 0:00,46.81006,-92.08174,13,NE,1
1,3,9/14/2014 0:00,46.81006,-92.08174,48,NE,1
1,4,9/14/2014 0:00,46.81006,-92.08174,30,W,1

Based on the error log, I found that the extractor can't parse the header row (values such as ID1 can't be converted to int or DateTime). So I removed the header row and everything works fine.

Modified data:

1,1,9/14/2014 0:00,46.81006,-92.08174,51,S,1
1,2,9/14/2014 0:00,46.81006,-92.08174,13,NE,1
1,3,9/14/2014 0:00,46.81006,-92.08174,48,NE,1
1,4,9/14/2014 0:00,46.81006,-92.08174,30,W,1

U-SQL script:

//Define schema of file, must map all columns
@searchlog = 
    EXTRACT ID1         int,
            ID2         int,
            Date        DateTime,
            Rs          string,
            Rs1         string,
            Number      string,
            Direction   string,
            ID3         int
    FROM @"/test/data.csv"
    USING Extractors.Csv();

OUTPUT @searchlog 
    TO @"/testOutput/dataOutput.csv"
    USING Outputters.Csv();

Output: the job completes successfully and the data is written to /testOutput/dataOutput.csv.
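
Note that my script uses store-relative paths like /test/data.csv, which U-SQL resolves against the default Data Lake Store account of the Data Lake Analytics account running the job; your fully qualified adl:// paths should work just as well once the header row is handled.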

Hope it helps you.