I am trying to use Microsoft's Cognitive Services with Azure Data Lake and have run into a problem while trying to get key phrases and sentiment from the text in one column of a CSV file.

I have checked that the file is formatted correctly and is being read correctly (I ran a few basic operations on it, such as copying it, to confirm it is usable).

I have also made sure that the column I am interested in (Description) contains only text (string) when it is extracted by itself.

The input file and output folder are in my Azure Data Lake Store, and I am running the script from my Azure Data Lake Analytics account. I have not tried to run this locally in Visual Studio.

I used Key Phrases Extraction (U-SQL) and Sentiment Analysis (U-SQL) as my reference and followed the directions there, including getting the plugins.

In each case when I submit the job I get an error that I cannot seem to find a way round. Below I have shown the code that I have used for each and the error that I get when running it.

Key Phrase Code

REFERENCE ASSEMBLY [TextSentiment];
REFERENCE ASSEMBLY [TextKeyPhrase];

@myinput =
EXTRACT 
    Modified_On string,
    _Name string,
    Description string,
    Customer string,
    Category string,
    Target_Market string,
    Person_Responsible string,
    Status string,
    _Region string,
    Modified_On_2 string,
    Created_On string,
    _Site string,
    _Team string    
FROM "/userData/fromSharepoint/Game_Plans"
USING Extractors.Csv(skipFirstNRows:1);

@keyphrase =
PROCESS @myinput
PRODUCE 
    Description,
    KeyPhrase string
READONLY
    Description
USING new Cognition.Text.KeyPhraseExtractor();

OUTPUT @keyphrase
    TO "/userData/testingCognitive/tesing1.csv"
    USING Outputters.Csv();

Key Phrase Error Message

[Screenshot of the key phrase error message; image not available]

Sentiment Code

REFERENCE ASSEMBLY [TextSentiment];
REFERENCE ASSEMBLY [TextKeyPhrase];

@myinput =
EXTRACT 
    Modified_On string,
    _Name string,
    Description string,
    Customer string,
    Category string,
    Target_Market string,
    Person_Responsible string,
    Status string,
    _Region string,
    Modified_On_2 string,
    Created_On string,
    _Site string,
    _Team string    
FROM "/userData/fromSharepoint/Game_Plans"
USING Extractors.Csv(skipFirstNRows:1);

@sentiment =
PROCESS @myinput
PRODUCE 
    Description,
    sentiment string,
    conf double
READONLY
    Description
USING new Cognition.Text.SentimentAnalyzer(true);

OUTPUT @sentiment
    TO "/userData/testingCognitive/tesing1.csv"
    USING Outputters.Csv();

Sentiment Error Message

[Screenshot of the sentiment error message; image not available]

Any assistance on how to solve this would be much appreciated.

Alternatively if anyone has got these functions working and can provide some scripts to test with and links to input files to download that would be awesome.

1 Answer

I can't reproduce your exact error (can you post some simple sample data?), but I can get these libraries to work. I think KeyPhraseExtractor expects columns called Text and KeyPhrase by default, so if you use different column names you have to pass them in as arguments, e.g.:

@keyphrase =
    PROCESS @myinput
    PRODUCE Description,
            KeyPhrase string
    READONLY Description
    USING new Cognition.Text.KeyPhraseExtractor("Description", "KeyPhrase");

UPDATE: There are some invalid characters in your sample file, just after the word "Bass". They are non-breaking spaces (U+00A0), and I don't think you'll be able to import them, though I'm happy to be corrected. I removed them manually and was then able to import the file. You could pre-process the file to strip them out before running the job.

[Screenshot highlighting the invalid characters; image not available]
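One possible way to do that pre-processing is a small script run on a local copy of the file before it is uploaded. The following is only a minimal sketch in Python, not something I have run against your data: the helper names `replace_nbsp` and `clean_csv` are made up for illustration, and it assumes the file decodes cleanly with the encoding you pass in (UTF-8 by default). If the file was saved in another encoding, such as Windows-1252, you would need to pass that instead.

```python
NBSP = "\u00a0"  # non-breaking space, the character found after "Bass"


def replace_nbsp(text: str, replacement: str = " ") -> str:
    """Return text with every non-breaking space swapped for replacement."""
    return text.replace(NBSP, replacement)


def clean_csv(src_path: str, dst_path: str, src_encoding: str = "utf-8") -> int:
    """Write a UTF-8 copy of src_path with non-breaking spaces replaced.

    src_encoding is an assumption about how the source file was saved.
    Returns the number of non-breaking spaces that were replaced.
    """
    # newline="" keeps the original line endings untouched
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    count = text.count(NBSP)
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(replace_nbsp(text))
    return count
```

Calling something like `clean_csv("Game_Plans.csv", "Game_Plans_clean.csv")` on a downloaded copy (both file names are hypothetical) and then uploading the cleaned file to the Data Lake store should leave ordinary spaces where the non-breaking spaces were. The returned count gives a quick check on whether the file contained any.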