
I am using this Python UDF script:

import sys
import collections 
import datetime
import re

try:
    for line in sys.stdin: 
        line=line.strip()
        number,sd=line.split('\t')
        sd=sd.lower()
        sd=sd.split(' ')
        new_sd_list=collections.OrderedDict(collections.Counter(sd))
        new_sd=' '.join(new_sd_list)
        print('\t'.join([str(number),str(new_sd])))
except:
    print(sys.exc_info())

When I run the command below from a PuTTY session:

SELECT TRANSFORM(number,shortdescription) USING 'python name.py' \
   AS (number,shortdescription) FROM table;

I am getting this error:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"number": "ABC00548","shortdescription":""Master data inconsistency check in India Optimizer."}

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: Map: 4 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec

Add the table DDL and a data sample. - David דודו Markovitz
DDL: number string, shortdescription string. Sample data: "number":"ABC0003888", "shortdescription":"Clean up location name database used for Agricast weather page in DRUPAL and Historical Weather page in DRUPAL" - Venkataraman
ABC00548 does not seem to be a number. - David דודו Markovitz
It's a string. All the fields in the table are strings. I think the error may be caused by Unicode in the short description, since some short descriptions look like 'EXTRAPRÁTICA - SEAC'. Can this occur, and if so, how do I correct it? - Venkataraman
Once again, please add the table DDL (show create table ..) and a data sample (~10 rows). - David דודו Markovitz

1 Answer

import sys
import collections 
import datetime
import re

try:
    for line in sys.stdin: 
        line=line.strip()
        number,sd=line.split('\t')
        sd=sd.lower()
        sd=sd.split(' ')
        new_sd_list=collections.OrderedDict(collections.Counter(sd))
        new_sd=' '.join(new_sd_list)
        print('\t'.join([str(number),str(new_sd)])) # fixed syntax error: the question had str(new_sd]))) with ] and ) out of order
except:
    print(sys.exc_info())
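
Beyond the bracket fix, the script above still prints exceptions to stdout, where Hive would read them as output rows, and a single bad row ends the whole loop. The following is only a sketch of a more defensive variant, not a confirmed fix for the error in the question. The helper names dedupe_words, transform_line and main are made up for illustration. It assumes the intent is to drop repeated words while keeping first-seen order, and it splits on any whitespace rather than on single spaces as the original does.

```python
import sys
from collections import OrderedDict


def dedupe_words(text):
    """Lower-case the text and drop repeated words, keeping first-seen order."""
    words = text.lower().split()  # assumption: any run of whitespace separates words
    return ' '.join(OrderedDict.fromkeys(words))


def transform_line(line):
    """Turn one 'number<TAB>shortdescription' row into its output row."""
    # Strip only the line ending so an empty shortdescription keeps its tab.
    fields = line.rstrip('\r\n').split('\t')
    if len(fields) != 2:
        raise ValueError('expected 2 tab-separated fields, got %d' % len(fields))
    number, sd = fields
    return '\t'.join([number, dedupe_words(sd)])


def main():
    for line in sys.stdin:
        try:
            sys.stdout.write(transform_line(line) + '\n')
        except Exception as exc:
            # Report on stderr so the message is not mixed into the rows
            # that Hive reads back from stdout; then continue with the next row.
            sys.stderr.write('skipping row %r: %s\n' % (line, exc))


if __name__ == '__main__':
    main()
```

A script like this can be checked outside Hive by piping a few tab-separated sample rows into it, which shows syntax errors immediately instead of as a MapRedTask failure.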