
I ran a Hive SQL script that calls a custom Hive UDF in the WHERE clause of a SELECT query, and it has been running for more than two days. I would like to know what exactly the problem is: does invoking the Java code take most of the time, or is the query execution itself slow?

My data set is as follows: table A has 2 million records and table B has 1 million records.

The sample query is as follows:

SELECT ****
FROM (SELECT * FROM A A1 WHERE A1.ds IN ('2014-06-11', '2014-06-12')) A1
LEFT OUTER JOIN (SELECT * FROM B B1 WHERE B1.ds IN ('2014-06-11', '2014-06-12')) B1
WHERE customUDF(A1.data, B1.data)

What could be the issue here? Is there a Hive script profiling tool that shows where exactly the time is being spent?


1 Answer


Assuming that you have access to the UDF's source, you can add the following to the function (pseudocode):

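import org.apache.hadoop.hive.ql.exec.MapredContext;
import org.apache.hadoop.mapred.Reporter;

// Inside the UDF's evaluate method:
long start = System.currentTimeMillis();
String group = "instrumentation.udf";
String counter = "customUDF";

// function business logic

long elapsed = System.currentTimeMillis() - start;
// MapredContext.get() can return null when the UDF runs outside a MapReduce task.
MapredContext context = MapredContext.get();
Reporter reporter = (context == null) ? null : context.getReporter();
if (reporter != null) {
    // Adds the milliseconds spent in this call to a job counter, visible in the job's counter list.
    reporter.incrCounter(group, counter, elapsed);
}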
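
If you want a rough number before touching the cluster, you can also time the function locally and extrapolate. The following is only a sketch under stated assumptions: `UdfTimingSketch`, `timed`, and `estimateSeconds` are hypothetical names, the two `AtomicLong` fields stand in for the Hive counter, the lambda is placeholder logic because the real `customUDF` body is not shown, and the row-pair count assumes the join has no key, as in the sample query, so that every row of A is compared with every row of B. It uses `System.nanoTime()` because a single fast call usually rounds to 0 with millisecond resolution.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiPredicate;

/** Hypothetical harness: times a predicate outside Hive and extrapolates the cost. */
final class UdfTimingSketch {
    // Stand-ins for the Hive counter: total elapsed nanoseconds and number of calls.
    static final AtomicLong totalNanos = new AtomicLong();
    static final AtomicLong calls = new AtomicLong();

    /** Wraps a predicate so that every call adds its elapsed time to the counters. */
    static <A, B> BiPredicate<A, B> timed(BiPredicate<A, B> udf) {
        return (a, b) -> {
            long start = System.nanoTime();
            try {
                return udf.test(a, b);
            } finally {
                totalNanos.addAndGet(System.nanoTime() - start);
                calls.incrementAndGet();
            }
        };
    }

    /** Estimated total seconds for the given number of calls at the given per-call cost. */
    static double estimateSeconds(long invocations, double nanosPerCall) {
        return invocations * nanosPerCall / 1e9;
    }

    public static void main(String[] args) {
        // Placeholder logic; replace with the real comparison from customUDF.
        BiPredicate<String, String> udf = timed((x, y) -> x.contains(y));
        for (int i = 0; i < 1_000_000; i++) {
            udf.test("row-" + i, "7");
        }
        double avgNanos = (double) totalNanos.get() / calls.get();

        // Assumption: no join key, so every A row is paired with every B row.
        long pairs = 2_000_000L * 1_000_000L;
        System.out.printf("avg %.1f ns/call, about %.0f s for %d calls%n",
                avgNanos, estimateSeconds(pairs, avgNanos), pairs);
    }
}
```

For scale: at 1 microsecond per call, 2 × 10^12 calls add up to about 2 × 10^6 seconds, roughly 23 days of single-threaded work, before any Hive overhead. If the counter total in the real job is of that order, the time is going into the number of UDF invocations rather than into the cost of calling Java as such.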