I have installed a single-node Hadoop cluster and Hive. I am able to load data and display it in Hive. I want to execute a script that creates temporary functions, and for that I need to add jar files: esri-geometry-api.jar, spatial-sdk-hive-1.0-MODIFIED.jar, and HiveUDFs.jar.

I referred to "How to write a script file in Hive?" and got this error: esri-geometry-api.jar does not exist

My configuration details:

$ echo $HADOOP_HOME:/home/hduser/hadoop-1.2.1
$ echo $JAVA_HOME:/usr/lib/java/jdk1.7.0_55
$ echo $HIVE_HOME:/home/hduser/hadoop-1.2.1/hive-0.9.0-bin

java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

hadoop version:

Hadoop 1.2.1
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152
Compiled by mattf on Mon Jul 22 15:23:09 PDT 2013
From source with checksum 6923c86528809c4e7e6f493b6b413a9a
This command was run using /home/hduser/hadoop-1.2.1/hadoop-core-1.2.1.jar


HIVE VERSION: hive-0.9.0
hduser@ubuntu:~$ echo $HIVE_HOME
/home/hduser/hadoop-1.2.1/hive-0.9.0-bin

I have a Hive script that I need to execute, shown below. My data contains latitude and longitude values recorded at 5-second intervals.

add jar esri-geometry-api.jar spatial-sdk-hive-1.0-MODIFIED.jar HiveUDFs.jar;
create temporary function ST_AsText as 'com.esri.hadoop.hive.ST_AsText';
create temporary function ST_Intersects as 'com.esri.hadoop.hive.ST_Intersects';
create temporary function ST_Length as 'com.esri.hadoop.hive.ST_Length';
create temporary function ST_LineString as 'com.esri.hadoop.hive.ST_LineString';
create temporary function ST_Point as 'com.esri.hadoop.hive.ST_Point';
create temporary function ST_Polygon as 'com.esri.hadoop.hive.ST_Polygon';
create temporary function ST_SetSRID as 'com.esri.hadoop.hive.ST_SetSRID';
create temporary function collect_array as 'com.zombo.GenericUDAFCollectArray';
SELECT
    id,
    unix_timestamp(dt) - unix_timestamp(fv)
FROM (
    SELECT
        id, dt, fv
    FROM (
        SELECT
            id, dt,
            FIRST_VALUE(dt) OVER (PARTITION BY id ORDER BY dt) as fv,
            ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) as lastrk
        FROM
            uber
        ) sub1
    WHERE
        lastrk = 1
    ) sub2
WHERE
    (unix_timestamp(dt) - unix_timestamp(fv)) < 28800;

My questions are as follows:

  1. Do I need to start the Hadoop services before running Hive? I observed that I can run Hive directly without starting them. If they are required, what is the significance of Hadoop here, and how do I use it with Hive?
  2. When I try to add a JAR manually, I get the errors below:

    hive> ADD JAR esri-geometry-api.jar /home/hduser/hadoop_jar;
    esri-geometry-api.jar does not exist

    hive> add jar esri-geometry-api.jar;
    esri-geometry-api.jar does not exist

I also added the following to hive-site.xml:

<configuration>
<property>
<name>hive.aux.jars.path</name>
<value>file:///home/hduser/hadoop_jar/HIVEUDFs.jar,
file:///home/hduser/hadoop_jar/esri-geometry-api-1.0.jar,
file:///home/hduser/hadoop_jar/spatial-sdk-json-1.0.1-sources.jar</value>
</property>
</configuration>

I also copied the jar files into the lib folder of my Hive directory, which is inside the Hadoop folder.

  3. When I try to run the script, I only get the Hive prompt:

    hduser@ubuntu:~/queries$ hive queries.hive

    WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
    Logging initialized using configuration in jar:file:/home/hduser/hadoop-1.2.1/hive-0.9.0-bin/lib/hive-common-0.9.0.jar!/hive-log4j.properties
    Hive history file=/tmp/hduser/hive_job_log_hduser_201404290234_597714109.txt

    hive>

  4. When I issue the list jar; command, it gives: file:/home/hduser/hadoop-1.2.1/hive-0.9.0-bin/lib/hive-builtins-0.9.0.jar

  5. I need to execute the script. Please help.

1 Answer


The script does not run because the -f option is missing. Execute the script as follows:

hduser@ubuntu:~/queries$ hive -f queries.hive
  • Hive keeps its data in Hadoop and uses MapReduce for execution, so the Hadoop services should be running when you execute Hive commands.

  • In the add jar statement, the complete path of the jar must be specified, and each jar must be added in its own statement, as follows:

add jar <PATH_TO_JAR>/esri-geometry-api.jar;
add jar <PATH_TO_JAR>/spatial-sdk-hive-1.0-MODIFIED.jar;
add jar <PATH_TO_JAR>/HiveUDFs.jar;
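
If it helps, the add jar lines above can be generated rather than typed by hand. The following is only a rough sketch: gen_add_jars is a made-up helper name (not part of Hive), and the directory and jar names in the commented example are assumptions taken from the question. It checks that each jar actually exists before printing a statement, which catches the "does not exist" error before Hive does.

```shell
#!/bin/sh
# Sketch: print one "add jar" statement per jar, each with its full path.
# gen_add_jars is a hypothetical helper name, not a Hive command.
gen_add_jars() {
  jar_dir="$1"   # directory that holds the jars
  shift
  for jar_name in "$@"; do
    jar_path="$jar_dir/$jar_name"
    # Stop early if a jar is missing, instead of letting Hive fail later.
    if [ ! -f "$jar_path" ]; then
      echo "missing jar: $jar_path" >&2
      return 1
    fi
    printf 'add jar %s;\n' "$jar_path"
  done
}

# Example call (assumes the jars live in /home/hduser/hadoop_jar):
# gen_add_jars /home/hduser/hadoop_jar \
#   esri-geometry-api.jar spatial-sdk-hive-1.0-MODIFIED.jar HiveUDFs.jar
```

The printed statements can replace the single add jar line at the top of queries.hive, after which the script is run with hive -f queries.hive as shown above.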