
I'm a beginner in Spark. I tried to insert data into a Hive table via Spark SQL, but I get the following error: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.sql.catalyst.expressions.GenericRow.isNullAt(rows.scala:79)

Please find my code below:

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveWriter {

    public static class IPCCount implements Serializable {
        public IPCCount(int permid, int year, String ipc, int count) {
            this.permid = permid;
            this.year = year;
            this.ipc = ipc;
            this.count = count;
        }

        public int permid;
        public int year;
        public int count = 0;
        public String ipc;
    }

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("HiveWriter");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        HiveContext sqlContext = new HiveContext(sc.sc());

        // Build a one-row RDD, convert it to a DataFrame and register it as a temp table
        JavaRDD<IPCCount> lines = sc.parallelize(Arrays.asList(new IPCCount(000000000, 2010, "000000000", 10)));
        DataFrame df = sqlContext.createDataFrame(lines, IPCCount.class);
        df.registerTempTable("ipc_codes_new");

        // Copy the temp table's rows into the existing Hive table -- this statement throws the exception
        sqlContext.sql("INSERT INTO TABLE xademo.ipc_codes SELECT * FROM ipc_codes_new");

        sc.close();
    }
}

Reading from Hive tables works well, but I can't insert data. I've also verified that the data exists in the temp table.
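(A check along these lines, run right after registerTempTable on the same HiveContext, is what I mean by verifying the temp table; the exact calls are only a sketch:)

// Sketch of the check: print the schema Spark inferred from IPCCount
// and the rows visible through the registered temp table.
df.printSchema();
sqlContext.sql("SELECT * FROM ipc_codes_new").show();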

I use Spark 1.3.

Thanks in advance!


1 Answer


If I recall correctly, registerTempTable doesn't make the table ipc_codes_new available to Hive; in other words, that temp table is not visible alongside the actual Hive tables.

That temp table can be consumed by the HiveContext (as a temporary source) within Spark, but not by Hive itself, and the INSERT query you send targets an actual Hive table.
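If that is the cause, one workaround (a sketch only, assuming the Spark 1.3 DataFrame API and that xademo.ipc_codes already exists with a matching schema) is to skip the SQL INSERT entirely and write the DataFrame into the Hive table through the HiveContext:

// Append the DataFrame directly into the existing Hive table,
// bypassing the temp table and the INSERT ... SELECT statement.
DataFrame df = sqlContext.createDataFrame(lines, IPCCount.class);
df.insertInto("xademo.ipc_codes");
// or let Spark create and manage its own table instead:
// df.saveAsTable("xademo.ipc_codes_from_spark");

Note that the columns derived from a Java bean may come out in alphabetical order, so make sure the target table's column order lines up (or select the columns explicitly before inserting).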