How to read from Hive using Apache Beam (how to use Hive as a source in Apache Beam)?
2 Answers
HadoopInputFormatIO can be used to read from Hive by plugging in HCatalog's HCatInputFormat as the Hadoop input format:
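import org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

// Use HCatInputFormat and declare the key/value types it produces
Configuration conf = new Configuration();
conf.setClass("mapreduce.job.inputformat.class", HCatInputFormat.class, InputFormat.class);
conf.setClass("key.class", LongWritable.class, WritableComparable.class);
conf.setClass("value.class", DefaultHCatRecord.class, Writable.class);
conf.set("hive.metastore.uris", "...");
// Register database, table and partition filter with HCatalog (throws IOException)
HCatInputFormat.setInput(conf, "myDatabase", "myTable", "myFilter");
PCollection<KV<LongWritable, DefaultHCatRecord>> data =
    p.apply(HadoopInputFormatIO.<LongWritable, DefaultHCatRecord>read().withConfiguration(conf));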
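Each element of the resulting PCollection is a KV whose value holds one Hive row. A minimal sketch of a DoFn that pulls out a single column by position, assuming the column of interest is the first one (index 0); FirstColumnFn and firstColumn are hypothetical names, not part of Beam or HCatalog:

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.apache.hadoop.io.LongWritable;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.HCatRecord;

/** Hypothetical DoFn: emits the first column of each Hive row as a String. */
public class FirstColumnFn extends DoFn<KV<LongWritable, DefaultHCatRecord>, String> {

  // Positional access via HCatRecord.get(int); String.valueOf turns a null field into "null".
  static String firstColumn(HCatRecord record) {
    return String.valueOf(record.get(0));
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    // Only the value (the row) is used here; the key is ignored.
    c.output(firstColumn(c.element().getValue()));
  }
}
```

It would be applied as `data.apply(ParDo.of(new FirstColumnFn()))`. Reading by name instead of position would need the table's HCatSchema, which this sketch deliberately avoids.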
A pull request merged in July 2017 added Hive support via HCatalog to Beam 2.1.0 (the HCatalogIO connector): https://issues.apache.org/jira/browse/BEAM-2357
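With that connector, the HadoopInputFormatIO configuration can be replaced by a single transform that yields HCatRecord rows directly. A minimal sketch, assuming the beam-sdks-java-io-hcatalog module is on the classpath; the metastore URI (`thrift://metastore-host:9083`), database, table and filter values are placeholders, and HCatalogReadExample/hiveRead are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.hcatalog.HCatalogIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hive.hcatalog.data.HCatRecord;

public class HCatalogReadExample {

  /** Builds an HCatalogIO read for one table; the metastore URI is supplied by the caller. */
  static HCatalogIO.Read hiveRead(String metastoreUri) {
    Map<String, String> configProperties = new HashMap<>();
    configProperties.put("hive.metastore.uris", metastoreUri);
    return HCatalogIO.read()
        .withConfigProperties(configProperties)
        .withDatabase("myDatabase") // optional; "default" is assumed if omitted
        .withTable("myTable")
        .withFilter("myFilter"); // optional partition filter (placeholder value)
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Each element is one Hive row; no key/value class configuration is needed.
    PCollection<HCatRecord> rows = p.apply(hiveRead("thrift://metastore-host:9083"));
    // Downstream transforms would be applied to `rows` here.
    p.run().waitUntilFinish();
  }
}
```

Compared with the HadoopInputFormatIO approach, the connector hides the input-format, key-class and value-class settings and emits the rows without a surrounding KV.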