I want to load data from HBase and then process it using Spark. I use Spark 2.0.2 on Google Cloud and HBase 1.2.5.
On the internet, I have found some examples that use JavaHBaseContext, but I don't know where to find this class, because none of my HBase JAR files is called hbase-spark.
I have also found this code, which uses HBaseConfiguration and ConnectionFactory to connect to the HBase database:
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hbase/conf/core-site.xml"));
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
conf.set(TableInputFormat.INPUT_TABLE, tableName);
Connection connection = ConnectionFactory.createConnection(conf);
Admin admin = connection.getAdmin();
Table tab = connection.getTable(TableName.valueOf(tableName));
byte [] row = Bytes.toBytes("TestSpark");
byte [] family1 = Bytes.toBytes("MetaData");
byte [] height = Bytes.toBytes("height");
byte [] width = Bytes.toBytes("width");
Put put = new Put(row);
put.addColumn(family1, height, Bytes.toBytes("256"));
put.addColumn(family1, width, Bytes.toBytes("384"));
tab.put(put);
But I get a compile error on the line Connection connection = ConnectionFactory.createConnection(conf); namely:
error: unreported exception IOException; must be caught or declared to be thrown Connection connection = ConnectionFactory.createConnection(conf);
Can any of you tell me how to load data from HBase so that it can be processed using Spark?
PS: I program in Java.
hbase-spark.jar is the (emerging) standard HBase plugin for Spark, that was contributed by Cloudera and is available (a) in the CDH distro, (b) as an additional JAR for other distros using HBase 1.x, or (c) natively in HBase 2.x -- see blog.cloudera.com/blog/2014/12/… and blog.cloudera.com/blog/2015/08/… – Samson Scharfrichter
shc, promoted by HortonWorks, as a Spark package: docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/… and repo.hortonworks.com/content/repositories/releases/com/… – Samson Scharfrichter
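Regarding the compile error quoted in the question: ConnectionFactory.createConnection(conf) is declared to throw the checked exception IOException, so the calling method must either declare throws IOException or wrap the call in try/catch. The following is only a minimal sketch of those two options using the plain JDK. FakeConnection, createConnection(boolean), writeRow and writeRowOrReport are hypothetical stand-ins invented for illustration, not HBase classes. The real Connection and Table are Closeable as well, so the same try-with-resources shape should carry over to the code in the question.

```java
import java.io.Closeable;
import java.io.IOException;

public class CheckedExceptionSketch {

    // Hypothetical stand-in for org.apache.hadoop.hbase.client.Connection.
    static class FakeConnection implements Closeable {
        // Stand-in for Table.put(...), which also throws IOException in HBase.
        String put(String rowKey) throws IOException {
            return "row written";
        }

        @Override
        public void close() throws IOException {
            // A real connection would release its resources here.
        }
    }

    // Hypothetical stand-in for ConnectionFactory.createConnection(conf):
    // like the real method, it is declared to throw IOException.
    static FakeConnection createConnection(boolean reachable) throws IOException {
        if (!reachable) {
            throw new IOException("cannot reach cluster");
        }
        return new FakeConnection();
    }

    // Option 1: declare the exception and let the caller deal with it.
    // try-with-resources closes the connection even if put() fails.
    static String writeRow(boolean reachable) throws IOException {
        try (FakeConnection connection = createConnection(reachable)) {
            return connection.put("TestSpark");
        }
    }

    // Option 2: catch the exception where a sensible reaction is possible.
    static String writeRowOrReport(boolean reachable) {
        try {
            return writeRow(reachable);
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeRowOrReport(true));  // prints "row written"
        System.out.println(writeRowOrReport(false)); // prints "failed: cannot reach cluster"
    }
}
```

Applied to the snippet in the question, this would mean either adding throws IOException to the enclosing method (including main) or putting the lines from createConnection through tab.put(put) inside a try block that catches IOException.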