
We have a Hadoop cluster based on Ambari. Since the Thrift server has poor performance, we decided to replace it with Presto. Our current Hadoop cluster has 960 data node machines (running Red Hat 7).

A few words about Presto: Presto (or PrestoDB) is an open-source, distributed SQL query engine designed from the ground up for fast analytic queries against data of any size. It supports both non-relational sources, such as the Hadoop Distributed File System (HDFS), and relational sources.

We installed the new Presto cluster as follows: first we installed the OS (Red Hat 7) on 13 machines in total, 1 machine for the Presto coordinator and 12 machines for the Presto workers.

After installing the OS, we successfully installed Presto (the Presto coordinator plus the Presto workers).

Now we are stuck on how to integrate the Presto cluster with the Hadoop cluster.

I will give a short example based on the Hive connector (hive.properties).

We have the following property: hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
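For context, a minimal catalog file on the Presto side might look like the sketch below. The metastore host name and the connector.name=hive-hadoop2 value are assumptions for illustration, not values taken from our cluster:

    # etc/catalog/hive.properties (sketch)
    connector.name=hive-hadoop2
    # placeholder metastore host, replace with the real Hive metastore
    hive.metastore.uri=thrift://metastore-host.example.com:9083
    hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml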

Since these files are located on the data node machines and of course not on the Presto worker machines, I assume that we need to copy them from one of the data node machines to the Presto worker machines.

Am I right here?


1 Answer


You normally do not need to configure hive.config.resources to allow Presto to talk to your HDFS cluster. Try using Presto without that configuration. Only configure it if you have special requirements such as Hadoop KMS.
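As a sketch of what that looks like, a hive.properties without hive.config.resources can be as small as the following (the metastore host name here is a placeholder, not taken from your cluster):

    # etc/catalog/hive.properties (minimal sketch)
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://metastore-host.example.com:9083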

To configure it, copy the appropriate Hadoop config file(s) to your Presto machines (coordinator and workers), then set hive.config.resources to point to those file(s).
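For example, assuming you copied the Hadoop XML files into a hypothetical /etc/presto/hadoop directory on every Presto node, the property would point at those local copies:

    hive.config.resources=/etc/presto/hadoop/core-site.xml,/etc/presto/hadoop/hdfs-site.xml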

See the Hive connector documentation for more details.