4
votes

We have an HDP 2.6.4 Spark cluster with 10 Linux worker machines.

The cluster runs Spark applications over HDFS, which is installed on all the workers.

We want to install Presto to query the cluster's HDFS. However, because the worker machines are short on CPU resources (only 32 cores per machine), the plan is to install Presto outside the cluster.

For that purpose we have several ESX hosts; each host will have 2 VMs, and each VM will run a single Presto server.

All the ESX machines will be connected to the Spark cluster via 10G network cards, so the two clusters will be on the same network.

My question is: can we install Presto on the VM cluster even though the HDFS is not on the ESX cluster (but on the Spark cluster instead)?

EDIT:

From the answer we got, it seems that installing Presto on VMs is standard, so I'd like to clarify my question:

Presto has a configuration file named hive.properties under presto/etc.

Inside that file there’s a parameter named hive.config.resources with the following value:

/etc/hadoop/conf/presto-hdfs-site.xml,/etc/hadoop/conf/presto-core-site.xml

These files are HDFS config files. Since the VM cluster and the Spark cluster (which hosts the HDFS) are separate, and Presto on the VM cluster must access the HDFS that resides on the Spark cluster, the question is:

Should these files be copied from the Spark cluster to the VM cluster?

2
That should work. This is a pretty common setup. - Dain Sundstrom
@Dain, see the update from Elad. - Judy
FWIU most setups don't need the extra site XML files, but if you do, people normally just copy them. - Dain Sundstrom

2 Answers

1
votes

Regarding your question, "can we install Presto on the VM cluster even though the HDFS is not on the ESX cluster (but on the Spark cluster instead)?":

The answer is YES

On this cluster, which isn't co-hosted with HDFS, don't forget to set the following parameter in hive.properties:

hive.force-local-scheduling=false
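
For context, here is a rough sketch of how that setting might sit alongside the other entries in presto/etc/catalog/hive.properties on the Presto VMs. It assumes the two site XML files named in the question have been copied from the Spark cluster to the same paths on each VM. The metastore host name is a hypothetical placeholder, and the catalog directory and connector name can differ between Presto versions, so treat this as an illustration rather than a verified configuration:

```properties
# Hive connector for Hadoop 2.x clusters (the name may differ in your Presto version)
connector.name=hive-hadoop2

# Hypothetical placeholder: replace with the real Hive metastore of the HDP cluster
hive.metastore.uri=thrift://metastore-host.example.com:9083

# HDFS client config files, assumed to be copied from the Spark cluster to each Presto VM
hive.config.resources=/etc/hadoop/conf/presto-hdfs-site.xml,/etc/hadoop/conf/presto-core-site.xml

# Presto workers are not co-located with the HDFS DataNodes, so do not force local scheduling
hive.force-local-scheduling=false
```

The host names that appear inside the copied XML files (NameNode and DataNodes) also need to be resolvable and reachable from the Presto VMs.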
0
votes

As long as the Presto VMs are configured as edge nodes (aka gateway nodes) and have all the necessary config files and tools, you shouldn't have any problem. For details on edge nodes see: