In one of my applications I use Presto and the Hive metastore to query data from S3. I am going to deploy Presto and the Hive metastore to production in separate Docker containers, and to configure the metastore I want to know: does Presto open multiple connections to the Hive metastore for concurrent queries, or does it open a single connection and share it across all concurrent queries?

For example, say 100 queries are running in my application at a given instant. Will Presto open 100 connections to the Hive metastore, or open a single connection and use it for all of the queries?

I am new to all of this, so I might be missing something obvious. Thanks in advance.

1 Answer

Presto creates a separate connection to the metastore for each metastore operation (Thrift RPC call). There is no connection pooling or reuse, so the connections are short-lived. The metastore is accessed during planning to retrieve table metadata, and while the query runs to retrieve partition metadata. For table metadata, a single query should not make concurrent requests. For partition metadata, concurrency is controlled by the hive.split-loader-concurrency config property, although that limit also covers listing files on the file system. Metastore calls are usually fast, so the number of concurrent connections should typically be much lower than the number of concurrent Presto queries.
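
To put a rough number on that last point, here is a back-of-the-envelope sketch in Python using Little's law (average in-flight calls = call rate × average call duration). It is not anything Presto computes. The function name and every figure in it (2 metastore calls per second per query, 10 ms per call) are made-up assumptions for illustration, so substitute measurements from your own cluster.

```python
def estimate_concurrent_connections(calls_per_second: float,
                                    avg_call_seconds: float) -> float:
    """Little's law: average in-flight calls = arrival rate * average duration.

    Each metastore call uses its own short-lived connection, so the average
    number of in-flight calls approximates the average number of open
    connections.
    """
    return calls_per_second * avg_call_seconds


# Hypothetical workload, not measured values:
concurrent_queries = 100           # queries running at the same instant
calls_per_query_per_second = 2.0   # assumed metastore call rate per query
avg_call_seconds = 0.010           # assumed 10 ms per metastore call

estimate = estimate_concurrent_connections(
    concurrent_queries * calls_per_query_per_second,
    avg_call_seconds,
)
print(f"Average concurrent metastore connections: about {estimate:.1f}")
```

With these assumed numbers the average is about 2 open connections for 100 running queries. This is an average, so short bursts (for example, many queries being planned at the same moment) can go well above it, and it is worth leaving headroom when sizing the metastore.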