1 vote

We did not configure the parameter query.max-memory-per-node on our Presto workers.

From the log, it seems that the value for query.max-memory-per-node is set automatically:

grep -r "query.max-memory-per-node"  /presto/data/var/log/server.log
2019-08-08T14:25:03.840Z    INFO    main    Bootstrap       query.max-memory-per-node                              4402341478.40B

My question

Do we need to set query.max-memory-per-node in config.properties?

Or will the value for query.max-memory-per-node be set by Presto automatically?

But as we can see from the logs, Presto sets only 4402341478.40B (a few gigabytes), which is a small size.

And when a query needs more memory, it could fail.

In your opinion, do we need to set the parameter query.max-memory-per-node in config.properties

in order to set a higher value, such as 20-30 GB?

Reference - https://prestodb.github.io/presto-admin/docs/current/installation/presto-configuration.html


1 Answer

3 votes

The default for query.max-memory-per-node is 10% of the available heap memory; it is set here:

https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/memory/NodeMemoryConfig.java#L35
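As a quick sanity check, the value from your log is consistent with that 10% default (this is just the default applied in reverse, assuming the heap size is the only input):

4402341478.40B is about 4.1 GiB
4.1 GiB / 0.10 is about 41 GiB, i.e. roughly -Xmx41G in your workers' jvm.config

So Presto did not pick an arbitrary small number; it derived the limit from your worker heap size.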

IIRC, we set the default so that a few large queries can run concurrently in a cluster, though the exact number isn't clear from this value alone. If you take a look at the configuration in the file linked above, you will see that the system first reserves 30% of the heap for "unaccounted memory allocations", because Presto doesn't track all allocations. query.max-memory-per-node is then only a limit on "user" memory, i.e. memory controllable by the query author, such as group by and join hash tables. It does not include input and output buffers, which are managed automatically by Presto.

Putting all of that together, I would expect that with the default values you can run 3-5 large queries concurrently on each node.
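As a rough sketch of where that 3-5 estimate comes from (back-of-the-envelope only, using the heap size implied by your log):

41 GiB heap - 30% headroom (about 12.3 GiB) = about 28.7 GiB usable
28.7 GiB / 4.1 GiB user-memory limit per query = about 7 queries at the limit

In practice, system memory, exchange buffers, and other untracked allocations eat into that, which is why 3-5 concurrent large queries is the more realistic expectation.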

As for how you should set these, it really depends on your workload. If you expect a large mixed workload, the defaults may work for you. If you want to be able to dedicate the entire cluster to a single query, you can increase the value to near the heap size (make sure to leave headroom for untracked allocations).
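If you do decide to raise the limit, a minimal sketch of the change in each worker's config.properties could look like this (25GB is just an illustration of the 20-30 GB range you mentioned; whatever you pick has to stay comfortably below the -Xmx you configure in jvm.config):

query.max-memory-per-node=25GB

As far as I recall, the 30% reservation mentioned above is controlled by memory.heap-headroom-per-node, which you can normally leave at its default. Restart the workers after editing config.properties for the new value to take effect.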