2 votes

We noticed the following strange behavior in our Presto cluster (Presto is installed on Linux machines).

We have 9 Presto worker machines,

and from the Presto dashboard we can see that sometimes there are 7-8 active workers and sometimes all 9.

Is this normal behavior?

In the Presto worker logs I can't see anything unusual,

and I'm not sure whether we need to look for a network problem or some other issue.

[screenshot: Presto dashboard showing the fluctuating active worker count]

Note - when I restart all Presto workers, they are stable on the dashboard after the restart, but after 5-10 hours the strange behavior returns. We are helpless with this situation.

Note 1 - we checked whether the Presto binaries were restarting accidentally, but this isn't the case; all Presto worker processes are stable:

./launcher status
Running as 22815
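
For completeness, this is roughly how we verify the launcher status on all workers at once - a minimal sketch, assuming passwordless ssh, hypothetical host names, and a hypothetical install path:

#!/usr/bin/env python3
# Check that the Presto launcher reports 'Running' on every worker.
import subprocess

WORKERS = [f"worker{i}" for i in range(1, 10)]   # hypothetical host names
LAUNCHER = "/opt/presto/bin/launcher"            # hypothetical install path

for host in WORKERS:
    result = subprocess.run(
        ["ssh", host, f"{LAUNCHER} status"],
        capture_output=True, text=True, timeout=30,
    )
    status = result.stdout.strip() or result.stderr.strip()
    print(f"{host}: {status}")   # expect something like 'Running as 22815'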

I must also say that the Presto dashboard does not show which of the Presto workers went down, so it is very difficult to identify the "bad" Presto workers.
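
Since the dashboard keeps no history, we started polling the coordinator's REST API ourselves to catch which worker drops out. A minimal sketch, assuming the coordinator address below (hypothetical) and the coordinator's /v1/node endpoint, which lists the workers it currently considers active:

#!/usr/bin/env python3
# Log whenever the set of active workers changes, so the "bad" workers
# can be identified. COORDINATOR is an assumption - replace with yours.
import json
import time
import urllib.request

COORDINATOR = "http://coordinator:8080"   # assumption: your coordinator address

def active_workers():
    # /v1/node returns a JSON list of node entries, each carrying a worker URI
    with urllib.request.urlopen(f"{COORDINATOR}/v1/node", timeout=10) as resp:
        return {node["uri"] for node in json.load(resp)}

previous = active_workers()
print(f"{time.ctime()}: {len(previous)} active workers")
while True:
    time.sleep(60)
    current = active_workers()
    for uri in previous - current:
        print(f"{time.ctime()}: worker disappeared: {uri}")
    for uri in current - previous:
        print(f"{time.ctime()}: worker returned: {uri}")
    previous = current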

*** In the Presto coordinator log we can see messages like the following - but we are not sure whether they are related to our issue:

WARN    http-client-memoryManager-scheduler     com.facebook.presto.memory.RemoteNodeMemory     Error fetching memory info from http://105.14.25.4:1010/v1/memory: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
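
These warnings at least carry the worker's address, so counting them per worker in the coordinator log gives a list of suspects. A small sketch based on the line format shown above (the log path is an assumption - adjust to your data directory):

#!/usr/bin/env python3
# Count RemoteNodeMemory timeout warnings per worker in the coordinator log.
import re
from collections import Counter

LOG_PATH = "/var/presto/data/var/log/server.log"   # assumption - adjust to your layout
# matches the worker URI in lines like the warning above
PATTERN = re.compile(r"Error fetching memory info from (http://[\d.]+:\d+)")

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1

for uri, n in counts.most_common():
    print(f"{uri}: {n} timeout(s)")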
This isn't normal behavior; something is not working as it is supposed to. You may want to ask for troubleshooting advice on #troubleshooting in Presto Community Slack (prestosql.io/community.html). - Piotr Findeisen
In that case, do you have a direction or some hint? I think the community will answer after some time, and Stack Overflow is a place where we get fast answers :-) - jessica
If only I knew the answer... You need to check the logs of the coordinator and the workers and search for anything abnormal. The community can help you understand the meaning and significance of the logs (especially since the creators of Presto are active there). - Piotr Findeisen
OK, I will check both - can you please advise me on this thread - stackoverflow.com/questions/57392597/… - jessica
Dear @Piotr Findeisen, please see my update in the question - maybe this is related to my issue? - jessica

1 Answer

4 votes

I apologize for the inconvenience regarding my question.

This was actually my mistake, and I will explain.

In this Presto cluster we have 9 Presto workers,

but I forgot to delete the workers with the same host names that belong to another cluster.

So this behavior was caused by 3 duplicate host names (Presto workers).

After removing the duplicate Presto workers, Presto is now very stable.
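
In case anyone hits the same mistake: you can compare the node lists of your clusters to catch workers registered in more than one. A minimal sketch using the same /v1/node endpoint as above, with hypothetical coordinator addresses:

#!/usr/bin/env python3
# Detect workers that appear in more than one Presto cluster by comparing
# each coordinator's active node list. Addresses are assumptions.
import json
import urllib.request

COORDINATORS = {                               # hypothetical addresses
    "cluster-a": "http://coordinator-a:8080",
    "cluster-b": "http://coordinator-b:8080",
}

def workers(url):
    with urllib.request.urlopen(f"{url}/v1/node", timeout=10) as resp:
        return {node["uri"] for node in json.load(resp)}

seen = {}
for cluster, url in COORDINATORS.items():
    for uri in workers(url):
        seen.setdefault(uri, []).append(cluster)

for uri, clusters in seen.items():
    if len(clusters) > 1:
        print(f"duplicate worker {uri} registered on: {', '.join(clusters)}")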