We have been using Cowboy in production on Compute Engine machines on GCP, and we have started benchmarking and tuning our service to handle more requests per second (we are in ad tech, so in our case these are bids per second).
After isolating and handling many other issues separately, we are now down to optimizing Cowboy. These are our current findings and limitations:
Cowboy setup
We are using Cowboy 2.5 with 200 acceptors and a maximum backlog of 1024. The handler under test:
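For reference, a listener with these settings could be started roughly as follows. This is a minimal sketch, not our exact code: the listener name `bid_http`, the `/bid` route, the `bid_handler` module and port 8080 are placeholders chosen for illustration.

```erlang
-module(bid_listener).
-export([start/0]).

%% Hypothetical names: bid_http, "/bid", bid_handler, and port 8080
%% are assumptions for this sketch.
start() ->
    Dispatch = cowboy_router:compile([
        {'_', [{"/bid", bid_handler, []}]}
    ]),
    %% Transport options as a proplist: 200 acceptor processes and a
    %% listen backlog of 1024 pending connections.
    TransportOpts = [{port, 8080}, {num_acceptors, 200}, {backlog, 1024}],
    {ok, _} = cowboy:start_clear(bid_http, TransportOpts,
                                 #{env => #{dispatch => Dispatch}}).
```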
    init(Req, _State) ->
        T1 = erlang:monotonic_time(),
        %% Keep the request object returned by read_body/1 for later calls.
        {ok, BRjson, Req1} = cowboy_req:read_body(Req),
        %% ---- rest of work goes here but is switched off for our test ----
        erlang:send_after(60, self(), {'RSP', x, no_workers}),
        {cowboy_loop, Req1, #state{t1 = T1}, hibernate}.
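The snippet above relies on a `#state{}` record and an `info/3` callback that are not shown. A minimal sketch of a complete loop handler follows, assuming a module named `bid_handler`, a record with a single `t1` field, and a 204 reply for the no-bid case; all three are assumptions made for illustration.

```erlang
-module(bid_handler).
-behaviour(cowboy_loop).
-export([init/2, info/3]).

%% Assumed record layout: only the start timestamp is kept.
-record(state, {t1}).

init(Req, _Opts) ->
    T1 = erlang:monotonic_time(),
    {ok, _Body, Req1} = cowboy_req:read_body(Req),
    %% Simulate the 60 ms of downstream work with a timer message.
    erlang:send_after(60, self(), {'RSP', x, no_workers}),
    {cowboy_loop, Req1, #state{t1 = T1}, hibernate}.

%% Called when the timer message arrives; reply and stop the handler.
info({'RSP', _Id, no_workers}, Req, State = #state{t1 = T1}) ->
    _ElapsedUs = erlang:convert_time_unit(
                   erlang:monotonic_time() - T1, native, microsecond),
    Req1 = cowboy_req:reply(204, Req),
    {stop, Req1, State};
%% Ignore anything else and go back to sleep.
info(_Msg, Req, State) ->
    {ok, Req, State, hibernate}.
```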
Erlang VM
OTP 21
VM args: -smp auto +P 134217727 +K true +A 64 -rate 1200 +stbt db +scl false +sfwi 500 +spp true +zdbbl 8092
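To confirm that these flags actually took effect on the running node, the corresponding values can be read back from the shell. This is a small sketch using `erlang:system_info/1`; the mapping from each key to a flag is noted in the comments.

```erlang
%% Paste into the shell of the running node.
[{Key, erlang:system_info(Key)}
 || Key <- [schedulers_online,     % number of schedulers in use
            process_limit,         % +P
            thread_pool_size,      % +A
            scheduler_bind_type,   % +stbt
            port_parallelism,      % +spp
            dist_buf_busy_limit]]. % +zdbbl, reported in bytes
```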
Load
JSON requests of about 4 KB each. Testing is done with JMeter from a separate machine on the same internal network (no SSL). All requests are POST with keep-alive.
Servers
GCP Compute Engine, 10 vCPUs and 14 GB RAM (current setup; we previously tested with 4 vCPUs)
Findings
We are able to reach ~1900 reqs/sec, but all CPU cores in htop show almost 80% utilization.
At 1000 reqs/sec we see CPU utilization of 45-50% per core (still high, bearing in mind that no other part of our application is running).
*Note: on the 4 vCPU machine we were able to get close to 700 reqs/sec. Memory is barely utilized in all of our tests and does not change with load.
QUESTION: How can we improve Cowboy's performance in terms of CPU usage?
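The htop numbers include the time Erlang schedulers spend busy-waiting for work, so they may overstate how much CPU is spent on requests. As a cross-check, per-scheduler utilization can be sampled from inside the VM. The sketch below uses `erlang:statistics(scheduler_wall_time)`; the 5-second sampling window is an arbitrary choice.

```erlang
%% Paste into the shell of the node while it is under load.
erlang:system_flag(scheduler_wall_time, true),
S0 = lists:sort(erlang:statistics(scheduler_wall_time)),
timer:sleep(5000),  % sampling window, chosen arbitrarily
S1 = lists:sort(erlang:statistics(scheduler_wall_time)),
%% Fraction of wall time each scheduler spent doing actual work.
[{Id, (A1 - A0) / (T1 - T0)}
 || {{Id, A0, T0}, {Id, A1, T1}} <- lists:zip(S0, S1)].
```

If these fractions are much lower than what htop reports, a large part of the visible CPU load is scheduler spinning, not request handling.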