I'm experimenting with ETS tuning options, specifically with read_concurrency. I've written a simple test to measure how this option affects read performance. The test implementations are here and there.
Briefly, the test sequentially creates three [public, set] ETS tables with different read_concurrency settings: one without any tuning, one with {read_concurrency, true}, and one with {read_concurrency, false}. After each table is created, the test runs N readers (N is a power of 2, from 4 to 1024). The readers then perform random reads for 10 seconds and report how many read operations they have performed.
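To make the setup concrete, here is a minimal sketch of the shape of such a test (the module name, table size, and key range are illustrative assumptions, not the linked implementation; per-worker seeding is discussed in the updates below):

```erlang
-module(ets_read_bench).
-export([run/2]).

%% run(NWorkers, TableOpts) builds a [set, public] table with the given
%% extra options, fills it with 10000 small tuples, runs NWorkers readers
%% for 10 seconds and returns the total number of lookups performed.
run(NWorkers, TableOpts) ->
    Tab = ets:new(bench_tab, [set, public | TableOpts]),
    [true = ets:insert(Tab, {K, K}) || K <- lists:seq(1, 10000)],
    Parent = self(),
    Pids = [spawn_link(fun() ->
                Parent ! {self(), reader(Tab, os:timestamp(), 0)}
            end) || _ <- lists:seq(1, NWorkers)],
    Total = lists:sum([receive {Pid, Count} -> Count end || Pid <- Pids]),
    true = ets:delete(Tab),
    Total.

%% The hot loop only reads and counts; it never touches its mailbox and
%% reports a single message once the 10-second deadline has passed.
reader(Tab, Start, Count) ->
    case timer:now_diff(os:timestamp(), Start) >= 10000000 of
        true  -> Count;
        false ->
            _ = ets:lookup(Tab, random:uniform(10000)),
            reader(Tab, Start, Count + 1)
    end.
```

A single configuration is then measured with, for example, ets_read_bench:run(64, [{read_concurrency, true}]), or ets_read_bench:run(64, []) for the non-tweaked table.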
The results are quite surprising to me: there is essentially no difference between the three cases. Here are the test results.
Non-tweaked table
4 workers: 26610428 read operations
8 workers: 26349134 read operations
16 workers: 26682405 read operations
32 workers: 26574700 read operations
64 workers: 26722352 read operations
128 workers: 26636100 read operations
256 workers: 26714087 read operations
512 workers: 27110860 read operations
1024 workers: 27545576 read operations
Read concurrency true
4 workers: 30257820 read operations
8 workers: 29991281 read operations
16 workers: 30280695 read operations
32 workers: 30066830 read operations
64 workers: 30149273 read operations
128 workers: 28409907 read operations
256 workers: 28381452 read operations
512 workers: 29253088 read operations
1024 workers: 30955192 read operations
Read concurrency false
4 workers: 30774412 read operations
8 workers: 29596126 read operations
16 workers: 24963845 read operations
32 workers: 29144684 read operations
64 workers: 29862287 read operations
128 workers: 25618461 read operations
256 workers: 27457268 read operations
512 workers: 28751960 read operations
1024 workers: 28790131 read operations
So I'm wondering: how should I implement my test so that it shows a difference, and what is the actual use case for this optimization?
I have run this test on the following installations:
- 2-core, 1 physical CPU, Erlang/OTP 17 [erts-6.1] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false] (example test output is from this run)
- 2-core, 1 physical CPU, Erlang/OTP 17 [erts-6.1] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:true]
- 8-core, 1 physical CPU, Erlang/OTP 17 [erts-6.4] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
- 8-core, 1 physical CPU, Erlang/OTP 17 [erts-6.4] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:true]
- 64-core, 4 physical CPUs, Erlang/OTP 17 [erts-6.3] [source] [64-bit] [smp:64:64] [async-threads:10] [hipe] [kernel-poll:false]
- 64-core, 4 physical CPUs, Erlang/OTP 17 [erts-6.3] [source] [64-bit] [smp:64:64] [async-threads:10] [hipe] [kernel-poll:true]
The results are all the same everywhere (except for the absolute values, of course). So could anybody tell me WHY? And what should I do to see any difference?
UPD: Following Fred's answer, I've updated my test so the workers avoid mailbox thrashing. Unfortunately, there was no significant change in the results.
UPD: Yet another implementation, following @Pascal's advice. Now all workers properly seed their random generators. Again, the same results.
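For reference, the seeding change amounts to something like this sketch (start_reader/2 is a hypothetical helper name, reusing the reader/3 loop from the sketch above; the actual updated test may differ):

```erlang
%% The random module keeps its state in the process dictionary, so each
%% worker must seed itself; otherwise every worker replays the same
%% default sequence.
start_reader(Parent, Tab) ->
    spawn_link(fun() ->
        random:seed(erlang:now()),   %% per-process seed (pre-OTP-18 API)
        Parent ! {self(), reader(Tab, os:timestamp(), 0)}
    end).
```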
Comments:
- As you spawn individual processes without initializing the seed used by random:uniform/1 (a value stored silently in the process dictionary), all your processes will execute the same sequence, but maybe it is on purpose :o) – Pascal
- I have added seeding with erlang:now/0, but no changes again. – Viacheslav Kovalev
- I now pass random:seed(erlang:now()) to each worker process, and there is no doubt that all workers are running their own random sequences, but when I said "no changes again" I meant no change in the performance sense. One idea I still want to try is to make the ETS keys more expensive for the matching operation. This is my last hope for this tweak. – Viacheslav Kovalev