While benchmarking some code to find out, if using a set is really faster than array when checking for elements included via include? I found some performance anomaly regarding strings and symbols inside the collection.
First the script I used for benchmarking. It basically creates an array containing 50 random 50 character strings, gets a sample of 20 and checks if all the sample values are included. The same data is used to create an set of strings, an array of symbols and a set of symbols.
require 'benchmark/ips'
require 'Set'
collection_size = 50
element_length = 50
sample_size = 20
Benchmark.ips do |x|
array_of_strings = begin
(1..collection_size).map {|pos| (0..element_length).map { ('a'..'z').to_a[rand(26)] }.join }
end
array_of_symbols = array_of_strings.map(&:to_sym)
set_of_strings = Set.new(array_of_strings)
set_of_symbols = Set.new(array_of_symbols)
sample_of_strings = array_of_strings.sample(sample_size)
sample_of_symbols = array_of_symbols.sample(sample_size)
x.report("array_of_strings: #{collection_size} elements with length #{element_length}, sample size #{sample_of_strings.length}") {
sample_of_strings.each do |s|
array_of_strings.include? s
end
}
x.report("set_of_strings: #{collection_size} elements with length #{element_length}, sample size #{sample_of_strings.length}") {
sample_of_strings.each do |s|
set_of_strings.include? s
end
}
x.report("array_of_symbols: #{collection_size} elements with length #{element_length}, sample size #{sample_of_symbols.length}") {
sample_of_symbols.each do |s|
array_of_symbols.include? s
end
}
x.report("set_of_symbols: #{collection_size} elements with length #{element_length}, sample size #{sample_of_symbols.length}") {
sample_of_symbols.each do |s|
set_of_symbols.include? s
end
}
x.compare!
end
Test system was a 2011 Macbook Pro, running OSX 10.10.4, both ruby version were installed using rvm 1.26.11.
When executing this with ruby 2.2.2, I got the following results:
set_of_strings: 145878.6 i/s
set_of_symbols: 100100.1 i/s - 1.46x slower
array_of_symbols: 81680.0 i/s - 1.79x slower
array_of_strings: 43545.9 i/s - 3.35x slower
As expected, the set is faster than array, even so not as fast as I expected. What struck me strange was, that the set containing strings was faster than the one containing symbols, while for arrays the symbol one was much faster. I repeated the script multiple times and the ratio stayed the same.
To get more insights, I ran the benchmark script with ruby 2.1.6 and got those results:
set_of_symbols: 202362.3 i/s
set_of_strings: 145844.1 i/s - 1.39x slower
array_of_symbols: 39158.1 i/s - 5.17x slower
array_of_strings: 24687.8 i/s - 8.20x slower
Again sets are faster than arrays, but the array performance is much worse than in ruby 2.2.2, seems performance improvement went very well compared to 2.1.6 for that cases.
The odd thing is the result of the sets. The set containing strings reaches about the same instructions per second in ruby 2.2.2 and 2.1.6., which is how it should be. But the set containing symbols is twice as fast in ruby 2.1.6. than it is in 2.2.2!
I also varied the parameters for the benchmark script and the results stay the same. Sets with strings reach about the same i/s in 2.1.6 and 2.2.2, while sets of symbols are much slower in 2.2.2.
My questions are now
- can anyone explain this behavior?
- is there anything I overlooked in my benchmarking script?
- if others can reproduce this, how should this be reported to ruby developers?
Update 1:
Seems the underlying problem is Hash class itself, that is used for storing values in Set class. Just created a hash with numbers of 1 to 1000 as string/symbol keys and did Hash[k] access with a sample.
Ruby 2.2.2:
h_string: 1000 keys, sample size 200: 29374.4 i/s
h_symbol: 1000 keys, sample size 200: 10604.7 i/s - 2.77x slower
Ruby 2.1.6.:
h_symbol: 1000 keys, sample size 200: 31561.9 i/s
h_string: 1000 keys, sample size 200: 25589.7 i/s - 1.23x slower
For some reason, using symbols as Hash keys is much slower in 2.2.2, benchmarking script used:
require 'benchmark/ips'
collection_size = 1000
sample_size = 200
Benchmark.ips do |x|
h_string = Hash.new
h_symbol = Hash.new
(1..collection_size).each {|k| h_string[k.to_s] = 1}
(1..collection_size).each {|k| h_symbol[k.to_s.to_sym] = 1}
sample_of_string_keys = h_string.keys.sample(sample_size)
sample_of_symbol_keys = sample_of_string_keys.map(&:to_sym)
x.report("h_string: #{collection_size} keys, sample size #{sample_of_string_keys.length}") {
sample_of_string_keys.each do |s|
h_string[s]
end
}
x.report("h_symbol: #{collection_size} keys, sample size #{sample_of_symbol_keys.length}") {
sample_of_symbol_keys.each do |s|
h_symbol[s]
end
}
x.compare!
end
Update 2:
I repeated the tests with latest ruby 2.3.0dev (2015-07-26 trunk 51391) [x86_64-darwin14]
and there the set with symbols is much faster again and for small collection_size and sample_size on one level with ruby 2.1.6
For bigger numbers like a Hash with 10000 values and checking 100 samples, ruby 2.1.6 reaches about double the iterations of ruby-head (and nearly 3 times those of ruby 2.2.2). So it seems some improvements were going on in ruby-head, but still not enough to get back to old performance.
Update 3:
Thanks to the comment by @cremno I rechecked my initial set benchmark with the 2.2 branch and there 2.2 is on the same level as 2.1.6
My conclusions for now
- always benchmark your actual code and also think about if the difference is really visible. In my use case, even so that kind of check is done about 50 times per web request, the overall time difference is only a fraction of a millisecond
- ruby 2.3 and backported code in hopefully upcoming 2.2.3 seem to get faster again with symbols in Hashes
- don't make assumptions like 'symbols are always faster' or 'Set.include? is always faster than Array.include?'
- Symbol Garbage Collection comes with a big performance penalty, if you handle really big Hashes.
ruby_2_2
(the 2.2 branch)? See Bug #11035 (2.3 and backported to 2.2 however not part of a release yet). There's also Misc #10278 which might be another cause of the improvement in 2.3. – cremno-DUSE_SYMBOL_GC=0
toCFLAGS
or directly editinternal.h
(change the 1 into a 0). That might help with the performance. – cremno