10
votes

While looking at a micro-optimization question that I asked yesterday (here), I found something strange: a chain of || comparisons in Java is running slightly faster than looking up a boolean value in an array of booleans.

In my tests, running the algorithms below on long values from 0 to 1 billion, alg1 is about 2% faster. (I have altered the order in which the algorithms are tested, and I get the same results.) My question is: why is alg1 faster? I would have expected alg2 to be slightly faster since it uses a lookup table, whereas alg1 has to execute all 4 comparisons and 3 or operations for the 75% of inputs that fail the test.

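private static final boolean alg1(long n)
{
  // Low hex digit of n; a perfect square can only end in 0, 1, 4, or 9.
  int h = (int)(n & 0xF);
  if(h == 0 || h == 1 || h == 4 || h == 9)
  {
    // Candidate passed the filter: confirm with an integer square root.
    long tst = (long)Math.sqrt(n);
    return tst*tst == n;
  }
  return false;
}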

// lookup[h] is true exactly when h is 0, 1, 4, or 9.
private static final boolean[] lookup = new boolean[16];
static
{
  lookup[0] = lookup[1] = lookup[4] = lookup[9] = true;
}
private static final boolean alg2(long n)
{
  // Same filter as alg1, but read from the table instead of compared.
  if(lookup[(int)(n & 0xF)])
  {
    long tst = (long)Math.sqrt(n);
    return tst*tst == n;
  }
  else
    return false;
}

If you're curious, this code is testing if a number is a perfect square, and utilizes the fact that perfect squares must end in 0, 1, 4, or 9 in hex.
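
The claim about the last hex digit can be checked by brute force. The sketch below is only an illustration (the class and method names are made up for this example): since (k + 16)^2 leaves the same remainder mod 16 as k^2, squaring k = 0..15 covers every case.

```java
import java.util.TreeSet;

// Hypothetical helper, not part of the original code.
final class SquareResidues {
    // Collects the distinct values of (k * k) & 0xF for k = 0..15.
    static TreeSet<Integer> residuesMod16() {
        TreeSet<Integer> residues = new TreeSet<>();
        for (int k = 0; k < 16; k++) {
            residues.add((k * k) & 0xF);
        }
        return residues;
    }

    public static void main(String[] args) {
        System.out.println(residuesMod16()); // prints [0, 1, 4, 9]
    }
}
```

Only 4 of the 16 possible low digits survive, which is why both algorithms can reject 75% of inputs before calling Math.sqrt.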

5 Answers

8
votes

Loading some random piece of data is generally slower than a little non-branching code.

It all depends on processor architecture, of course. Your first if statement could be implemented as four instructions. The second may potentially need null pointer checking and bounds checking as well as the load and compare. Also, more code means more compile time, and more chance for the optimisation to be impeded in some manner.
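
To illustrate the point about avoiding a memory load, here is a rough sketch of a third variant that keeps the table in a constant instead of an array, so there is no array reference to null-check or bounds-check. The name alg3 and the constant are assumptions made for this example, and nothing here has been measured, so treat it as something to try rather than a known improvement.

```java
// Hypothetical variant, not from the question.
final class BitmaskFilter {
    // Bits 0, 1, 4 and 9 are set: 1 + 2 + 16 + 512 = 0x213.
    private static final int SQUARE_NIBBLES = 0x213;

    static boolean alg3(long n) {
        int h = (int) (n & 0xF);
        // Shift the constant so bit h lands in position 0, then test it.
        if (((SQUARE_NIBBLES >>> h) & 1) != 0) {
            long tst = (long) Math.sqrt(n);
            return tst * tst == n;
        }
        return false;
    }
}
```

It returns the same results as alg1 and alg2; whether it runs any faster depends on the JIT and the processor.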

3
votes

I would guess that the issue is range checking for the array, and possibly the array lookup being implemented as a method call. Either would certainly overshadow 4 straight int compares. Have you looked at the bytecode?

1
votes

According to this article, accessing array elements is "2 or 3 times as expensive as accessing non-array elements". Your test shows that the difference may be even bigger.

1
votes

In the current example, I agree that bounds checking is probably what's getting you. (Why the JVM doesn't optimize this out is beyond me: n & 0xF is always between 0 and 15 and the array has 16 elements, so the index can be shown deterministically never to go out of bounds.)

Another possibility (especially with bigger lookup tables) is cache latency. It depends on the size of the processor's registers and how the JVM chooses to use them, but if the lookup array isn't kept entirely on the processor, then you'll see a performance hit compared to a simple OR as the array is pulled onto the CPU for each check.

0
votes

It's an interesting piece of code, but 2% is a really small difference. I don't think you can conclude very much from that.
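
One way to see how stable a 2% gap is would be to repeat the measurement several times in one JVM run and compare the spread. The harness below is a minimal sketch under stated assumptions: the class name, the LongPredicate indirection, the limit, and the round count are all choices made for this example rather than the asker's actual test, and calling through a LongPredicate may itself change what the JIT does.

```java
import java.util.function.LongPredicate;

// Hypothetical timing harness; alg1 and alg2 are copied from the question.
final class SquareBench {
    private static final boolean[] LOOKUP = new boolean[16];
    static {
        LOOKUP[0] = LOOKUP[1] = LOOKUP[4] = LOOKUP[9] = true;
    }

    static boolean alg1(long n) {
        int h = (int) (n & 0xF);
        if (h == 0 || h == 1 || h == 4 || h == 9) {
            long tst = (long) Math.sqrt(n);
            return tst * tst == n;
        }
        return false;
    }

    static boolean alg2(long n) {
        if (LOOKUP[(int) (n & 0xF)]) {
            long tst = (long) Math.sqrt(n);
            return tst * tst == n;
        }
        return false;
    }

    // Counts the perfect squares in [0, limit); returning the count keeps
    // the loop from being discarded as dead code.
    static long countSquares(LongPredicate alg, long limit) {
        long count = 0;
        for (long n = 0; n < limit; n++) {
            if (alg.test(n)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long limit = 100_000_000L;
        // Early rounds include JIT warm-up; compare the later ones.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long c1 = countSquares(SquareBench::alg1, limit);
            long t1 = System.nanoTime();
            long c2 = countSquares(SquareBench::alg2, limit);
            long t2 = System.nanoTime();
            System.out.printf("round %d: alg1 %d ms (%d squares), alg2 %d ms (%d squares)%n",
                    round, (t1 - t0) / 1_000_000, c1, (t2 - t1) / 1_000_000, c2);
        }
    }
}
```

If the round-to-round variation for a single algorithm is comparable to 2%, the difference between the two is probably within the noise.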