Wednesday, September 24, 2008

The fastest loop in C

Sure, practically nobody knows assembler nowadays. But if you use a certain construct very often in time critical code, you might get curious what's the fastest way to do it. One example are loops. I tested this with a loop of the following properties:
  • Loop count is unknown at compile time, so the loop cannot be unrolled
  • The loop index i is not used within the loop body. In particular it doesn't matter it it's incremented or decremented
This kind of loop occurs very often in gavl colorspace conversion routines. The following C-file has 3 functions, which do exactly the same:

void do_something();

static int num;

void loop_1()
{
int i;
for(i = 0; i < num; i++)
do_something();
}

void loop_2()
{
int i = num;
while(i--)
do_something();
}

void loop_3()
{
int i = num+1;
while(--i)
do_something();
}

This code can be compiled with gcc -O2 -S. The resulting loop bodies are the following:

loop 1:

.L17:
xorl %eax, %eax
addl $1, %ebx
call do_something
cmpl %ebp, %ebx
jne .L17
loop 2:
.L11:
xorl %eax, %eax
addl $1, %ebx
call do_something
cmpl %ebp, %ebx
jne .L11
loop 3:
.L5:
xorl %eax, %eax
call do_something
subl $1, %ebx
jne .L5
As you see, the winner is loop 3. Here, the whole logic (without initialization) needs 2 machine instructions:
  • Decrement ebx
  • Jump is result is nonzero
I doubt it can be done faster. You also see, that loop 2 looks simpler than loop 1 in C, but the compiler produces exactly the same code. I changed the innermost gavl pixelformat conversion loops to loop 3 and could even measure a tiny speedup for very simple conversions.

No comments: