Consider some code that needs to executed repeatedly anywhere between 1-1,000,000 times, and that the number of repeats is not known at compile time. It is my understanding that loop unrolling would be a negligible optimization given the large number of loops and would only be optimizing up to the max_unrolls specified at compile time. The idea that I have come up with is to implement a binary (or base 2) partial loop un-roller that essentially executes some_function repeatedly a number of times that is specified at run-time. I have come up with some code to demonstrate the idea, an condensed version is show below. The method used in the code below has a number of issues from a usability perspective.
- It requires the coder to manually copy out the base 2 unroll essentially copy out the unroll 2^n-1 times.
- This also needs to be re-done for every new function that needs to use this method.
My question is threefold. Firstly am I missing something, is the compiler already intelligent enough to do this itself? Secondly what is an efficient way of implementing this so that it becomes feasible to benchmark this against a standard for
loop, whilst solving the above issues. Thirdly to your knowledge is there a library that has already implemented this.
Please note: I am doing this purely for the fun of it, but do not have the expertise to know if this will be effective. I have done testing on the code but have only found very small improvements, however I believe I have not been able to unroll manually far enough for a fair comparison to be made. Also I am aware that this method has the potential to create massive binary sizes, however I believe this would be a worthwhile time memory trade off. Also, if you post any assembly it will probably take me another year or so to understand it.
inline void some_reapeated_function(int &function_parameter_1, int &function_parameter_2)
{
function_parameter_1 /= function_parameter_2;
}
// Function to be called when you need it to be unrolled.
int runtime_unroll(unsigned int &no_of_loops, int &function_parameter_1, int &function_parameter_2)
{
std::vector<bool> binary_vector;
// Stores the number of loops in a binary representation in a vector.
binary_function.reserve(no_of_loops);
while(no_of_loops)
{
if (no_of_loops&1)
binary_vector.push_back(false);
else
binary_vector.push_back(true);
no_of_loops>>=1;
}
// If binary of no_of_loops contains a 2^0 execute once.
if (binary_vector[0])
{
some_reapeated_function(function_parameter_1,function_parameter_2);
}
// If binary of no_of_loops contains a 2^1 execute twice.
if (binary_vector[1])
{
some_reapeated_function(function_parameter_1,function_parameter_2);
some_reapeated_function(function_parameter_1,function_parameter_2);
}
//If binary of no_of_loops contains a 2^2 execute 4 times.
if (binary_vector[2])
{
some_reapeated_function(function_parameter_1,function_parameter_2);
some_reapeated_function(function_parameter_1,function_parameter_2);
some_reapeated_function(function_parameter_1,function_parameter_2);
some_reapeated_function(function_parameter_1,function_parameter_2);
}
/* This example only covers from 1 to 2^3-1 or up to 7 unrolls.
This can continue depending on the number of repetitions needed and
could incorporate a for loop to continue after the loop has fully unrolled */
}
int
in binary form where each of the 1's represent a function being executed to 2^n times where n is the index of the each bit within the integer? Duffs device is interesting though, thankyou :) – silvergasp