I'm trying to rotate an image counter-clockwise by 90 degrees and then flip it horizontally.
My first approach was to just use OpenCV:
cv::transpose(in, tmp); // transpose around top left
cv::flip(tmp, out, -1); // flip on both axes
For performance, I'm trying to merge the two functions into one.
My code:
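void ccw90_hflip_640x480(const cv::Mat& img, cv::Mat& out)
{
    // Assumes 4-byte pixels and continuous (unpadded) data in both matrices.
    assert(img.cols == 640 && img.rows == 480);
    assert(out.cols == 480 && out.rows == 640);

    uint32_t* imgData = reinterpret_cast<uint32_t*>(img.data);
    uint32_t* outData = reinterpret_cast<uint32_t*>(out.data);

    uint32_t* pRow = imgData;
    uint32_t* pEnd = imgData + (640 * 480);
    uint32_t* dstCol = outData + (480 * 640) - 1; // last pixel of the output

    // Each source row is written bottom-to-top into one output column,
    // starting with the rightmost column and moving left.
    for ( ; pRow != pEnd; pRow += 640, dstCol -= 1)
    {
        for (uint32_t *ptr = pRow, *end = pRow + 640, *dst = dstCol;
             ptr != end;
             ++ptr, dst -= 480)
        {
            *dst = *ptr;
        }
    }
}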
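To sanity-check the index arithmetic, here is a minimal sketch on plain row-major buffers instead of cv::Mat. The function names and the std::vector interface are placeholders of mine, not OpenCV API. It compares the two-pass version (transpose, then flip on both axes) against the single-pass mapping out(cols-1-c, rows-1-r) = in(r, c) that the function above uses for 640x480.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two-pass reference: transpose, then flip on both axes.
// `in` is rows x cols, row-major; the result is cols x rows.
std::vector<uint32_t> transpose_then_flip_both(const std::vector<uint32_t>& in,
                                               std::size_t rows, std::size_t cols)
{
    assert(in.size() == rows * cols);
    std::vector<uint32_t> tmp(rows * cols), out(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            tmp[c * rows + r] = in[r * cols + c];   // transpose
    // Flipping a contiguous matrix on both axes reverses its linear buffer.
    for (std::size_t i = 0; i < out.size(); ++i)
        out[out.size() - 1 - i] = tmp[i];
    return out;
}

// Single pass: write each source pixel straight to its final position.
std::vector<uint32_t> fused_transpose_flip(const std::vector<uint32_t>& in,
                                           std::size_t rows, std::size_t cols)
{
    assert(in.size() == rows * cols);
    std::vector<uint32_t> out(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[(cols - 1 - c) * rows + (rows - 1 - r)] = in[r * cols + c];
    return out;
}
```

Both functions give the same result on small inputs, so the merged loop itself looks correct and the question is only about speed.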
I thought the above would be faster, but it's not. I can't think of any reason it wouldn't be faster, besides OpenCV possibly using NEON.
I found this article/presentation: http://shervinemami.info/NEON_RotateBGRX.swf
The transposition and flipping are blurred together in a way that makes it very hard to modify so that it rotates the other way and flips around the horizontal axis like I need it to. The article is very old, so I'm hoping there is a more straightforward way of doing what I need.
So what's the easiest way to transpose a 4x4 matrix of uint32 using ARM NEON?
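For concreteness, this is a rough sketch of the kind of helper I'm after. It assumes the vtrnq_u32 intrinsic from <arm_neon.h> to swap elements within 2x2 blocks, then vget_low_u32 / vget_high_u32 / vcombine_u32 to regroup the 64-bit halves. The name transpose4x4_u32 and the stride parameters are placeholders of mine, and I haven't benchmarked it. When NEON isn't available it falls back to a plain scalar loop so the same code still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Transposes one 4x4 tile of uint32_t from src into dst.
// Strides are in elements (not bytes); src and dst must not overlap.
void transpose4x4_u32(const uint32_t* src, std::size_t src_stride,
                      uint32_t* dst, std::size_t dst_stride)
{
    assert(src != nullptr && dst != nullptr);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Load the four source rows.
    uint32x4_t r0 = vld1q_u32(src + 0 * src_stride);
    uint32x4_t r1 = vld1q_u32(src + 1 * src_stride);
    uint32x4_t r2 = vld1q_u32(src + 2 * src_stride);
    uint32x4_t r3 = vld1q_u32(src + 3 * src_stride);

    // t01.val[0] = {r0[0], r1[0], r0[2], r1[2]}, t01.val[1] = {r0[1], r1[1], r0[3], r1[3]}
    uint32x4x2_t t01 = vtrnq_u32(r0, r1);
    // t23.val[0] = {r2[0], r3[0], r2[2], r3[2]}, t23.val[1] = {r2[1], r3[1], r2[3], r3[3]}
    uint32x4x2_t t23 = vtrnq_u32(r2, r3);

    // Recombine 64-bit halves so that output row i holds source column i.
    vst1q_u32(dst + 0 * dst_stride,
              vcombine_u32(vget_low_u32(t01.val[0]), vget_low_u32(t23.val[0])));
    vst1q_u32(dst + 1 * dst_stride,
              vcombine_u32(vget_low_u32(t01.val[1]), vget_low_u32(t23.val[1])));
    vst1q_u32(dst + 2 * dst_stride,
              vcombine_u32(vget_high_u32(t01.val[0]), vget_high_u32(t23.val[0])));
    vst1q_u32(dst + 3 * dst_stride,
              vcombine_u32(vget_high_u32(t01.val[1]), vget_high_u32(t23.val[1])));
#else
    // Scalar fallback for builds without NEON.
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            dst[c * dst_stride + r] = src[r * src_stride + c];
#endif
}
```

The idea would be to call this per 4x4 tile with src_stride = 640 and dst_stride = 480, and fold the flip into the choice of destination tile and row order, but that part is exactly what I'm unsure how to do cleanly.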