2 votes

I use cv2.imread and cv2.imdecode depending on whether I am loading an image from disk or from a URL. For comparison, I use image.load, which relies on libpng, to load from disk. When using cv2, image.shape is (height, width, channels); however, when using torch, the shape is (channels, height, width).

I am curious why this is and how I can make the two match. My goal is to combine many images, downloaded with cv2, into a torch tensor with the (channels, height, width) layout. I have tried reshaping the numpy arrays downloaded with cv2, but the resulting tensors do not match the ones loaded with torch.
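
For example, with a toy array standing in for a cv2 image, reshape produces the right shape but scrambles the pixel values:

import numpy as np

hwc = np.arange(2 * 2 * 3).reshape((2, 2, 3))  # stand-in for a cv2 image, shape (H, W, C)
print(hwc[0, 0])       # first pixel: [0 1 2]

chw = hwc.reshape((3, 2, 2))  # right shape, wrong data
print(chw[:, 0, 0])    # the "first pixel" is now [0 4 8]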


2 Answers

3 votes

Different libraries may store image data in different memory formats; this is entirely up to the library and its purpose (speed of traversing the image data, memory efficiency, etc.).

A possible solution for your problem, without further third-party tools, is numpy's transpose. A simple example:

import numpy as np

x = np.random.random((3, 15, 17))
print(x.shape)

# transpose the axes into this order
y = x.transpose((1, 2, 0))
print(y.shape)

# for the sake of testing the equality of the corresponding slices:
print(np.linalg.norm(x[0, :, :] - y[:, :, 0]))

Sample Output:

(3, 15, 17)
(15, 17, 3)
0.0
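
Applied to your case, the same idea runs in the other direction: cv2 gives (height, width, channels), so transpose((2, 0, 1)) moves the channel axis to the front. A minimal sketch, assuming PyTorch's torch.from_numpy and a hypothetical local file image.png (note that cv2 also loads channels in BGR order, while most torch pipelines expect RGB):

import cv2
import numpy as np
import torch

img = cv2.imread("image.png")               # (H, W, C), BGR channel order
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # reorder channels to RGB
chw = img.transpose((2, 0, 1))              # (H, W, C) -> (C, H, W)

# transpose only returns a strided view; copy to contiguous memory before wrapping
tensor = torch.from_numpy(np.ascontiguousarray(chw))
print(tensor.shape)                         # torch.Size([3, H, W])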
0 votes

Check out lutorpy:

Lutorpy is a library built for deep learning with torch in Python, via a two-way bridge between Python/Numpy and Lua/Torch; you can use any Torch modules (nn, rnn, etc.) in Python, and easily convert variables (arrays and tensors) between torch and numpy.

It has built-in support for converting numpy arrays to Torch tensor objects; see the "example usage" section on their GitHub:

## convert the numpy array into torch tensor
## (here xn is an existing numpy array and torch is the Lua torch module exposed by lutorpy)
xt = torch.fromNumpyArray(xn)