2 votes

I use cv2.imread and cv2.imdecode depending on whether I am loading an image from disk or from a URL. For comparison, I use image.load, which relies on libpng, to load from disk. When using cv2, image.shape is (height, width, channels); however, when using torch, the shape is (channels, height, width).

I am curious why this is and how I can make the two match. My goal is to combine many images, downloaded with cv2, into a torch tensor with the (channels, height, width) layout. I have tried reshaping the numpy arrays downloaded with cv2, but the resulting tensors do not match the ones loaded with torch.
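
For example, with a toy array standing in for a cv2 image, reshape produces the right shape but scrambles the pixel values:

import numpy as np

hwc = np.arange(2 * 2 * 3).reshape((2, 2, 3))  # stand-in for a cv2 image, shape (H, W, C)
print(hwc[0, 0])       # first pixel: [0 1 2]

chw = hwc.reshape((3, 2, 2))  # right shape, wrong data
print(chw[:, 0, 0])    # the "first pixel" is now [0 4 8]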


2 Answers

3 votes

Different libraries may store image data in different memory formats; this is entirely up to the library and its purpose (speed of traversing the image data, memory efficiency, etc.).

A possible solution for your problem, without further third-party tools, is numpy's transpose. A simple example:

import numpy as np

x = np.random.random((3, 15, 17))
print(x.shape)

# transpose the axes into this order
y = x.transpose((1, 2, 0))
print(y.shape)

# for the sake of testing the equality of the corresponding slices:
print(np.linalg.norm(x[0, :, :] - y[:, :, 0]))

Sample Output:

(3, 15, 17)
(15, 17, 3)
0.0
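
Applied to your case, the same idea runs in the other direction: cv2 gives (height, width, channels), so transpose((2, 0, 1)) moves the channel axis to the front. A minimal sketch, assuming PyTorch's torch.from_numpy and a hypothetical local file image.png (note that cv2 also loads channels in BGR order, while most torch pipelines expect RGB):

import cv2
import numpy as np
import torch

img = cv2.imread("image.png")               # (H, W, C), BGR channel order
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # reorder channels to RGB
chw = img.transpose((2, 0, 1))              # (H, W, C) -> (C, H, W)

# transpose only returns a strided view; copy to contiguous memory before wrapping
tensor = torch.from_numpy(np.ascontiguousarray(chw))
print(tensor.shape)                         # torch.Size([3, H, W])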
0 votes

Check out lutorpy:

Lutorpy is a library built for deep learning with torch in Python, via a two-way bridge between Python/Numpy and Lua/Torch; you can use any Torch modules (nn, rnn, etc.) in Python, and easily convert variables (arrays and tensors) between torch and numpy.

It has built-in support for converting numpy arrays to Torch tensor objects; see the "example usage" section on their GitHub:

## convert the numpy array into torch tensor
## (here xn is an existing numpy array and torch is the Lua torch module exposed by lutorpy)
xt = torch.fromNumpyArray(xn)