4
votes

I have created a NumPy array in which each element contains an array of the same shape (9, 5). What I want is a 3D array.

I've tried using np.stack.

data = list(map(lambda x: getKmers(x, 9), data)) # getKmers creates a
                                                 # list of lists from a pandas DataFrame
data1D = np.array(data)                          # shape (350,)
data2D = np.stack(data1D)

data1D:
array([list([      pdbID  AtomNo Type      Eta    Theta
0  1a9l.pdb     2.0    G  169.225  212.838
1  1a9l.pdb     3.0    G  168.439  206.785
2  1a9l.pdb     4.0    U  170.892  205.845
3  1a9l.pdb     5.0    G  164.726  225.982
4  1a9l.pdb     6.0    A  308.788  144.370
5  1a9l.pdb     7.0    C  185.211  209.363
6  1a9l.pdb     8.0    U  167.612  216.614
7  1a9l.pdb     9.0    C  168.741  219.239
8  1a9l.pdb    10.0    C  163.639  207.044,       pdbID  AtomNo Type          Eta    Theta
1  1a9l.pdb     3.0    G  168.439  206.785
2  1a9l.pdb     4.0    U  170.892  205.845
3  1a9l.pdb     5.0    G  164.726  225.982
4  1a9l.pdb     6.0    A  308.788  144.370
5  1a9l.pdb     7.0    C  185.211  209.363
6  1a9l.pdb     8.0    U  167.612  216.614
7  1a9l.pdb     9.0    C  168.741  219.239
8  1a9l.pdb    10.0    C  163.639  207.044

I get this error: cannot copy sequence with size 9 to array axis with dimension 5

I want to create a 3D matrix in which every subarray lies along the new third dimension. I guess the new shape would be (9, 5, 350).

3
Please share a sample of the array and the expected output. - yatu
So you have an array of lists of pandas DataFrames? - yatu
As yatu says, the data structure seems off. To avoid the XY problem, can you share more context about what you intend to do? DataFrames are designed with 2D data in mind (number of samples vs. properties is the typical case), so it makes little sense to stack them into a 3D NumPy array; you would usually want either to keep them separate as a list/dict or to stack them as one DataFrame with an additional row for indexing. - Leporello
Where does the error occur? The data1D = np.array(data) step is probably not needed. np.stack works with a list of arrays (in fact, if given an array it will just treat it like a list). np.stack(data) should work if data really is a list of arrays of matching size. - hpaulj
I think that before doing what I want, I have to convert the DataFrames to matrices with pd.get_values() and then try to build the matrix from those. The final goal is to create a matrix as input for a neural network (feed-forward and RNN). - Patrick
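The suggestion in the comments above (convert each DataFrame to a plain array first, then stack) can be sketched roughly as follows. This is only an illustration under assumptions: get_kmers is a hypothetical stand-in for the asker's getKmers (assumed here to return sliding windows of k rows), the toy frames are made up, and only the numeric Eta and Theta columns are kept so that the result has a numeric dtype.

```python
import numpy as np
import pandas as pd

def get_kmers(df, k):
    """Hypothetical stand-in for getKmers: all length-k sliding windows of df."""
    return [df.iloc[i:i + k] for i in range(len(df) - k + 1)]

def make_frame(n_rows, pdb_id):
    """Toy frame with the same five columns as in the question (values are made up)."""
    return pd.DataFrame({
        "pdbID": [pdb_id] * n_rows,
        "AtomNo": np.arange(2.0, 2.0 + n_rows),
        "Type": ["G"] * n_rows,
        "Eta": np.linspace(160.0, 170.0, n_rows),
        "Theta": np.linspace(200.0, 210.0, n_rows),
    })

frames = [make_frame(10, "a.pdb"), make_frame(12, "b.pdb")]

# Each element is a list of 9-row DataFrames; the lists may differ in length.
nested = [get_kmers(df, 9) for df in frames]

# Flatten the nesting, keep the numeric columns, and stack along a new first axis.
numeric_cols = ["Eta", "Theta"]  # assumption: only these columns feed the network
windows = [kmer[numeric_cols].to_numpy() for kmers in nested for kmer in kmers]
batch = np.stack(windows)

print(batch.shape)  # (6, 9, 2)
```

Stacking works here because every window has the same shape. If the string columns (pdbID, Type) are needed as features, they would have to be encoded numerically first.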

3 Answers

6
votes

You need to use

 data.reshape((data.shape[0], data.shape[1], 1))

Example

from numpy import array
data = [[11, 22],
    [33, 44],
    [55, 66]]
data = array(data)
print(data.shape)
data = data.reshape((data.shape[0], data.shape[1], 1))
print(data.shape)

Running the example first prints the size of each dimension in the 2D array, reshapes the array, then summarizes the shape of the new 3D array.

Result

(3, 2)
(3, 2, 1)
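For what it's worth, the same trailing axis can also be added without spelling out the shape. The sketch below compares the reshape call from this answer with np.newaxis indexing and np.expand_dims; all three should give the same (3, 2, 1) array for this toy input.

```python
import numpy as np

data = np.array([[11, 22],
                 [33, 44],
                 [55, 66]])

# Three ways to append a length-1 axis to a 2D array.
a = data.reshape(data.shape[0], data.shape[1], 1)
b = data[:, :, np.newaxis]
c = np.expand_dims(data, axis=-1)

print(a.shape, b.shape, c.shape)
print(np.array_equal(a, b) and np.array_equal(a, c))
```

Note that this only adds an axis to a single 2D array; it does not combine several 2D arrays into one 3D array.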

Source: https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/

1
votes

If you want to create a 3D Matrix where every subarray is in the new 3D dimension, wouldn't the final shape be (350,9,5)? In that case, you can simply use:

new_array = np.asarray(data).reshape(350,9,5)
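As a hedged illustration of the two candidate shapes: assuming the data has already been turned into a list of 350 numeric arrays of shape (9, 5) (random values stand in for the real data here), np.stack can place the new axis either first or last through its axis argument.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumption: 350 numeric blocks of shape (9, 5), standing in for the real data.
blocks = [rng.random((9, 5)) for _ in range(350)]

first = np.stack(blocks)          # new axis in front: (350, 9, 5)
last = np.stack(blocks, axis=-1)  # new axis at the end: (9, 5, 350)

print(first.shape, last.shape)

# Either way, block number 7 can be recovered unchanged.
print(np.array_equal(first[7], blocks[7]))
print(np.array_equal(last[:, :, 7], blocks[7]))
```

The (350, 9, 5) layout, with samples along the first axis, is the more common convention for neural-network input.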
1
votes

It seems from your question that getKmers(x, 9) produces a list of 9 length-350 lists, and the data input has 5 elements. You want a (9, 5, 350) array out of this. This should be achievable with:

arr = np.swapaxes([getKmers(x, 9) for x in data], 0, 1)

Note that swapaxes is NOT the same as reshaping. If you were to just do np.array([getKmers(x, 9) for x in data]).reshape(9, 5, 350), you'd end up with the desired output shape, but your data would be in the wrong order.
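The difference between swapping axes and reshaping is easiest to see on a small array. In this sketch a (2, 3, 4) array of consecutive integers stands in for the real data; both results have shape (3, 2, 4), but only swapaxes keeps each row attached to its original indices.

```python
import numpy as np

arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # small stand-in for the real data

swapped = np.swapaxes(arr, 0, 1)   # shape (3, 2, 4); swapped[j, i] is arr[i, j]
reshaped = arr.reshape(3, 2, 4)    # same shape, but elements are just re-chunked

print(swapped.shape, reshaped.shape)
print(np.array_equal(swapped[0, 1], arr[1, 0]))   # True
print(np.array_equal(reshaped[0, 1], arr[1, 0]))  # False
```

reshape only re-chunks the flat sequence of elements in memory order, so reshaped[0, 1] holds arr[0, 1] rather than arr[1, 0].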