1
votes

I currently have data where each row has a text passage and a numpy float array.

As far as I know, it's not efficient to save these two datatypes in one data format (correct me if I am wrong). So I am going to save them separately, with an extra column of ints that will be used to map the two datasets together when I want to join them again.

I am having trouble figuring out how to append a column of ints next to the float arrays (if anyone has a solution to that, I would love to hear it) and then save the numpy array.

But then I realized I can just save the float arrays as is with numpy.save without the extra int column if I can get a confirmation that numpy.save and numpy.load will never change the order of the arrays.

That way I can just append the loaded numpy float arrays to the pandas dataframe as is.

Logically, I don't see any reason why the order of the rows would change, but perhaps there's some optimization or compression that I am unaware of.

Would numpy.save or numpy.load ever change the order of a numpy array of float arrays?

1
No, there is no reason that the order would be changed - roganjosh

1 Answer

2
votes

numpy.save and numpy.load will not change the order. The array is written to disk as is and read back the same way, and an array is an ordered object.
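
If you want to convince yourself, a quick round-trip check along these lines should do it. This is only a sketch: the random data, the file name floats.npy, and the variable names are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: 5 rows, each a float vector of length 3.
rng = np.random.default_rng(0)
array_of_floats = rng.random((5, 3))

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "floats.npy")
    np.save(path, array_of_floats)
    loaded = np.load(path)

# array_equal is True only if the shapes match and every element matches
# position by position, so a reordered array would fail this check.
round_trip_ok = np.array_equal(array_of_floats, loaded)
print(round_trip_ok)
```

Since row i of the loaded array is row i of the saved array, it lines up with row i of the text data as long as the text rows are not reordered in the meantime.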

Note: if you want to save multiple data arrays to the same file, you can use np.savez.

>>> np.savez('out.npz', f=array_of_floats, s=array_of_strings)

You can retrieve back each with the following:

>>> data = np.load('out.npz')
>>> array_of_floats = data['f']
>>> array_of_strings = data['s']
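
Putting it together, here is a minimal self-contained sketch of the np.savez approach, plus one way to put an int column next to the floats (np.column_stack), since the question asked about that. The sample passages, the `ids` array, and the temporary path are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data standing in for the text passages and float arrays.
texts = ["first passage", "second passage", "third passage"]
array_of_strings = np.array(texts)  # fixed-width unicode dtype, no pickling needed
array_of_floats = np.arange(6, dtype=float).reshape(3, 2)

# Optional int key column placed in front of the floats.
# A numpy array has a single dtype, so the ints are stored as floats here.
ids = np.arange(len(texts))
with_ids = np.column_stack([ids, array_of_floats])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.npz")
    np.savez(path, f=array_of_floats, s=array_of_strings)
    # NpzFile works as a context manager and closes the file on exit.
    with np.load(path) as data:
        loaded_floats = data["f"]
        loaded_strings = data["s"]

# Both arrays come back in the same order, so row i still matches row i.
print(np.array_equal(loaded_floats, array_of_floats))
print(loaded_strings.tolist() == texts)
```

One caveat: if the strings are held in an object-dtype array instead of a fixed-width unicode one, np.load needs allow_pickle=True to read them back.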