Here is a minor refinement of the existing answers, plus an extension to situations where you want to make a change based on the dtype rather than the column name (e.g. change all floats to integers).
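For context, the snippets below assume r is the recarray from the question; a setup consistent with the outputs shown would be something like the following.
import numpy as np

# Assumed example data, reconstructed from the outputs shown below
r = np.rec.array([('Bill', 31, 260.0), ('Fred', 15, 145.0)],
                 dtype=[('name', 'S30'), ('age', 'i2'), ('weight', 'f4')])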
First, you can improve conciseness and readability by using a list comprehension:
col = 'age'
new_dtype = 'float64'
r.astype([(col, new_dtype) if d[0] == col else d for d in r.dtype.descr])
# rec.array([(b'Bill', 31.0, 260.0), (b'Fred', 15.0, 145.0)],
# dtype=[('name', 'S30'), ('age', '<f8'), ('weight', '<f4')])
Second, you can extend this syntax to handle cases where you want to change all floats to integers (or vice versa). For example, if you wanted to change any 32- or 64-bit float into a 64-bit integer, you could do something like:
old_dtype = ['<f4', '<f8']
new_dtype = 'int64'
r.astype([(d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr])
# rec.array([(b'Bill', 31, 260), (b'Fred', 15, 145)],
# dtype=[('name', 'S30'), ('age', '<i2'), ('weight', '<i8')])
Note that astype has an optional casting argument that defaults to 'unsafe'. If you want a guard against accidentally losing precision, you can pass casting='safe': under that rule float-to-integer casts are disallowed entirely (since they can lose information), so a call like the one below raises a TypeError instead of silently truncating the values:
r.astype([(d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr],
         casting='safe')
Refer to the numpy documentation on astype for more on casting
and other options.
Also note that for the general case of changing all floats to integers (or vice versa), you might prefer to check the general number kind with np.issubdtype rather than comparing against a list of specific dtype strings, as sketched below.
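As an illustrative sketch (using the same r as above; np.issubdtype accepts the dtype strings that dtype.descr returns):
# Convert every floating-point field to int64 without listing specific dtypes;
# d[1] is a dtype string such as '<f4', which np.issubdtype accepts directly.
r.astype([(d[0], 'int64') if np.issubdtype(d[1], np.floating) else d
          for d in r.dtype.descr])
# Gives the same result as the explicit old_dtype version above.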