18
votes

I have two arrays of strings:

In [51]: r['Z']
Out[51]: 
array(['0', '0', '0', ..., '0', '0', '0'], 
      dtype='|S1')

In [52]: r['Y']
Out[52]: 
array(['X0', 'X0', 'X0', ..., 'X0', 'X1', 'X1'], 
      dtype='|S2')

What is the difference between S1 and S2? Is it just that they hold entries of different length?

What if my arrays have strings of different lengths?

Where can I find a list of all possible dtypes and what they mean?


2 Answers

28
votes

See the NumPy "Data type objects (dtype)" documentation.

The |S1 and |S2 strings are data type descriptors: the first means the array holds byte strings of at most 1 character each, the second of at most 2. Every element occupies that fixed number of bytes, and shorter strings are padded with null bytes. The | character is the byte-order flag; byte order is irrelevant for single-byte characters, so it is set to |, meaning "not applicable".
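As a rough illustration of the fixed-width behaviour, and of what happens when the strings have different lengths, here is a minimal sketch. The array names and sample values are made up for the example, and it assumes Python 3, where byte strings are written with a b prefix.

```python
import numpy as np

# Byte strings of different lengths: NumPy sizes every item to fit the longest one.
a = np.array([b'X0', b'X', b'X10'])
print(a.dtype)           # a fixed-width byte-string dtype
print(a.dtype.itemsize)  # bytes reserved per element

# Assigning a value longer than the item size silently truncates it.
b = np.array([b'0', b'1'], dtype='S1')
b[0] = b'42'
print(b[0])

# The descriptor string carries the byte-order flag as its first character:
# '|' for types where byte order does not apply, '<' or '>' otherwise.
print(np.dtype('S2').str)
print(np.dtype('<i4').str)
```

The truncation in the second step is the main thing to watch for: a fixed-width array never grows to fit a longer value assigned later.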

4
votes

To store strings of variable length in a NumPy array, you can store them as Python objects. For example:

In [456]: x=np.array(('abagd','ds','asdfasdf'),dtype=np.object_)

In [457]: x[0]
Out[457]: 'abagd'

In [459]: map(len,x)
Out[459]: [5, 2, 8]

In [460]: x[1]=='ds'
Out[460]: True

In [461]: x
Out[461]: array([abagd, ds, asdfasdf], dtype=object)

In [462]: str(x)
Out[462]: '[abagd ds asdfasdf]'

In [463]: x.tolist()
Out[463]: ['abagd', 'ds', 'asdfasdf']

In [464]: map(type,x)
Out[464]: [str, str, str]
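The session above is from Python 2, where map returns a list. A rough Python 3 equivalent, using list comprehensions instead, might look like the sketch below; the variable names lengths and types and the replacement string are only illustrative.

```python
import numpy as np

# An object array stores references to ordinary Python str objects,
# so each element can have its own length.
x = np.array(['abagd', 'ds', 'asdfasdf'], dtype=object)

# In Python 3, map() returns a lazy iterator, so build the lists explicitly.
lengths = [len(s) for s in x]
types = [type(s) for s in x]
print(lengths)
print(types)

# Unlike a fixed-width 'S' array, assigning a longer string does not truncate it.
x[1] = 'a much longer replacement string'
print(x[1])
```

The trade-off is that an object array holds pointers to Python objects rather than packed bytes, so vectorized operations on it are generally slower than on a fixed-width string array.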