27
votes

Given the following data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})
df

    A
0   1a
1   NaN
2   10a
3   100b
4   0b

I'd like to extract the numbers from each cell (where they exist). The desired result is:

    A
0   1
1   NaN
2   10
3   100
4   0

I know it can be done with str.extract, but I'm not sure how.

3 Answers

63
votes

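Give it a regex capture group. Pass expand=False to get a Series back; recent pandas versions return a DataFrame by default:

df.A.str.extract(r'(\d+)', expand=False)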

Gives you:

0      1
1    NaN
2     10
3    100
4      0
Name: A, dtype: object
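
Note that the result above is still a column of strings (dtype: object). Here is a minimal sketch of one way to turn it into actual numbers, assuming that is what you want; `digits` and `numbers` are just illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# expand=False returns a Series when the pattern has a single capture group
digits = df['A'].str.extract(r'(\d+)', expand=False)

# The extracted values are strings; convert them to numbers.
# Because of the missing value, the result is typically a float column.
numbers = pd.to_numeric(digits)
print(numbers)
```

If you need integers while keeping the missing value, a nullable integer dtype is one option, but that goes beyond what was asked.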

4
votes

To answer @Steven G's question in the comment above, this should work:

df.A.str.extract(r'(^\d*)', expand=False)
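
The comment being answered is not shown here, so this is only a sketch of how the two patterns differ, assuming the concern was digits that do not sit at the start of the string. The sample values in `s` are made up for illustration:

```python
import pandas as pd

s = pd.Series(['10a', 'b7', 'abc'])

# (\d+) finds the first run of digits anywhere in the string
anywhere = s.str.extract(r'(\d+)', expand=False)

# (^\d*) only looks at the start of the string; because * allows
# zero digits, it matches an empty string when there are none
leading = s.str.extract(r'(^\d*)', expand=False)

print(anywhere.tolist())
print(leading.tolist())
```

So with the anchored pattern, a value like 'b7' yields an empty string rather than '7' or NaN. If you would rather get NaN there, `r'^(\d+)'` is an alternative to consider.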

1
votes

You can replace the column with the result using the assign function:

df = df.assign(A=lambda x: x['A'].str.extract(r'(\d+)', expand=False))
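
Keep in mind that assign returns a new DataFrame rather than modifying df in place, which is why the result is bound back to df above. A small sketch of that behaviour, assuming expand=False so that a Series is assigned; `out` is an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# assign builds a copy with column A replaced by the extracted digits
out = df.assign(A=lambda x: x['A'].str.extract(r'(\d+)', expand=False))

print(df['A'].tolist())   # original column is unchanged
print(out['A'].tolist())  # extracted digits, missing value preserved
```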