How to remove all characters from a string and leave only numbers in a DataFrame?

Question

I have a couple of columns in a DataFrame that contain both numeric values and text,
and I want to remove all the other characters and leave only the numbers:

Admit_DX_Description            Primary_DX_Description
510.9 - EMPYEMA W/O FISTULA     510.9 - EMPYEMA W/O FISTULA
681.10 - CELLULITIS, TOE NOS    681.10 - CELLULITIS, TOE NOS
780.2 - SYNCOPE AND COLLAPSE    427.89 - CARDIAC DYSRHYTHMIAS NEC
729.5 - PAIN IN LIMB            998.30 - DISRUPTION OF WOUND, UNSPEC

to

Admit_DX_Description            Primary_DX_Description
510.9                           510.9
681.10                          681.10
780.2                           427.89
729.5                           998.30

code:

for col in strip_col:
    # Encoding only categorical variables
    if df[col].dtypes == 'object':
        df[col] = df[col].map(lambda x: x.rstrip(r'[a-zA-Z]'))

print df.head()

error:
Traceback (most recent call last):

df[col] = df[col].map(lambda x: x.rstrip(r'[a-zA-Z]'))

File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/series.py", line 2175, in map
    new_values = map_f(values, arg)
File "pandas/src/inference.pyx", line 1217, in pandas.lib.map_infer (pandas/lib.c:63307)

df[col] = df[col].map(lambda x: x.rstrip(r'[a-zA-Z]'))

AttributeError: 'int' object has no attribute 'rstrip'

estebanpdl · Accepted Answer · 2017-02-03T21:38:01

You can use the following example. I used the re module to extract only the floating-point numbers.

import re
import pandas

df = pandas.DataFrame({'A': ['Hello 199.9', '19.99 Hello'], 'B': ['700.52 Test', 'Test 7.7']})

df
             A            B
0  Hello 199.9  700.52 Test
1  19.99 Hello     Test 7.7

for col in df:
    df[col] = [''.join(re.findall(r"\d+\.\d+", item)) for item in df[col]]

       A       B
0  199.9  700.52
1  19.99     7.7

If you also have integers, change the re pattern to this: \d*\.?\d+.
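As a quick illustration of that pattern, here is a minimal sketch on made-up sample data (the column values below are assumptions for the example, not the asker's data). Note that ''.join glues together every match in a cell, so a cell containing two separate numbers would end up with them concatenated.

```python
import re

import pandas as pd

# Hypothetical sample data mixing integers and floats with text.
df = pd.DataFrame({'A': ['Hello 199', '19.99 Hello'],
                   'B': ['700.52 Test', 'Test 7']})

# Optional integer part, optional dot, then at least one digit.
number = re.compile(r"\d*\.?\d+")

for col in df:
    # Keep only the numeric matches found in each cell.
    df[col] = [''.join(number.findall(item)) for item in df[col]]

print(df)
```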

EDITED

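To handle the TypeError, I'd recommend using try/except. In this example I create a list, errs, which is filled in the except TypeError branch with the values of any column that could not be processed. You can print(errs) to see those values.

Check df afterwards too: a column that raised the error is left unchanged.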

...
...
errs = []
for col in df:
    try:
        df[col] = [''.join(re.findall(r"\d+\.\d+", item)) for item in df[col]]
    except TypeError:
        errs.extend([item for item in df[col]])
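An alternative to catching the error, offered only as a sketch and not as part of the original answer: convert every value to a string first, then use the pandas string accessor. The column names come from the question, but the rows (including the bare integer 780 and the 'NO CODE' entry) are made up to reproduce the mixed-type situation behind the 'int' object error. Unlike the join approach, str.extract keeps only the first match in each cell and yields NaN where nothing matches.

```python
import pandas as pd

# Hypothetical rows: one cell is a plain int, one has no number at all.
df = pd.DataFrame({
    'Admit_DX_Description': ['510.9 - EMPYEMA W/O FISTULA',
                             780,
                             '729.5 - PAIN IN LIMB'],
    'Primary_DX_Description': ['510.9 - EMPYEMA W/O FISTULA',
                               '427.89 - CARDIAC DYSRHYTHMIAS NEC',
                               'NO CODE'],
})

for col in df.columns:
    # astype(str) makes ints safe to process; extract returns the first
    # number in each cell, or NaN when the cell contains no digits.
    df[col] = df[col].astype(str).str.extract(r'(\d*\.?\d+)', expand=False)

print(df)
```

The results are still strings; if numeric values are needed, pd.to_numeric can be applied afterwards, though that would turn a code such as 681.10 into 681.1.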
