26
votes

Let's say I have a dataframe df:

A B
1 V2
3 W42
1 S03
2 T02
3 U71

I want a new column (either appended to df or replacing column B; it doesn't matter which) that contains only the integer extracted from column B. That is, I want column C to look like

C
2
42
3
2
71

If the number has a leading 0, as in 03, I want to get 3, not 03.

How can I do this?

6 Answers

79
votes

You can use the .str accessor to extract the digits with a regular expression, then cast the result to int.

df['B'].str.extract(r'(\d+)', expand=False).astype(int)
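
As a self-contained sketch of this approach on the question's sample data: the `expand=False` argument and the `pd.to_numeric` handling of rows without digits are additions for illustration, not part of the original one-liner, and the `mixed` and `safe` names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3],
                   'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

# expand=False returns a Series for a single capture group
df['C'] = df['B'].str.extract(r'(\d+)', expand=False).astype(int)

# Assumption: some rows may contain no digits. extract() yields NaN there,
# and astype(int) would raise, so a nullable integer dtype is used instead.
mixed = pd.Series(['V2', 'none', 'S03'])
safe = pd.to_numeric(mixed.str.extract(r'(\d+)', expand=False)).astype('Int64')
```

Casting to int is what drops the leading zero, so 'S03' ends up as 3 rather than '03'.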
3
votes

Assuming there is always exactly one leading letter:

df['B'] = df['B'].str[1:].astype(int)
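
A minimal sketch of the slicing approach, plus one possible variation for prefixes of varying length. The variation (stripping leading non-digits with `str.replace`) is an assumption about what you might need, not part of the original answer, and the variable names are hypothetical.

```python
import pandas as pd

s = pd.Series(['V2', 'W42', 'S03', 'T02', 'U71'])

# Drop the first character, then parse the remainder as an integer
single_prefix = s.str[1:].astype(int)

# If the prefix length varies, remove every leading non-digit character instead
varied = pd.Series(['V2', 'AB42', 'xyz03'])
stripped = varied.str.replace(r'^\D+', '', regex=True).astype(int)
```

The plain slice raises a ValueError as soon as a value has more than one leading letter, which is why the assumption above matters.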
0
votes

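I wrote a little loop to do this, since my strings were in a list rather than a DataFrame. This way you can also add a small if statement to account for floats:

text = 'whatever.007'
output = ''

for letter in text:
    try:
        int(letter)        # raises ValueError for non-digit characters
        output += letter
    except ValueError:
        pass

    # keep decimal points so the result can be parsed as a float
    if letter == '.':
        output += letter

output = float(output)  # 0.007 for the example above

Or use int(output) instead if you want an integer (this only works when the collected string contains no '.').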
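
For strings held in a plain list, a regular expression is a shorter alternative to the character loop. This is a sketch rather than the answer's own method: `first_int` is a hypothetical helper name, and unlike the loop it takes only the first run of digits and ignores decimal points.

```python
import re

def first_int(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None

values = ['V2', 'W42', 'S03', 'T02', 'U71', 'no digits']
numbers = [first_int(v) for v in values]
```

Returning None for strings without digits is a design choice made here so that the list comprehension does not fail on unexpected input.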

0
votes

Set up a DataFrame that matches yours:

import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B' : ['V2', 'W42', 'S03', 'T02', 'U71']})

df.head()

Now manipulate it to get your desired outcome:

import re

# int() converts the matched digits to a number, which drops leading zeros
df['C'] = df['B'].apply(lambda x: int(re.search(r'\d+', x).group()))

df.head()


    A   B   C
0   1   V2  2
1   3   W42 42
2   1   S03 3
3   2   T02 2
4   3   U71 71
0
votes

This is another way of doing it if you don't want to use regular expressions. I used the map() function to apply the conversion to each element of the column:

letters = "abcdefghijklmnopqrstuvwxyz"
df['C'] = list(map(lambda x: int(x.lower().strip(letters)), df['B']))
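
A self-contained run of this approach on the question's data, as a sketch. Note that str.strip only removes the listed characters from both ends of the string, so this assumes letters never appear between digits. The list comprehension here is an equivalent stand-in for the map() call.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3],
                   'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

letters = "abcdefghijklmnopqrstuvwxyz"
# Lowercase each value, strip letters from both ends, then parse as int
df['C'] = [int(x.lower().strip(letters)) for x in df['B']]

print(df['C'].tolist())  # [2, 42, 3, 2, 71]
```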


0
votes

First set up the data

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B' : ['V2', 'W42', 'S03', 'T02', 'U71']})

df.head()

Then do the extraction and cast the result to ints

df['C'] = df['B'].str.extract(r'(\d+)', expand=False).astype(int)

df.head()