0
votes

I have a data frame with a 'genre' column containing strings like 'drama, comedy, action'.

I want to split each string into separate elements, like 'drama', 'comedy', 'action', so I used:

Genre=[]

for genre_type in books['genre'].astype('str'):
    Genre.append(genre_type.split(','))
    
books['genres_1']=Genre

But the result contains a leading space before every genre other than the first one listed, like 'drama', '_comedy', '_action'. (I used an underscore to represent the space because otherwise it's hard to see.)

So I tried:

Genre_clean=[]
for x in books['genres_1'].astype('str'):
    Genre_clean.append(x.strip(' '))
Genre_clean

But the spaces remain. What am I doing wrong?

My full code is below:

import pandas as pd

# Creating sample dataframes
books = pd.DataFrame()
books['genre']=['drama, comedy, action', 'romance, sci-fi, drama','horror']

# Splitting genre
Genre=[]
for genre_type in books['genre'].astype('str'):
    Genre.append(genre_type.split(','))
    
books['genres_1']=Genre

# trying to remove the space
Genre_clean=[]
for x in books['genres_1'].astype('str'):
    Genre_clean.append(x.strip(' '))
Genre_clean
1
See my answer for a far better way to do this. As for what you're doing wrong: strip only removes spaces from the beginning and end of a string. What you have after your first step is a string representation of a list (books['genres_1'].astype('str')), and there aren't any outer spaces to remove from "['romance', ' sci-fi', ' drama']". - BeRT2me
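To make the comment above concrete, here is a minimal sketch in plain Python (no pandas), using one of the sample strings from the question. The variable names are only illustrative. It shows that strip(' ') leaves the stringified list untouched, while stripping each element of the list removes the leading spaces.

```python
# One sample string from the question, split on bare commas
parts = 'drama, comedy, action'.split(',')
print(parts)  # ['drama', ' comedy', ' action']

# astype('str') turns the whole list into a single string like this one
as_text = str(parts)

# strip(' ') only looks at the two ends of that string, which are '[' and ']'
unchanged = as_text.strip(' ')
print(unchanged == as_text)  # True

# Stripping each element of the list is what actually removes the spaces
cleaned = [p.strip() for p in parts]
print(cleaned)  # ['drama', 'comedy', 'action']
```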

1 Answer

2
votes

Don't use traditional loops or list comprehensions with pandas. Look up the equivalent pandas-specific function for whatever you want to do; it is usually far more efficient. Otherwise, there's little reason to use pandas.

See: pandas str functions

import pandas as pd

books = pd.DataFrame()
books['genre']=['drama, comedy, action', 'romance, sci-fi, drama','horror']

books.genre = books.genre.str.split(', ')
print(books)

Output:

                      genre
0   [drama, comedy, action]
1  [romance, sci-fi, drama]
2                  [horror]
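Splitting on ', ' works when every comma is followed by exactly one space. If the spacing in the real data is less regular, one possible variant is to split on a regular expression that absorbs any whitespace around the comma. This is only a sketch: the messy sample strings and the genres_list column name are made up for illustration, and it relies on pandas treating a multi-character pattern passed to str.split as a regular expression.

```python
import pandas as pd

# Hypothetical data with inconsistent spacing around the commas
books = pd.DataFrame({'genre': ['drama,comedy ,  action',
                                ' romance, sci-fi,drama',
                                'horror']})

# Trim the outer whitespace, then split on a comma plus any surrounding whitespace.
# A pattern longer than one character is interpreted as a regex by str.split.
books['genres_list'] = books['genre'].str.strip().str.split(r'\s*,\s*')

print(books['genres_list'].tolist())
# [['drama', 'comedy', 'action'], ['romance', 'sci-fi', 'drama'], ['horror']]
```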

If you want this as a string, you can join the list again with:

books.genre = books.genre.str.join(',')
# Or, all at once:
# books.genre = books.genre.str.split(', ').str.join(',')
# Or, just replace spaces with nothing:
# books.genre = books.genre.str.replace(' ', '')
print(books)

Output:

                  genre
0   drama,comedy,action
1  romance,sci-fi,drama
2                horror
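One caveat with replacing every space: it would also remove spaces inside a multi-word genre. The sample data has none, so the following is a sketch on a made-up Series, assuming such genres could appear. Replacing only the whitespace around commas keeps the words intact.

```python
import pandas as pd

# Hypothetical data containing a multi-word genre
titles = pd.Series(['science fiction, drama', 'horror'])

# Removing every space also glues the two words of the genre together
naive = titles.str.replace(' ', '', regex=False)
print(naive.tolist())    # ['sciencefiction,drama', 'horror']

# Removing only the whitespace around commas preserves the genre name
careful = titles.str.replace(r'\s*,\s*', ',', regex=True)
print(careful.tolist())  # ['science fiction,drama', 'horror']
```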