I am trying to extract a selected word (the title in each passenger's name) and build a cross tab on a filtered dataset, using the Titanic dataset to illustrate.

import pandas as pd

train = pd.read_csv("d1.csv")
live = train[train['Survived'] > 0]  # filter for survivors
print(live)
for live in live:
    live['Tt'] = live.Name.str.extract(r' ([A-Za-z]+)\.', expand=False)
pd.crosstab(live['Tt'], live['Sex'])

I received an error: AttributeError: 'str' object has no attribute 'Name'

When I check the filtered dataset live, the 'Name' column is present.

Which part did I get wrong, and how can I produce a cross tab of Tt against Sex for survivors only?

Answer

Error cause

If you write a for statement of the form for x in df:, you are telling Python to loop through all column labels of the DataFrame df, assigning one column label to the variable x on each iteration.
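To see this behavior in isolation, here is a minimal sketch using a tiny made-up DataFrame (the values are hypothetical; only the column names mirror the Titanic file). Each loop iteration receives a column label, which is a str:

```python
import pandas as pd

# Hypothetical one-row DataFrame with Titanic-style column names
df = pd.DataFrame({"Name": ["Smith, Mr. John"], "Sex": ["male"], "Survived": [1]})

# Iterating directly over a DataFrame yields its column labels, not its rows
labels = []
for x in df:
    print(type(x).__name__, x)  # prints "str Name", then "str Sex", then "str Survived"
    labels.append(x)
```

If you actually need to walk over rows, DataFrame.iterrows() or DataFrame.itertuples() exist for that, but for this task a vectorized column operation is simpler.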

So let's look at your for-loop:

for live in live:
    live['Tt'] = live.Name.str.extract(r' ([A-Za-z]+)\.', expand=False)

Before these lines run, the variable live still holds a pandas DataFrame. As soon as the loop starts, however, live is rebound to the first column name of that DataFrame, which is a plain string. A string has no Name attribute, so evaluating live.Name raises the error you are seeing. The loop also overwrites your only reference to the filtered DataFrame.
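A small sketch that reproduces the same error without d1.csv, assuming a hypothetical one-row DataFrame with a Name column. The loop body is the line from the question; it fails on the first iteration, before any column is added:

```python
import pandas as pd

# Hypothetical stand-in for the filtered survivors DataFrame
live = pd.DataFrame({"Name": ["Smith, Mr. John"], "Sex": ["male"]})

error_message = None
try:
    for live in live:  # first iteration rebinds `live` to the string "Name"
        # The right-hand side runs first: "Name".Name raises AttributeError
        live['Tt'] = live.Name.str.extract(r' ([A-Za-z]+)\.', expand=False)
except AttributeError as exc:
    error_message = str(exc)

print(type(live).__name__)  # str: the DataFrame reference is gone
print(error_message)
```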

Solution

I think you'll get your intended result if you simply remove the for-loop from your code, like so:

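import pandas as pd

train = pd.read_csv("d1.csv")
live = train[train['Survived'] > 0].copy()  # filter for survivors; .copy() avoids SettingWithCopyWarning
live['Tt'] = live.Name.str.extract(r' ([A-Za-z]+)\.', expand=False)  # word before a period, e.g. "Mr"
print(pd.crosstab(live['Tt'], live['Sex']))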
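Because d1.csv is not included here, the following self-contained sketch swaps in a few invented rows in the Titanic layout (Name, Sex, Survived) so the fixed code can be run end to end. The names and survival values are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for d1.csv: invented rows in the Titanic layout
train = pd.DataFrame({
    "Name": [
        "Smith, Mr. John",
        "Doe, Mrs. Jane",
        "Brown, Miss. Anna",
        "Green, Master. Tom",
        "White, Mr. Paul",
    ],
    "Sex": ["male", "female", "female", "male", "male"],
    "Survived": [1, 1, 1, 0, 0],
})

# Keep survivors only; .copy() makes `live` an independent DataFrame,
# so adding a column does not trigger SettingWithCopyWarning
live = train[train["Survived"] > 0].copy()

# Capture the word between a space and a period (the title, e.g. "Mr")
live["Tt"] = live["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

# Rows: titles, columns: sexes, cells: number of survivors
table = pd.crosstab(live["Tt"], live["Sex"])
print(table)
```

With these invented rows the survivors are one Mr, one Mrs and one Miss, so each title shows a single count under its sex and zeros elsewhere. The .copy() call is a choice to silence the warning pandas can raise when a column is added to a filtered slice; the cross tab itself comes out the same without it.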