Filtering country to apply different stopwords

Question

I have the following dataset

                                   Text
country     file                          
US          file_US                The Dish: Lidia Bastianich shares Italian recipes ... - CBS News
            file_US                Blog - Tasty Yummies
            file_US                Acne Alternative Remedies: Manuka Honey, Tea Tree Oil ...
            file_US                Looking back at 10 years of Downtown Arts | Times Leader 

IT          filename_IT            Tornando indietro a ...
            filename_IT            Questo locale è molto consigliato per le famiglie
                                                                            ...                                 
            filename_IT            Ci si chiede dove poter andare a mangiare una pizza  Melanzana Capriccia ...
            filename_IT            Ideale per chi ama mangiare vegano

with country and file as index levels. I want to apply a function that removes stopwords based on the country value in the index:

def removing(sent):
    
    if df.loc['US','UK']:
        stop_words = stopwords.words('english')
    if df.loc['ES']:
        stop_words = stopwords.words('spanish')    
    
# (and so on)
                      
    c_text = []

    for i in sent.lower().split():
        if i not in stop_words:
            c_text.append(i)

    return(' '.join(c_text))

df['New_Column'] = df['Text'].astype(str)
df['New_Column'] = df['New_Column'].apply(removing)

Unfortunately I am getting this error:

----> 6     if df.loc['US']:
      7         stop_words = stopwords.words('english')
      8     if df.loc['ES']:

/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1477     def __nonzero__(self):
   1478         raise ValueError(
-> 1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1481         )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

and I still do not understand how to fix it. How can I run the code without getting this error?

Does this answer your question? Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() — AMC
Some people just downvote my answer without leaving a single word, so I will remove it. I hope you get the idea: don't use a for loop when you have pandas and NumPy — BENY
@still_learning I know, no problem. I hope you have already found the np.where method — BENY
@still_learning First, that is not his answer; second, your problem is different from the one he linked — BENY

Mehul Gupta · Accepted Answer · 2020-06-30T04:41:23

# Assumes the required libraries are imported (pandas, and stopwords from nltk.corpus)
# Map each country code to its stopword language
lang = {'UK': 'english', 'US': 'english', 'ES': 'spanish'}
# Assumes your DataFrame is df, with unique country codes as a single-level index
# (.loc assigns to every row that shares the label)
for index, row in df.iterrows():
    words = str(row['Text']).split(' ')
    df.loc[index, 'Text'] = ' '.join(word for word in words if word not in stopwords.words(lang[index]))
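The loop above assigns by label with .loc, so it relies on index labels being unique, and it calls stopwords.words for every word. As a rough sketch of one way around both points, the country can be read from the index level and the cleaned text written back by position. The helper name remove_stopwords_by_country, the STOP_SETS dictionary and the index level names are assumptions made for illustration; the tiny stopword sets are stand-ins so the example runs without the NLTK corpus, and with NLTK they could be built once as set(stopwords.words('english')) and so on.

```python
import pandas as pd

# Hypothetical stand-in stopword sets so the sketch runs without NLTK data.
# With NLTK available, build them once, e.g. set(stopwords.words('english')).
STOP_SETS = {
    "US": {"the", "at", "of", "a"},
    "IT": {"per", "le", "chi", "una"},
}


def remove_stopwords_by_country(df, stop_sets, text_col="Text", level="country"):
    """Return one cleaned string per row, in row order."""
    countries = df.index.get_level_values(level)
    cleaned = []
    for country, text in zip(countries, df[text_col].astype(str)):
        stop = stop_sets.get(country, set())  # unknown country: keep every word
        cleaned.append(" ".join(w for w in text.lower().split() if w not in stop))
    return cleaned


# Small frame shaped like the one in the question (index labels repeat).
index = pd.MultiIndex.from_arrays(
    [["US", "US", "IT", "IT"],
     ["file_US", "file_US", "filename_IT", "filename_IT"]],
    names=["country", "file"],
)
df = pd.DataFrame(
    {"Text": ["Blog - Tasty Yummies",
              "Looking back at 10 years of Downtown Arts",
              "Questo locale è molto consigliato per le famiglie",
              "Ideale per chi ama mangiare vegano"]},
    index=index,
)

# Assigning a list is positional, so repeated index labels are not a problem.
df["New_Column"] = remove_stopwords_by_country(df, STOP_SETS)
```

Because the result is assigned as a plain list, each row keeps its own cleaned text even when several rows share the same (country, file) label.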

Updated answer:

import pandas as pd
import numpy as np
import nltk
from nltk.corpus import stopwords

# Example MultiIndex: level 0 plays the role of the country, level 1 the file
ind = pd.MultiIndex.from_arrays([['ind', 'ind', 'ind', 'ind', 'aus', 'aus', 'aus', 'aus'],
                                 ['1', '2', '3', '4', '5', '6', '7', '8']])
df = pd.DataFrame(['he is boy'] * 8, index=ind, columns=['text'])
# Map the level-0 value to a stopword language
lang = {'ind': 'spanish', 'aus': 'english'}
for index, row in df.iterrows():
    # index is a (level 0, level 1) tuple; index[0] selects the language
    words = str(row['text']).split(' ')
    df.at[(index[0], index[1]), 'text'] = ' '.join(word for word in words if word not in stopwords.words(lang[index[0]]))

Before running loop: [output not shown]

After running loop: [output not shown]

Try adapting the example above to your own data.
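For context on the error in the question: df.loc[...] returns a DataFrame (or a Series), and an if statement needs a single True or False, so pandas raises the "truth value is ambiguous" ValueError. The sketch below first reproduces that error and then shows one possible way to keep the asker's apply-based style, reading the country from row.name inside a row-wise apply. The STOP_SETS dictionary, its contents and the sample texts are assumptions for illustration only; with NLTK the sets would come from stopwords.words.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["US", "US", "ES"], ["file_US", "file_US", "file_ES"]],
    names=["country", "file"],
)
df = pd.DataFrame(
    {"Text": ["The Dish of the day", "Blog of a cook", "La casa de la playa"]},
    index=index,
)

# Reproduce the error: a DataFrame cannot be used directly as an `if` condition.
try:
    if df.loc[["US"]]:
        pass
except ValueError as exc:
    error_message = str(exc)

# Hypothetical stand-in stopword sets; with NLTK use set(stopwords.words(...)).
STOP_SETS = {
    "US": {"the", "of", "a"},
    "UK": {"the", "of", "a"},
    "ES": {"la", "de"},
}


def removing(row):
    # With axis=1, row.name is this row's index label: a (country, file) tuple.
    country = row.name[0]
    stop_words = STOP_SETS[country]
    words = str(row["Text"]).lower().split()
    return " ".join(w for w in words if w not in stop_words)


# .tolist() makes the assignment positional, which is safe with repeated labels.
df["New_Column"] = df.apply(removing, axis=1).tolist()
```

The function decides the language from the label of the row it is given, instead of testing the whole DataFrame, which is what triggered the error.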
