
I am trying to categorize the "DistArea_ID" column of my dataset cf_all by grouping the areas according to their mean value of Y (Counterfeit_Sales).

Code:

round(cf_all.groupby("DistArea_ID")["Counterfeit_Sales"].mean(), 2)

for col in range(len(cf_all)):
    if cf_all["DistArea_ID"][col] in \
        ["Area013", "Area017", "Area018", "Area035", "Area045", "Area046", "Area049"]:
        cf_all.loc[col, "DistArea_ID"] = "DistArea_2000"
    if cf_all["DistArea_ID"][col] in ["Area010", "Area019"]:
        cf_all.loc[col, "DistArea_ID"] = "DistArea_400"
    if cf_all["DistArea_ID"][col] in ["Area027"]:
        cf_all.loc[col, "DistArea_ID"] = "DistArea_3000"

Error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(),
a.item(), a.any() or a.all().
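A likely cause, assuming cf_all was built by concatenating two frames (for example train and test) without resetting the index. The name suggests this, but the question does not say so. When an index label occurs more than once, cf_all["DistArea_ID"][col] returns a Series rather than a single string. Python's `in` then compares each list element with that Series and tries to turn the element-wise result into one bool, which raises exactly this ValueError. Here is a minimal reproduction with made-up data, where the frames `train` and `test` are hypothetical:

```python
import pandas as pd

# Hypothetical stand-ins for the frames that were combined into cf_all.
# Concatenating without ignore_index=True keeps their labels, so 0 and 1 repeat.
train = pd.DataFrame({"DistArea_ID": ["Area013", "Area010"]})
test = pd.DataFrame({"DistArea_ID": ["Area027", "Area019"]})
cf_all = pd.concat([train, test])

# Label 0 appears twice, so this lookup returns a 2-element Series, not a str.
value = cf_all["DistArea_ID"][0]
print(type(value).__name__)  # Series

try:
    value in ["Area013", "Area017"]  # `in` needs a single bool per comparison
except ValueError as exc:
    print("ValueError:", exc)

# With a fresh 0..n-1 index every label is unique and each lookup is a scalar.
cf_all = cf_all.reset_index(drop=True)
print(cf_all["DistArea_ID"][0] in ["Area013", "Area017"])  # True
```

If this is the cause, adding `cf_all = cf_all.reset_index(drop=True)` before the loop should make the original code run. A column-wise approach such as the one in the answer avoids per-row label lookups altogether, so it is both faster and unaffected by duplicate labels.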

Can someone please help me with this error?

Can you paste 4-5 lines of sample data for cf_all? - shaik moeed
Try this: (cf_all["DistArea_ID"][col] in ["Area013", "Area017", "Area018", "Area035", "Area045", "Area046", "Area049"]).any() - shaik moeed
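Wrapping the check as `(... in [...]).any()` probably does not help. Python evaluates `in` on a list first, and that operation must produce a plain bool. With a duplicated label it raises the same ValueError before `.any()` is reached. With a unique label it returns a bool, and a bool has no `.any()` method. The small check below uses made-up Series:

```python
import pandas as pd

areas = ["Area013", "Area017"]

# Duplicated label 0: the lookup returns a Series, and `in` raises first.
dup = pd.Series(["Area013", "Area027"], index=[0, 0])
try:
    (dup[0] in areas).any()
except ValueError:
    print("ValueError raised by `in`, before .any() runs")

# Unique label: `in` returns a plain bool, which has no .any() method.
uniq = pd.Series(["Area013"], index=[0])
try:
    (uniq[0] in areas).any()
except AttributeError:
    print("AttributeError: 'bool' object has no attribute 'any'")

# For a membership test over a whole column, Series.isin returns a boolean
# Series, and calling .any() on that is meaningful.
print(dup.isin(areas).any())  # True
```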

1 Answer


I suggest using Series.replace, or Series.map combined with Series.fillna:

d = {"DistArea_2000": ["Area013", "Area017", "Area018", 
                       "Area035", "Area045", "Area046", "Area049"],
     "DistArea_400": ["Area010", "Area019"],
     "DistArea_3000": ["Area027"]}

# swap keys and values: each area ID becomes a key pointing to its group name
# http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
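To see what the comprehension produces, here it is run on the `d` from this answer. It flattens the group → list-of-areas mapping into a one-to-one area → group lookup, which is the shape that both replace and map expect:

```python
d = {"DistArea_2000": ["Area013", "Area017", "Area018",
                       "Area035", "Area045", "Area046", "Area049"],
     "DistArea_400": ["Area010", "Area019"],
     "DistArea_3000": ["Area027"]}

# For every (group, areas) pair, emit one entry per area: area -> group.
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

print(d1["Area017"])  # DistArea_2000
print(d1["Area019"])  # DistArea_400
print(len(d1))        # 10
```

One thing to watch: if the same area ID were listed under two groups, the later group would silently overwrite the earlier one in `d1`.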

cf_all["DistArea_ID"] = cf_all["DistArea_ID"].replace(d1)

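# usually faster on large data: map does a plain dict lookup and returns NaN
# for IDs not in d1, and fillna puts those original IDs back
#cf_all["DistArea_ID"] = cf_all["DistArea_ID"].map(d1).fillna(cf_all["DistArea_ID"])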
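Below is a quick check on made-up data that the two lines give the same result, including for an ID that belongs to no group. `Area099` is hypothetical and is left unchanged. `d` and `d1` are copied from this answer. Both versions operate on the whole column at once, so no per-row label lookup is involved:

```python
import pandas as pd

d = {"DistArea_2000": ["Area013", "Area017", "Area018",
                       "Area035", "Area045", "Area046", "Area049"],
     "DistArea_400": ["Area010", "Area019"],
     "DistArea_3000": ["Area027"]}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# Hypothetical sample rows; "Area099" is not listed in any group.
cf_all = pd.DataFrame({"DistArea_ID": ["Area013", "Area010", "Area027", "Area099"]})

replaced = cf_all["DistArea_ID"].replace(d1)
# map yields NaN for IDs missing from d1; fillna restores the original ID.
mapped = cf_all["DistArea_ID"].map(d1).fillna(cf_all["DistArea_ID"])

print(replaced.tolist())
# ['DistArea_2000', 'DistArea_400', 'DistArea_3000', 'Area099']
print(replaced.equals(mapped))  # True
```

If every ID is guaranteed to appear in `d1`, `.map(d1)` alone is enough. The `fillna` step only matters for IDs that have no group.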