I have a DataFrame in which, for the column 'pages', I need to count the number of unique elements that appear before the first element containing the substring 'log_in'. If a list contains more than one such element, I need to count only up to the first one.
Input example:
site | pages |
---|---|
zoom.us | ['zoom.us/register', 'zoom.us/log_in/=?sdsd', 'zoom.us/log_in/=a3344'] |
zoom.us | ['zoom.us/about_us', 'zoom.us/error', 'zoom.us/help', 'zoom.us/log_in/jjjsl', 'zoom.us/log_in/llaye'] |
Output example:
site | pages | unique_pages_before_log_in |
---|---|---|
zoom.us | ['zoom.us/register', 'zoom.us/register', 'zoom.us/log_in/=?sdsd', 'zoom.us/log_in/=a3344'] | 1 |
zoom.us | ['zoom.us/about_us', 'zoom.us/error', 'zoom.us/help', 'zoom.us/log_in/jjjsl', 'zoom.us/log_in/llaye'] | 3 |
I thought about using a set to count unique values, but I don't know how to count only until the first element containing the 'log_in' substring appears. Something like this:
df['unique_pages_before_login'] = df['pages'].apply(lambda l: len(set(l[:l.index('zoom.us/log_in')])))
I will appreciate any help :)
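For reference, here is a minimal sketch of one way the counting could work, not a definitive solution. The attempt above fails because `list.index` looks for an element equal to `'zoom.us/log_in'`, not one that merely contains it. The sketch instead walks each list and stops at the first element containing the substring. The names `count_unique_before` and `marker` are hypothetical, and it assumes that a list with no matching element should have all of its unique pages counted.

```python
from itertools import takewhile


def count_unique_before(pages, marker="log_in"):
    """Count distinct pages that appear before the first page containing `marker`.

    Assumption: if no page contains `marker`, every unique page is counted.
    """
    # takewhile stops at the first element containing the marker,
    # so any later matching elements are never reached.
    before = takewhile(lambda page: marker not in page, pages)
    return len(set(before))


# The two lists from the output example in the question.
rows = [
    ["zoom.us/register", "zoom.us/register",
     "zoom.us/log_in/=?sdsd", "zoom.us/log_in/=a3344"],
    ["zoom.us/about_us", "zoom.us/error", "zoom.us/help",
     "zoom.us/log_in/jjjsl", "zoom.us/log_in/llaye"],
]
counts = [count_unique_before(row) for row in rows]
print(counts)  # [1, 3]
```

With the DataFrame from the question, this could presumably be applied as `df['unique_pages_before_log_in'] = df['pages'].apply(count_unique_before)`, assuming each cell holds an actual Python list and not its string representation.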
"zoom.us/register"
at the beginning – aaossa