Find number of unique elements in a list column in DataFrame

Question

I have a dataframe in which, for the column 'pages', I need to count the number of unique elements that appear before the first element containing the substring 'log_in'. If a list contains more than one such element, I only need to count up to the first one.

Input example:

site	pages
zoom.us	['zoom.us/register', 'zoom.us/log_in/=?sdsd', 'zoom.us/log_in/=a3344']
zoom.us	['zoom.us/about_us', 'zoom.us/error', 'zoom.us/help', 'zoom.us/log_in/jjjsl', 'zoom.us/log_in/llaye']

Output example:

site	pages	unique_pages_before_log_in
zoom.us	['zoom.us/register', 'zoom.us/register', 'zoom.us/log_in/=?sdsd', 'zoom.us/log_in/=a3344']	1
zoom.us	['zoom.us/about_us', 'zoom.us/error', 'zoom.us/help', 'zoom.us/log_in/jjjsl', 'zoom.us/log_in/llaye']	3

I thought about using a set to count the unique values, but I don't know how to count only up to the first element containing the 'log_in' substring. Something like this:

df['unique_pages_before_login'] = df['pages'].apply(lambda l: len(set(l[:l.index('zoom.us/log_in')])))
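The one-liner above fails on this data: `list.index` looks for an element *equal* to 'zoom.us/log_in', and since no element is exactly that string it raises `ValueError` (it does no substring matching). A minimal sketch of the same slicing idea, assuming the marker is the substring "log_in": find the first position whose element contains the marker with `next()` over a generator, then take the set of the slice before it. The helper name `unique_before_login`, its `marker` parameter, and the choice to count a list with no log-in page in full are all illustrative assumptions, not from the question.

```python
def unique_before_login(pages, marker="log_in"):
    # Index of the first element containing `marker` as a substring.
    # Assumption: if no element contains it, use len(pages) so the whole list is counted.
    first = next((i for i, page in enumerate(pages) if marker in page), len(pages))
    # Number of distinct elements strictly before that index.
    return len(set(pages[:first]))


# The two rows from the output example (row 0 includes the duplicated register page).
rows = [
    ["zoom.us/register", "zoom.us/register", "zoom.us/log_in/=?sdsd", "zoom.us/log_in/=a3344"],
    ["zoom.us/about_us", "zoom.us/error", "zoom.us/help", "zoom.us/log_in/jjjsl", "zoom.us/log_in/llaye"],
]
print([unique_before_login(p) for p in rows])  # [1, 3]
```

On a DataFrame the same function plugs into `df['pages'].apply(unique_before_login)`.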

I'd appreciate any help :)

There seems to be an inconsistency in your sample input and output. Seems like the first row of your input is missing a "zoom.us/register" at the beginning — aaossa

Pranav Hosangadi · Accepted Answer · 2022-02-28T16:37:00

It looks like you'll need .apply() here. One approach is to walk through each list, adding every element to a set until you reach one that contains your search string. At that point, return the size of the set you've built.

import pandas as pd


def count_unique_before_login(pages):
    # Collect distinct pages until the first one that contains "log_in".
    c = set()
    for item in pages:
        if "log_in" in item:
            return len(c)
        c.add(item)
    return None  # No log_in found


# The sample data must be wrapped in a DataFrame; a plain dict has no .apply().
df = pd.DataFrame({
    'site': {0: 'zoom.us', 1: 'zoom.us'},
    'pages': {
        0: ['zoom.us/register',
            'zoom.us/log_in/=?sdsd',
            'zoom.us/log_in/=a3344'],
        1: ['zoom.us/about_us',
            'zoom.us/error',
            'zoom.us/help',
            'zoom.us/log_in/jjjsl',
            'zoom.us/log_in/llaye'],
    },
})

df["unique_pages_before_log_in"] = df["pages"].apply(count_unique_before_login)

Which gives:

      site  ... unique_pages_before_log_in
0  zoom.us  ...                          1
1  zoom.us  ...                          3
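If you'd rather avoid a Python-level loop per row, one alternative (a different technique from the accepted answer, sketched here as an illustration) is to explode the lists into one row per page, mark pages containing "log_in", and keep only pages whose running per-row count of log-in pages is still zero. This assumes every element is a string; unlike the function above, it counts a list with no log-in page in full instead of returning None, and gives 0 for a list whose first page is a log-in page.

```python
import pandas as pd

df = pd.DataFrame({
    "site": ["zoom.us", "zoom.us"],
    "pages": [
        ["zoom.us/register", "zoom.us/register", "zoom.us/log_in/=?sdsd", "zoom.us/log_in/=a3344"],
        ["zoom.us/about_us", "zoom.us/error", "zoom.us/help", "zoom.us/log_in/jjjsl", "zoom.us/log_in/llaye"],
    ],
})

# One entry per page; the index still points at the original row.
exploded = df["pages"].explode()

# 1 where the page contains the literal substring "log_in", else 0
# (na=False treats the NaN produced by an empty list as "not a log-in page").
is_login = exploded.str.contains("log_in", regex=False, na=False).astype(int)

# Running count of log-in pages within each original row; it is 0 only
# for pages that come before the first log-in page.
before_login = is_login.groupby(level=0).cumsum() == 0

# Distinct pages before the first log-in page, per original row.
counts = exploded[before_login.to_numpy()].groupby(level=0).nunique()

# Rows with no page before the first log-in page are missing from `counts`; fill with 0.
df["unique_pages_before_log_in"] = counts.reindex(df.index, fill_value=0)
print(df["unique_pages_before_log_in"].tolist())  # [1, 3]
```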
