57
votes

How can I remove duplicate characters from a string using Python? For example, let's say I have a string:

foo = 'mppmt'

How can I make the string:

foo = 'mpt'

NOTE: Order is not important

14
@AljoshaBre - use the 'close' button and select close as dupe and supply that link. Thank you - Martin Beckett
@AljoshaBre None of those answers are guaranteed to maintain order. - Marcin
The link is actually already there. Just 4 clicks. - ulidtko

14 Answers

122
votes

If order does not matter, you can use

"".join(set(foo))

set() will create a set of unique letters in the string, and "".join() will join the letters back to a string in arbitrary order.

If order does matter, you can use a dict instead of a set, which since Python 3.7 preserves the insertion order of the keys. (In the CPython implementation, this is already supported in Python 3.6 as an implementation detail.)

foo = "mppmt"
result = "".join(dict.fromkeys(foo))

resulting in the string "mpt". In earlier versions of Python, you can use collections.OrderedDict, which has been available starting from Python 2.7.

43
votes

If order does matter, how about:

>>> foo = 'mppmt'
>>> ''.join(sorted(set(foo), key=foo.index))
'mpt'
10
votes

If order is not the matter:

>>> foo='mppmt'
>>> ''.join(set(foo))
'pmt'

To keep the order:

>>> foo='mppmt'
>>> ''.join([j for i,j in enumerate(foo) if j not in foo[:i]])
'mpt'
4
votes

Create a list in Python and also a set which doesn't allow any duplicates. Solution1 :

def fix(string):
    s = set()
    list = []
    for ch in string:
        if ch not in s:
            s.add(ch)
            list.append(ch)

    return ''.join(list)        

string = "Protiijaayiiii"
print(fix(string))

Method 2 :

s = "Protijayi"

aa = [ ch  for i, ch in enumerate(s) if ch not in s[:i]]
print(''.join(aa))
3
votes

As was mentioned "".join(set(foo)) and collections.OrderedDict will do. A added foo = foo.lower() in case the string has upper and lower case characters and you need to remove ALL duplicates no matter if they're upper or lower characters.

from collections import OrderedDict
foo = "EugeneEhGhsnaWW"
foo = foo.lower()
print "".join(OrderedDict.fromkeys(foo))

prints eugnhsaw

3
votes
#Check code and apply in your Program:

#Input= 'pppmm'    
s = 'ppppmm'
s = ''.join(set(s))  
print(s)
#Output: pm
2
votes

If order is important,

seen = set()
result = []
for c in foo:
    if c not in seen:
        result.append(c)
        seen.add(c)
result = ''.join(result)

Or to do it without sets:

result = []
for c in foo:
    if c not in result:
        result.append(c)
result = ''.join(result)
2
votes
def dupe(str1):
    s=set(str1)

    return "".join(s)
str1='geeksforgeeks'
a=dupe(str1)
print(a)

works well if order is not important.

2
votes
d = {}
s="YOUR_DESIRED_STRING"
res=[]
for c in s:
    if c not in d:
      res.append(c)
      d[c]=1
print ("".join(res))

variable 'c' traverses through String 's' in the for loop and is checked if c is in a set d (which initially has no element) and if c is not in d, c is appended to the character array 'res' then the index c of set d is changed to 1. after the loop is exited i.e c finishes traversing through the string to store unique elements in set d, the resultant res which has all unique characters is printed.

0
votes
def remove_duplicates(value):
    var=""
    for i in value:
        if i in value:
            if i in var:
                pass
            else:
                var=var+i
    return var

print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))
0
votes

As string is a list of characters, converting it to dictionary will remove all duplicates and will retain the order.

"".join(list(dict.fromkeys(foo)))

0
votes
from collections import OrderedDict
def remove_duplicates(value):
        m=list(OrderedDict.fromkeys(value))
        s=''
        for i in m:
            s+=i
        return s
print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))

0
votes
 mylist=["ABA", "CAA", "ADA"]
 results=[]
 for item in mylist:
     buffer=[]
     for char in item:
         if char not in buffer:
             buffer.append(char)
     results.append("".join(buffer))
    
 print(results)

 output
 ABA
 CAA
 ADA
 ['AB', 'CA', 'AD']
0
votes

Functional programming style while keeping order:

import functools

def get_unique_char(a, b):
    if b not in a:
        return a + b
    else:
        return a

if __name__ == '__main__':
    foo = 'mppmt'

    gen = functools.reduce(get_unique_char, foo)
    print(''.join(list(gen)))