Code Golf: Duplicate Character Removal in String

30

votes

The challenge: The shortest code, by character count, that detects and removes duplicate characters in a String. Removal includes ALL instances of the duplicated character (so if you find 3 n's, all three have to go), and original character order needs to be preserved.

Example Input 1:
nbHHkRvrXbvkn

Example Output 1:
RrX

Example Input 2:
nbHHkRbvnrXbvkn

Example Output 2:
RrX

(the second example removes letters that occur three times; some solutions have failed to account for this)

(This is based on my other question where I needed the fastest way to do this in C#, but I think it makes good Code Golf across languages.)

code-golf

There's definitely going to be a Perl solution here under 10/15 chars. – Noldorin

where's the 4 character J solution? – Jason

Should solutions be a complete program that accepts input from the user, or just a function/method that performs the given task? – SingleNegationElimination

If it was just a function, it could be shorter in most cases. +1 for full programs. – nilamo

It should be a program, but if someone posts the function by itself it's fine too as it still adds interesting value. – Alex

38

votes

LabVIEW 7.1

ONE character and that is the blue constant '1' in the block diagram. I swear, the input was copy and paste ;-)

http://i25.tinypic.com/hvc4mp.png

http://i26.tinypic.com/5pnas.png

26

votes

Perl

21 characters of perl, 31 to invoke, 36 total keystrokes (counting shift and final return):

perl -pe's/$1//gwhile/(.).*\1/'

19

votes

Ruby — 61 53 51 56 35

61 chars, the ruler says. (Gives me an idea for another code golf...)

  puts ((i=gets.split(''))-i.select{|c|i.to_s.count(c)<2}).join
+-------------------------------------------------------------------------+
||    |    |    |    |    |    |    |    |    |    |    |    |    |    |  |
|0         10        20        30        40        50        60        70 |
|                                                                         |
+-------------------------------------------------------------------------+
  gets.chars{|c|$><<c[$_.count(c)-1]}

... 35 by Nakilon

18

votes

APL

23 characters:

(((1+ρx)-(ϕx)ιx)=xιx)/x

I'm an APL newbie (learned it yesterday), so be kind -- this is certainly not the most efficient way to do it. I'm ashamed I didn't beat Perl by very much.

Then again, maybe it says something when the most natural way for a newbie to solve this problem in APL was still more concise than any other solution in any language so far.

15

votes

Python:

s=raw_input()
print filter(lambda c:s.count(c)<2,s)

This is a complete working program, reading from and writing to the console. The one-liner version can be directly used from the command line

python -c 's=raw_input();print filter(lambda c:s.count(c)<2,s)'

14

votes

J (16 12 characters)

(~.{~[:I.1=#/.~)

Example:

(~.{~[:I.1=#/.~) 'nbHHkRvrXbvkn'
    RrX

It only needs the parenthesis to be executed tacitly. If put in a verb, the actual code itself would be 14 characters.

There certainly are smarter ways to do this.

EDIT: The smarter way in question:

(~.#~1=#/.~) 'nbHHkRvrXbvkn'
    RrX

12 characters, only 10 if set in a verb. I still hate the fact that it's going through the list twice, once to count (#/.) and another to return uniques (nub or ~.), but even nubcount, a standard verb in the 'misc' library does it twice.

12

votes

Haskell

There's surely shorter ways to do this in Haskell, but:

Prelude Data.List> let h y=[x|x<-y,(<2).length$filter(==x)y]
Prelude Data.List> h "nbHHkRvrXbvkn"
"RrX"

Ignoring the let, since it's only required for function declarations in GHCi, we have h y=[x|x<-y,(<2).length$filter(==x)y], which is 37 characters (this ties the current "core" Python of "".join(c for c in s if s.count(c)<2), and it's virtually the same code anyway).

If you want to make a whole program out of it,

h y=[x|x<-y,(<2).length$filter(==x)y]
main=interact h

$ echo "nbHHkRvrXbvkn" | runghc tmp.hs
RrX

$ wc -c tmp.hs
54 tmp.hs

Or we can knock off one character this way:

main=interact(\y->[x|x<-y,(<2).length$filter(==x)y])

$ echo "nbHHkRvrXbvkn" | runghc tmp2.hs
RrX

$ wc -c tmp2.hs
53 tmp2.hs

It operates on all of stdin, not line-by-line, but that seems acceptable IMO.

9

votes

C89 (106 characters)

This one uses a completely different method than my original answer. Interestingly, after writing it and then looking at another answer, I saw the methods were very similar. Credits to caf for coming up with this method before me.

b[256];l;x;main(c){while((c=getchar())>=0)b[c]=b[c]?1:--l;
for(;x-->l;)for(c=256;c;)b[--c]-x?0:putchar(c);}

On one line, it's 58+48 = 106 bytes.

C89 (173 characters)

This was my original answer. As said in the comments, it doesn't work too well...

#include<stdio.h>
main(l,s){char*b,*d;for(b=l=s=0;l==s;s+=fread(b+s,1,9,stdin))b=realloc(b,l+=9)
;d=b;for(l=0;l<s;++d)if(!memchr(b,*d,l)&!memchr(d+1,*d,s-l++-1))putchar(*d);}

On two lines, it's 17+1+78+77 = 173 bytes.

8

votes

C#

65 Characters:

new String(h.Where(x=>h.IndexOf(x)==h.LastIndexOf(x)).ToArray());

67 Characters with reassignment:

h=new String(h.Where(x=>h.IndexOf(x)==h.LastIndexOf(x)).ToArray());

7

votes

C#

new string(input.GroupBy(c => c).Where(g => g.Count() == 1).ToArray());

71 characters

6

votes

PHP (136 characters)

<?PHP
function q($x){return $x<2;}echo implode(array_keys(array_filter(
array_count_values(str_split(stream_get_contents(STDIN))),'q')));

On one line, it's 5+1+65+65 = 136 bytes. Using PHP 5.3 you could save a few bytes making the function anonymous, but I can't test that now. Perhaps something like:

<?PHP
echo implode(array_keys(array_filter(array_count_values(str_split(
stream_get_contents(STDIN))),function($x){return $x<2;})));

That's 5+1+66+59 = 131 bytes.

6

votes

another APL solution

As a dynamic function (18 charachters)

{(1+=/¨(ω∘∊¨ω))/ω}

line assuming that input is in variable x (16 characters):

(1+=/¨(x∘∊¨x))/x

5

votes

VB.NET

For Each c In s : s = IIf(s.LastIndexOf(c) <> s.IndexOf(c), s.Replace(CStr(c), Nothing), s) : Next

Granted, VB is not the optimal language to try to save characters, but the line comes out to 98 characters.

5

votes

PowerShell

61 characters. Where $s="nbHHkRvrXbvkn" and $a is the result.

$h=@{}
($c=[char[]]$s)|%{$h[$_]++}
$c|%{if($h[$_]-eq1){$a+=$_}}

Fully functioning parameterized script:

param($s)
$h=@{}
($c=[char[]]$s)|%{$h[$_]++}
$c|%{if($h[$_]-eq1){$a+=$_}}
$a

5

votes

C: 83 89 93 99 101 characters

O(n²) time.
Limited to 999 characters.
Only works in 32-bit mode (due to not #include-ing <stdio.h> (costs 18 chars) making the return type of gets being interpreted as an int and chopping off half of the address bits).
Shows a friendly "warning: this program uses gets(), which is unsafe." on Macs.

.

main(){char s[999],*c=gets(s);for(;*c;c++)strchr(s,*c)-strrchr(s,*c)||putchar(*c);}

(and this similar 82-chars version takes input via the command line:

main(char*c,char**S){for(c=*++S;*c;c++)strchr(*S,*c)-strrchr(*S,*c)||putchar(*c);}

)

5

votes

Golfscript(sym) - 15

  .`{\{=}+,,(!}+,
+-------------------------------------------------------------------------+
||    |    |    |    |    |    |    |    |    |    |    |    |    |    |  |
|0         10        20        30        40        50        60        70 |
|                                                                         |
+-------------------------------------------------------------------------+

4

votes

Haskell

(just knocking a few characters off Mark Rushakoff's effort, I'd rather it was posted as a comment on his)

h y=[x|x<-y,[_]<-[filter(==x)y]]

which is better Haskell idiom but maybe harder to follow for non-Haskellers than this:

h y=[z|x<-y,[z]<-[filter(==x)y]]

Edit to add an explanation for hiena and others:

I'll assume you understand Mark's version, so I'll just cover the change. Mark's expression:

(<2).length $ filter (==x) y

filters y to get the list of elements that == x, finds the length of that list and makes sure it's less than two. (in fact it must be length one, but ==1 is longer than <2 ) My version:

[z] <- [filter(==x)y]

does the same filter, then puts the resulting list into a list as the only element. Now the arrow (meant to look like set inclusion!) says "for every element of the RHS list in turn, call that element [z]". [z] is the list containing the single element z, so the element "filter(==x)y" can only be called "[z]" if it contains exactly one element. Otherwise it gets discarded and is never used as a value of z. So the z's (which are returned on the left of the | in the list comprehension) are exactly the x's that make the filter return a list of length one.

That was my second version, my first version returns x instead of z - because they're the same anyway - and renames z to _ which is the Haskell symbol for "this value isn't going to be used so I'm not going to complicate my code by giving it a name".

3

votes

Javascript 1.8

s.split('').filter(function (o,i,a) a.filter(function(p) o===p).length <2 ).join('');

or alternately- similar to the python example:

[s[c] for (c in s) if (s.split("").filter(function(p) s[c]===p).length <2)].join('');

3

votes

TCL

123 chars. It might be possible to get it shorter, but this is good enough for me.

proc h {i {r {}}} {foreach c [split $i {}] {if {[llength [split $i $c]]==2} {set r $r$c}}
return $r}
puts [h [gets stdin]]

3

votes

C

Full program in C, 141 bytes (counting newlines).

#include<stdio.h>
c,n[256],o,i=1;main(){for(;c-EOF;c=getchar())c-EOF?n[c]=n[c]?-1:o++:0;for(;i<o;i++)for(c=0;c<256;c++)n[c]-i?0:putchar(c);}

3

votes

Scala

54 chars for the method body only, 66 with (statically typed) method declaration:

def s(s:String)=(""/:s)((a,b)=>if(s.filter(c=>c==b).size>1)a else a+b)

3

votes

Ruby

63 chars.

puts (t=gets.split(//)).map{|i|t.count(i)>1?nil:i}.compact.join

3

votes

VB.NET / LINQ

96 characters for complete working statement

Dim p=New String((From c In"nbHHkRvrXbvkn"Group c By c Into i=Count Where i=1 Select c).ToArray)

Complete working statement, with original string and the VB Specific "Pretty listing (reformatting of code" turned off, at 96 characters, non-working statement without original string at 84 characters.

(Please make sure your code works before answering. Thank you.)

3

votes

C

(1st version: 112 characters; 2nd version: 107 characters)

k[256],o[100000],p,c;main(){while((c=getchar())!=-1)++k[o[p++]=c];for(c=0;c<p;c++)if(k[o[c]]==1)putchar(o[c]);}

That's

/* #include <stdio.h> */
/* int */ k[256], o[100000], p, c;
/* int */ main(/* void */) {
  while((c=getchar()) != -1/*EOF*/) {
    ++k[o[p++] = /*(unsigned char)*/c];
  }
  for(c=0; c<p; c++) {
    if(k[o[c]] == 1) {
      putchar(o[c]);
    }
  }
  /* return 0; */
}

Because getchar() returns int and putchar accepts int, the #include can 'safely' be removed. Without the include, EOF is not defined, so I used -1 instead (and gained a char). This program only works as intended for inputs with less than 100000 characters!

Version 2, with thanks to strager 107 characters

#ifdef NICE_LAYOUT
#include <stdio.h>

/* global variables are initialized to 0 */
int char_count[256];                          /* k in the other layout */
int char_order[999999];                       /* o ... */
int char_index;                               /* p  */

int main(int ch_n_loop, char **dummy)         /* c  */
                                              /* variable with 2 uses */
{

  (void)dummy; /* make warning about unused variable go away */

  while ((ch_n_loop = getchar()) >= 0) /* EOF is, by definition, negative */
  {
    ++char_count[ ( char_order[char_index++] = ch_n_loop ) ];
    /* assignment, and increment, inside the array index */
  }
  /* reuse ch_n_loop */
  for (ch_n_loop = 0; ch_n_loop < char_index; ch_n_loop++) {
    (char_count[char_order[ch_n_loop]] - 1) ? 0 : putchar(char_order[ch_n_loop]);
  }
  return 0;
}
#else
k[256],o[999999],p;main(c){while((c=getchar())>=0)++k[o[p++]=c];for(c=0;c<p;c++)k[o[c]]-1?0:putchar(o[c]);}
#endif

3

votes

Javascript 1.6

s.match(/(.)(?=.*\1)/g).map(function(m){s=s.replace(RegExp(m,'g'),'')})

Shorter than the previously posted Javascript 1.8 solution (71 chars vs 85)

3

votes

Assembler

Tested with WinXP DOS box (cmd.exe):

    xchg cx,bp
    std
    mov al,2
    rep stosb
    inc cl
l0: ; to save a byte, I've encoded the instruction to exit the program into the
    ; low byte of the offset in the following instruction:
    lea si,[di+01c3h] 
    push si
l1: mov dx,bp
    mov ah,6
    int 21h
    jz l2
    mov bl,al
    shr byte ptr [di+bx],cl
    jz l1
    inc si
    mov [si],bx
    jmp l1
l2: pop si
l3: inc si
    mov bl,[si]
    cmp bl,bh
    je l0+2
    cmp [di+bx],cl
    jne l3
    mov dl,bl
    mov ah,2
    int 21h
    jmp l3

Assembles to 53 bytes. Reads standard input and writes results to standard output, eg:

 programname < input > output

2

votes

PHP

118 characters actual code (plus 6 characters for the PHP block tag):

<?php
$s=trim(fgets(STDIN));$x='';while(strlen($s)){$t=str_replace($s[0],'',substr($s,1),$c);$x.=$c?'':$s[0];$s=$t;}echo$x;

2

votes

C# (53 Characters)

Where s is your input string:

new string(s.Where(c=>s.Count(h=>h==c)<2).ToArray());

Or 59 with re-assignment:

var a=new string(s.Where(c=>s.Count(h=>h==c)<2).ToArray());

2

votes

Haskell Pointfree

import Data.List
import Control.Monad
import Control.Arrow
main=interact$liftM2(\\)nub$ap(\\)nub

The whole program is 97 characters, but the real meat is just 23 characters. The rest is just imports and bringing the function into the IO monad. In ghci with the modules loaded it's just

(liftM2(\\)nub$ap(\\)nub) "nbHHkRvrXbvkn"

In even more ridiculous pointfree style (pointless style?):

main=interact$liftM2 ap liftM2 ap(\\)nub

It's a bit longer though at 26 chars for the function itself.

2

votes

Shell/Coreutils, 37 Characters

fold -w1|sort|uniq -u|paste -s -d ''

Code Golf: Duplicate Character Removal in String

30 Answers

LabVIEW 7.1

Perl

Ruby — 61 53 51 56 35

APL

Python:

Haskell

C89 (106 characters)

C89 (173 characters)

C#

C#

PHP (136 characters)

another APL solution

VB.NET

PowerShell

C: 83 89 93 99 101 characters

Golfscript(sym) - 15

Javascript 1.8

TCL

C

Scala

Ruby

VB.NET / LINQ

Javascript 1.6

PHP

C# (53 Characters)

Haskell Pointfree

Shell/Coreutils, 37 Characters