17
votes

(Edit: What is Code Golf: Code golf challenges ask you to solve a specific problem with the least code, measured by character count, in whichever language you prefer. More info is available on Meta Stack Overflow.)

Code Golfers, here's a challenge on string operations.

Email Address Validation, but without regular expressions (or similar parsing library) of course. It's not so much about the email addresses but how short you can write the different string operations and constraints given below.

The rules are the following (yes, I know, this is not RFC compliant, but these are going to be the 5 rules for this challenge):

  • At least 1 character out of this group before the @:

    A-Z, a-z, 0-9, . (period), _ (underscore)
    
  • @ has to exist, exactly one time

    [email protected]
        ^
    
  • Period (.) has to exist exactly one time after the @

    [email protected]
              ^
    
  • At least 1 character between the @ and the following . (period), letters [A-Z, a-z] only

    [email protected]
         ^
    
  • At least 2 characters after the final . (period), letters [A-Z, a-z] only

    [email protected]
               ^^
    

Please post the method/function only, which would take a string (proposed email address) and then return a Boolean result (true/false) depending on the email address being valid (true) or invalid (false).

Samples:
[email protected]    (valid/true)          @w.org     (invalid/false)    
b@[email protected]  (invalid/false)       test@org   (invalid/false)    
test@%.org (invalid/false)       s%[email protected]  (invalid/false)    
[email protected] (invalid/false)       [email protected]  (valid/true)
[email protected]  (valid/true)          foo@a%.com (invalid/false)

Good luck!
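
For reference, here is a readable (deliberately non-golfed) sketch of the five rules in Python. It assumes rule 1 means the whole part before the @ must come from that character group, which the s% sample suggests. The name is_valid and the address in the usage line are made up for illustration.

```python
import string

LETTERS = set(string.ascii_letters)
LOCAL = LETTERS | set(string.digits) | {".", "_"}


def is_valid(address: str) -> bool:
    # Rule 2: exactly one "@".
    if address.count("@") != 1:
        return False
    local, domain = address.split("@")
    # Rule 1: at least one character before the "@", all from the allowed group.
    if not local or not set(local) <= LOCAL:
        return False
    # Rule 3: exactly one period after the "@".
    if domain.count(".") != 1:
        return False
    name, tld = domain.split(".")
    # Rules 4 and 5: letters only, at least 1 before and 2 after the period.
    return (len(name) >= 1 and set(name) <= LETTERS
            and len(tld) >= 2 and set(tld) <= LETTERS)


print(is_valid("a@b.cd"))  # hypothetical address, not one of the samples
```

This is only a baseline to compare golfed entries against, not an entry itself.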

15
Too many [code-golf]s of late. If this continues I will reluctantly join Pax and start voting to close. - dmckee --- ex-moderator kitten
dmckee: What, is stackoverflow running out of question numbers? - caf
@caf: Like best [joke|comic|...] and similar questions, these lie outside the remit of SO. That is not a problem as long as they are rare. Indeed, they serve as diversions and provide a sense of community. But if they grow too common they will give newcomers the wrong impression about the culture and purpose of the site; they will give the appearance of a lot of drivel. Which is a shame, because I like code golf, enjoy playing with some of the problems that come up, and am quite proud of some of my entries. - dmckee --- ex-moderator kitten
@dmckee - While I'm a fan of code golf, I'm inclined to agree that we're seeing a deluge of golf questions recently. - Chris Lutz
[code-golf] questions are being discussed on meta.stackexchange.com/questions/20912/so-weekly-code-golf - Brad Gilbert

15 Answers

20
votes

C89 (166 characters)

#define B(c)isalnum(c)|c==46|c==95
#define C(x)if(!v|*i++-x)return!1;
#define D(x)for(v=0;x(*i);++i)++v;
v;e(char*i){D(B)C(64)D(isalpha)C(46)D(isalpha)return!*i&v>1;}

Not re-entrant, but can be run multiple times. Test bed:

#include<stdio.h>
#include<assert.h>
main(){
    assert(e("[email protected]"));
    assert(e("[email protected]"));
    assert(e("[email protected]"));
    assert(!e("b@[email protected]"));
    assert(!e("test@%.org"));
    assert(!e("[email protected]"));
    assert(!e("@w.org"));
    assert(!e("test@org"));
    assert(!e("s%[email protected]"));
    assert(!e("foo@a%.com"));
    puts("success!");
}
12
votes

J

:[[/%^(:[[+-/^,&i|:[$[' ']^j+0__:k<3:]]
6
votes

C89, 175 characters.

#define G &&*((a+=t+1)-1)==
#define H (t=strspn(a,A
t;e(char*a){char A[66]="_.0123456789Aa";short*s=A+12;for(;++s<A+64;)*s=s[-1]+257;return H))G 64&&H+12))G 46&&H+12))>1 G 0;}

I am using the standard library function strspn(), so I feel this answer isn't as "clean" as strager's answer which does without any library functions. (I also stole his idea of declaring a global variable without a type!)

One of the tricks here is that by putting . and _ at the start of the string A, it's possible to include or exclude them easily in a strspn() test: when you want to allow them, use strspn(something, A); when you don't, use strspn(something, A+12). Another is assuming that sizeof (short) == 2 * sizeof (char), and building up the array of valid characters 2 at a time from the "seed" pair Aa. The rest was just looking for a way to force subexpressions to look similar enough that they could be pulled out into #defined macros.
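
The strspn() idea carries over to other languages. Below is a rough Python sketch of it, not a translation of the golfed code: span() is a hand-written stand-in for strspn(), and TABLE[12:] plays the role of A+12. All names here are made up for illustration, and the array-building trick with short is left out.

```python
import string

# Same layout as the C array: "_." and the digits first, the letters after.
TABLE = "_.0123456789" + string.ascii_uppercase + string.ascii_lowercase
LETTERS = TABLE[12:]  # skip the 12 non-letter characters, like A+12


def span(s: str, allowed: str) -> int:
    """Length of the leading run of s made only of characters in allowed."""
    n = 0
    while n < len(s) and s[n] in allowed:
        n += 1
    return n


def valid(a: str) -> bool:
    # Local part: a non-empty run from the full table, then "@".
    t = span(a, TABLE)
    if t < 1 or a[t:t + 1] != "@":
        return False
    a = a[t + 1:]
    # Domain name: a non-empty run of letters, then ".".
    t = span(a, LETTERS)
    if t < 1 or a[t:t + 1] != ".":
        return False
    a = a[t + 1:]
    # Final part: at least two letters, and nothing after them.
    t = span(a, LETTERS)
    return t > 1 and t == len(a)
```

Each of the three steps has the same shape (measure a run, check the character that ends it), which is what lets the C version fold them into the G and H macros.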

To make this code more "portable" (heh :-P) you can change the array-building code from

char A[66]="_.0123456789Aa";short*s=A+12;for(;++s<A+64;)*s=s[-1]+257;

to

char*A="_.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

for a cost of 5 additional characters.

5
votes

Python (181 characters including newlines)

def v(E):
 import string as t;a=t.ascii_letters;e=a+"1234567890_.";t=e,e,"@",e,".",a,a,a,a,a,"",a
 for c in E:
  if c in t[0]:t=t[2:]
  elif not c in t[1]:return 0>1
 return""==t[0]

Basically just a state machine using obfuscatingly short variable names.
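
Spelled out with longer names, the state machine might look like the sketch below. The tuple t in the golfed code is read here as a list of (advance, stay) pairs: a character in the first set moves to the next pair, a character in the second set is accepted without moving. As far as I can tell, the golfed tuple goes straight from "@" to the pair (".", a), so a period directly after the "@" would be accepted; this sketch adds one pair to require a letter there. STATES and valid are illustrative names.

```python
import string

LETTERS = string.ascii_letters
LOCAL = LETTERS + string.digits + "_."

# One (advance, stay) pair per state. An empty string matches no character.
STATES = [
    (LOCAL, ""),      # first character before the "@" (required)
    ("@", LOCAL),     # more local characters until the "@"
    (LETTERS, ""),    # first letter after the "@" (the added pair)
    (".", LETTERS),   # more letters until the period
    (LETTERS, ""),    # first letter after the period
    (LETTERS, ""),    # second letter after the period
    ("", LETTERS),    # accepting state: any further letters
]


def valid(address: str) -> bool:
    state = 0
    for c in address:
        advance, stay = STATES[state]
        if c in advance:
            state += 1
        elif c not in stay:
            return False
    # Valid only if the input ended in the accepting state.
    return state == len(STATES) - 1
```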

5
votes

C (166 characters)

#define F(t,u)for(r=s;t=(*s-64?*s-46?isalpha(*s)?3:isdigit(*s)|*s==95?4:0:2:1);++s);if(s-r-1 u)return 0;
V(char*s){char*r;F(2<,<0)F(1=)F(3=,<0)F(2=)F(3=,<1)return 1;}

The single newline is required, and I've counted it as one character.

4
votes

Python, 149 chars (after putting the whole for loop into one semicolon-separated line, which I haven't done here for "readability" purposes):

def v(s,t=0,o=1):
 for c in s:
   k=c=="@"
   p=c=="."
   A=c.isalnum()|p|(c=="_")
   L=c.isalpha()
   o&=[A,k|A,L,L|p,L,L,L][t]
   t+=[1,k,1,p,1,1,0][t]
 return(t>5)&o

Test cases, borrowed from strager's answer:

assert v("[email protected]")
assert v("[email protected]")
assert v("[email protected]")
assert not v("b@[email protected]")
assert not v("test@%.org")
assert not v("[email protected]")
assert not v("@w.org")
assert not v("test@org")
assert not v("s%[email protected]")
assert not v("foo@a%.com")
print "Yeah!"

Explanation: When iterating over the string, two variables keep getting updated.

t keeps the current state:

  • t = 0: We're at the beginning.
  • t = 1: We were at the beginning and have found at least one legal character (letter, number, underscore, period)
  • t = 2: We have found the "@"
  • t = 3: We have found at least one legal character (i.e. letter) after the "@"
  • t = 4: We have found the period in the domain name
  • t = 5: We have found one legal character (letter) after the period
  • t = 6: We have found at least two legal characters after the period

o as in "okay" starts as 1, i.e. true, and is set to 0 as soon as a character is found that is illegal in the current state. Legal characters are:

  • In state 0: letter, number, underscore, period (change state to 1 in any case)
  • In state 1: letter, number, underscore, period, at-sign (change state to 2 if "@" is found)
  • In state 2: letter (change state to 3)
  • In state 3: letter, period (change state to 4 if period found)
  • In states 4 thru 6: letter (increment state when in 4 or 5)

When we have gone all the way through the string, we return whether t==6 (t>5 is one char less) and o is 1.
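
For anyone who wants to step through it, here is an ungolfed sketch of the same two-variable scheme. It keeps the two lookup lists indexed by state, but swaps isalnum()/isalpha() for explicit ASCII sets (an assumption on my part, so that non-ASCII letters are rejected as the rules ask). The long names are illustrative only.

```python
import string

LETTER = set(string.ascii_letters)
LOCAL = LETTER | set(string.digits) | {"_", "."}


def valid(s: str) -> bool:
    state = 0   # "t" in the golfed version
    ok = True   # "o" in the golfed version
    for c in s:
        is_at = c == "@"
        is_dot = c == "."
        local = c in LOCAL
        letter = c in LETTER
        # legal[state]: is this character allowed in the current state?
        legal = [local, is_at or local, letter, letter or is_dot,
                 letter, letter, letter]
        # step[state]: how far the state moves after this character.
        step = [1, int(is_at), 1, int(is_dot), 1, 1, 0]
        ok = ok and legal[state]
        state += step[state]
    # State 6 means at least two letters were seen after the period.
    return state == 6 and ok
```

Note that the state still advances on an illegal character; the result is only rejected because ok has been cleared by then.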

2
votes

Whatever version of C++ MSVC2008 supports.

Here's my humble submission. Now I know why they told me never to do the things I did in here:

#define N return 0
#define I(x) &&*x!='.'&&*x!='_'
bool p(char*a) {
 if(!isalnum(a[0])I(a))N;
 char*p=a,*b=0,*c=0;
 for(int d=0,e=0;*p;p++){
  if(*p=='@'){d++;b=p;}
  else if(*p=='.'){if(d){e++;c=p;}}
  else if(!isalnum(*p)I(p))N;
  if (d>1||e>1)N;
 }
 if(b>c||b+1>=c||c+2>=p)N;
 return 1;
}
2
votes

Not the greatest solution no doubt, and pretty darn verbose, but it is valid.

Fixed (All test cases pass now)

static bool ValidateEmail(string email)
{
    var numbers = "1234567890";
    var uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    var lowercase = uppercase.ToLower();
    var arUppercase = uppercase.ToCharArray();
    var arLowercase = lowercase.ToCharArray();
    var arNumbers = numbers.ToCharArray();
    var atPieces = email.Split(new string[] { "@"}, StringSplitOptions.RemoveEmptyEntries);
    if (atPieces.Length != 2)
        return false;
    foreach (var c in atPieces[0])
    {
        if (!(arNumbers.Contains(c) || arLowercase.Contains(c) || arUppercase.Contains(c) || c == '.' || c == '_'))
            return false;
    }
    if(!atPieces[1].Contains("."))
        return false;
    var dotPieces = atPieces[1].Split('.');
    if (dotPieces.Length != 2)
        return false;
    foreach (var c in dotPieces[0])
    {
        if (!(arLowercase.Contains(c) || arUppercase.Contains(c)))
            return false;
    }
    var found = 0;
    foreach (var c in dotPieces[1])
    {
        if ((arLowercase.Contains(c) || arUppercase.Contains(c)))
            found++;
        else
            return false;
    }
    return found >= 2;
}
2
votes

C89 character set agnostic (262 characters)

#include <stdio.h>

/* the 'const ' qualifiers should be removed when */
/* counting characters: I don't like warnings :) */
/* also the 'int ' should not be counted. */

/* it needs only 2 spaces (after the returns), should be only 2 lines */
/* that's a total of 262 characters (1 newline, 2 spaces) */

/* code golf starts here */

#include<string.h>
int v(const char*e){
const char*s="0123456789._abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if(e=strpbrk(e,s))
  if(e=strchr(e+1,'@'))
    if(!strchr(e+1,'@'))
      if(e=strpbrk(e+1,s+12))
        if(e=strchr(e+1,'.'))
          if(!strchr(e+1,'.'))
            if(strlen(e+1)>1)
              return 1;
return 0;
}

/* code golf ends here */

int main(void) {
  const char *t;
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "b@[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "test@%.org"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "@w.org"; printf("%s ==> %d\n", t, v(t));
  t = "test@org"; printf("%s ==> %d\n", t, v(t));
  t = "s%[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "foo@a%.com"; printf("%s ==> %d\n", t, v(t));

  return 0;
}

Version 2

Still C89 character set agnostic, bugs hopefully corrected (303 chars; 284 without the #include)

#include<string.h>
#define Y strchr
#define X{while(Y
v(char*e){char*s="0123456789_.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if(*e!='@')X(s,*e))e++;if(*e++=='@'&&!Y(e,'@')&&Y(e+1,'.'))X(s+12,*e))e++;if(*e++=='.'
&&!Y(e,'.')&&strlen(e)>1){while(*e&&Y(s+12,*e++));if(!*e)return 1;}}}return 0;}

That #define X is absolutely disgusting!

Test as for my first (buggy) version.

1
votes

VBA/VB6 - 484 chars

Explicit off
usage: VE("[email protected]")

Function V(S, C)
V = True
For I = 1 To Len(S)
 If InStr(C, Mid(S, I, 1)) = 0 Then
  V = False: Exit For
 End If
Next
End Function

Function VE(E)
VE = False
C1 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
C2 = "0123456789._"
P = Split(E, "@")
If UBound(P) <> 1 Then GoTo X
If Len(P(0)) < 1 Or Not V(P(0), C1 & C2) Then GoTo X
E = P(1): P = Split(E, ".")
If UBound(P) <> 1 Then GoTo X
If Len(P(0)) < 1 Or Not V(P(0), C1) Or Len(P(1)) < 2 Or Not V(P(1), C1) Then GoTo X
VE = True
X:
End Function
1
votes

Java: 257 chars (not including the 3 end of lines for readability ;-)).

boolean q(char[]s){int a=0,b=0,c=0,d=0,e=0,f=0,g,y=-99;for(int i:s)
d=(g="@._0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm".indexOf(i))<0?
y:g<1&&++e>0&(b<1|++a>1)?y:g==1&e>0&(c<1||f++>0)?y:++b>0&g>12?f>0?d+1:f<1&e>0&&++c>0?
d:d:d;return d>1;}

Passes all the tests (my older version was incorrect).

1
votes

Erlang 266 chars:

-module(cg_email).

-export([test/0]).

%%% golf code begin %%%
-define(E,when X>=$a,X=<$z;X>=$A,X=<$Z).
-define(I(Y,Z),Y([X|L])?E->Z(L);Y(_)->false).
-define(L(Y,Z),Y([X|L])?E;X>=$0,X=<$9;X=:=$.;X=:=$_->Z(L);Y(_)->false).
?L(e,m).
m([$@|L])->a(L);?L(m,m).
?I(a,i).
i([$.|L])->l(L);?I(i,i).
?I(l,c).
?I(c,g).
g([])->true;?I(g,g).
%%% golf code end %%%

test() ->
  true  = e("[email protected]"),
  false = e("b@[email protected]"),
  false = e("test@%.org"),
  false = e("[email protected]"),
  true  = e("[email protected]"),
  false = e("test@org"),
  false = e("s%[email protected]"),
  true  = e("[email protected]"),
  false = e("foo@a%.com"),
  ok.
1
votes

Ruby, 225 chars. This is my first Ruby program, so it's probably not very Ruby-like :-)

def v z;r=!a=b=c=d=e=f=0;z.chars{|x|case x when'@';r||=b<1||!e;e=!1 when'.'
e ?b+=1:(a+=1;f=e);r||=a>1||(c<1&&!e)when'0'..'9';b+=1;r|=!e when'A'..'Z','a'..'z'
e ?b+=1:f ?c+=1:d+=1;else r=1 if x!='_'||!e|!b+=1;end};!r&&d>1 end
1
votes

'Using no regex': PHP 47 Chars.

<?=filter_var($argv[1],FILTER_VALIDATE_EMAIL);
1
votes

Haskell (GHC 6.8.2), 144 characters (previously 165, then 161)


Using pattern matching, elem, span and all:

a=['A'..'Z']++['a'..'z']
e=f.span(`elem`"._0123456789"++a)
f(_:_,'@':d)=g$span(`elem`a)d
f _=False
g(_:_,'.':t@(_:_:_))=all(`elem`a)t
g _=False

The above was tested with the following code:

main :: IO ()
main = print $ and [
  e "[email protected]",
  e "[email protected]",
  e "[email protected]",
  not $ e "b@[email protected]",
  not $ e "test@%.org",
  not $ e "[email protected]",
  not $ e "@w.org",
  not $ e "test@org",
  not $ e "s%[email protected]",
  not $ e "foo@a%.com"
  ]