61
votes

A friend asked me this and I was stumped: Is there a way to craft a regular expression that matches a sequence of the same character? E.g., match on 'aaa', 'bbb', but not 'abc'?

m|\w{2,3}| 

Wouldn't do the trick as it would match 'abc'.

m|a{2,3}| 

Wouldn't do the trick as it wouldn't match 'bbb', 'ccc', etc.

7 Answers

115
votes

Sure thing! Grouping and references are your friends:

(.)\1+

This matches two or more consecutive occurrences of the same character. For word-constituent characters only, use \w instead of ., i.e.:

(\w)\1+
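
As a quick check of the pattern above, here is a minimal Perl sketch. The helper name has_run and the sample strings are made up for illustration. Note that the pattern is unanchored, so 'aab' also matches because it contains 'aa'; anchor it as /^(\w)\1+$/ if the whole string must be one repeated character.

```perl
use strict;
use warnings;

# (\w) captures one word character; \1+ requires that same character
# to follow one or more times, so the run has length two or more.
# has_run is a hypothetical helper name, not part of any library.
sub has_run {
    my ($s) = @_;
    return $s =~ /(\w)\1+/ ? 1 : 0;
}

for my $s (qw(aaa bbb abc aab)) {
    printf "%-4s %s\n", $s, has_run($s) ? 'match' : 'no match';
}
```

Only 'abc' should print 'no match' here.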
10
votes

Note that in Perl 5.10 we have alternative notations for backreferences as well.

use 5.010;  # enables say; the \g{} and \k<> notations need Perl 5.10+

foreach (qw(aaa bbb abc)) {
  say;
  say ' original' if /(\w)\1+/;
  say ' new way'  if /(\w)\g{1}+/;
  say ' relative' if /(\w)\g{-1}+/;
  say ' named'    if /(?'char'\w)\g{char}+/;
  say ' named'    if /(?<char>\w)\k<char>+/;
}
4
votes

This will match more than \w would, like @@@:

/(.)\1+/
2
votes

Answering my own question, but got it:

m|(\w)\1+|
1
votes

This is what backreferences are for.

m/(\w)\1\1/

will do the trick for three identical word characters in a row, such as 'aaa' or 'bbb'.

1
votes

This is also possible using pure regular expressions (i.e. those that describe regular languages -- not Perl regexps). Unfortunately, it means a regexp whose length is proportional to the size of the alphabet, e.g.:

(a* + b* + ... + z*)

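Where a...z are the symbols in the finite alphabet and + denotes alternation (written | in Perl). To require at least two repetitions, as (\w)\1+ does, each term becomes aaa* rather than a*.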

So Perl regexps, although a superset of pure regular expressions, definitely have their advantages even when you just want to use them for pure regular expressions!
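
To make the size trade-off concrete, here is a rough Perl sketch that builds the backreference-free alternation and compares it with the backreference version. It assumes the alphabet is just the lowercase letters a..z, and the variable names are made up for illustration. Both patterns are anchored so that they test the whole string.

```perl
use strict;
use warnings;

# Assumption: the alphabet is only the lowercase letters a..z.
my @alphabet = ('a' .. 'z');

# One alternative per symbol, each needing at least two repetitions:
# aa+|bb+|...|zz+  (no backreference anywhere).
# quotemeta is a no-op for letters but keeps other symbols safe.
my $alternation = join '|', map { quotemeta($_) x 2 . '+' } @alphabet;

my $pure_re    = qr/^(?:$alternation)$/;
my $backref_re = qr/^([a-z])\1+$/;

for my $s (qw(aaa bbb abc aab zz)) {
    my $pure    = $s =~ $pure_re    ? 'match' : 'no match';
    my $backref = $s =~ $backref_re ? 'match' : 'no match';
    print "$s: pure=$pure backref=$backref\n";
}

# The alternation grows with the alphabet; the backreference does not.
print "alternation length: ", length($alternation), "\n";
```

The two patterns should agree on every input; the difference is only that the alternation has one branch per symbol.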

0
votes

If you are using Java and want to find consecutive duplicate characters in a given string, here is the code:

public class Test {
    public static void main(String[] args) {
        String s = "abbc";
        // matches() tests the whole string, hence the .* on both sides
        if (s.matches(".*([a-zA-Z])\\1+.*")) {
            System.out.println("Duplicate found!");
        } else {
            System.out.println("Duplicate not found!");
        }
    }
}