10
votes

I want to replace one string with the other in Perl; both are of the same length. I want to replace all occurrences of the string (case insensitive), but I want that the case of the letter will be preserved. So if the first letter was upper case, the first letter after the replacement will be upper case also.

For example, if I want to replace "foo" with "bar", so I want that

foo ==> bar
Foo ==> Bar
FOO ==> BAR

Is there a simple way to do this in Perl?

10
Personally, I think using two regexes is perfectly fine (and possibly more readable than the alternatives). For example, the following is a common way to strip leading and trailing whitespace from a string: s/^\s+//; s/\s+$//;ThisSuitIsBlackNot

10 Answers

14
votes

This might be what you are after:

How do I substitute case insensitively on the LHS while preserving case on the RHS?

This is copied almost directly from the above link:

sub preserve_case($$) {
    my ($old, $new) = @_;
    my $mask = uc $old ^ $old;
    uc $new | $mask .
    substr($mask, -1) x (length($new) - length($old))
}

my $string;

$string = "this is a Foo case";
$string =~ s/(Foo)/preserve_case($1, "bar")/egi;
print "$string\n";

# this is a Bar case

$string = "this is a foo case";
$string =~ s/(Foo)/preserve_case($1, "bar")/egi;
print "$string\n";

# this is a bar case

$string = "this is a FOO case";
$string =~ s/(Foo)/preserve_case($1, "bar")/egi;
print "$string\n";

# this is a BAR case
13
votes

perldoc perlfaq6 provides some insights:

How do I substitute case-insensitively on the LHS while preserving case on the RHS?

Here's a lovely Perlish solution by Larry Rosler. It exploits properties of bitwise xor on ASCII strings.

$_= "this is a TEsT case";
$old = 'test';
$new = 'success';
s{(\Q$old\E)}
    { uc $new | (uc $1 ^ $1) .
            (uc(substr $1, -1) ^ substr $1, -1) x
            (length($new) - length $1)
    }egi;
print;    # 'this is a SUcCESS case'

And here it is as a subroutine, modeled after the above:

sub preserve_case {
        my ($old, $new) = @_;
        my $mask = uc $old ^ $old;
        uc $new | $mask .
            substr($mask, -1) x (length($new) - length($old))
    }

$string = "this is a TEsT case";
$string =~ s/(test)/preserve_case($1, "success")/egi;
print "$string\n";

This prints:

this is a SUcCESS case

So you could use the preserve_case() subroutine like so. Just don't expect Unicode miracles :)

s[\b(abc)\b][preserve_case($1,'xyz')]ei ;
6
votes
$text =~ s/\b(?:(Abc)|abc)\b/ $1 ? 'Xyz' : 'xyz' /eg;

If the actual list is longer, you can use a lookup table.

my %translations = (
   'Abc' => 'Xyz',  'abc' => 'xyz',
   'Def' => 'Ghi',  'def' => 'ghi',
   'Jkl' => 'Mno',  'jkl' => 'mno',
);

my $alt_pat = join '|', map quotemeta, keys(%translations);

$text =~ s/\b($alt_pat)\b/$translations{$1}/g;

But that still leaves some duplication that could be removed by deriving the lowercase versions.

my %translations = (
   'Abc' => 'Xyz',
   'Def' => 'Ghi',
   'Jkl' => 'Mno',
);

%translations = ( ( map lc, %translations ), %translations );

my $alt_pat = join '|', map quotemeta, keys(%translations);

$text =~ s/\b($alt_pat)\b/$translations{$1}/g;
5
votes

Here's a solution that factors out the idea of "alter one string to match the capitalization of another string" into a function, and calls that function to build the replacement.

sub matchcap
{
  my ($s,$r) = @_;
  return $s eq ucfirst($s) ? ucfirst($r) : lcfirst($r);
}

s/\b(Abc|abc)\b/matchcap($1,'xyz')/ge;
3
votes

A bit of a hack, using the experimental code extended regular expression:

$text =~ s/\b([Aa])(?{ $n=chr(ord($^N)+23) })bc/${n}yz/

First, match the letter A with ([Aa]). The following (?{...}) contains arbitrary code, with $^N containing the text of the most recently captured subgroup. The 23 is the difference in ASCII codes between A and X (for upper- and lowercase), so $n contains the letter X with the same case as the corresponding A.

(This should not be taken as an endorsement to write code like this, but as an interesting example of this experimental regular expression.)

3
votes

Here's a "semi-perlish" solution that should work for arbitrary regexps and Unicode data:

sub adjust_case {
    my ($text, $case) = @_;
    $case .= substr($case, -1) x (length($text) - length($case));
    $_ = [ split // ] for $text, $case;
    return join "", map {
        $case->[$_] =~ /\p{Upper}/ ? uc $text->[$_] :
        $case->[$_] =~ /\p{Lower}/ ? lc $text->[$_] : $text->[$_]
    } 0 .. $#$text;
}

my $regexp  = qr/\b(abc\w*)\b/i;
my $replace = "Xyzzy";

s/$regexp/adjust_case $replace, ${^MATCH}/egp;
2
votes

You could do this:

my %trans = (
    'Abc' => Xyz, 
    'abc' => xyz,
);
$text =~s/\b(Abc|abc)\b/$trans{$1}/ge;
1
votes

You know each string is the same length, so basically, you can:

index = Pos(string, oldString)
for i = index to index + strlen(oldString)
  if (oldString[i] >= 'a') && (oldString[i] <= 'z'')
    string[i] = ToLower(newString[i])
  else
    string[i] = ToUpper(newString[i])0x20
0
votes

Here's a neat trick that uses non-destructive transliteration (available in Perl 5.14) within the result of the substitution.

use 5.014;
$string =~ s/\b(f)(o)(o)\b/ ($1 =~ tr{fF}{bB}r) . ($2 =~ tr{oO}{aA}r) . ($3 =~ tr{oO}{rR}r) /egi;

You can even shorten it if consecutive groups of letters have same replacements, e.g.

# foo ==> see, FoO ==> SeE, etc.
$string =~ s/\b(foo)\b/ $1 =~ tr{fFoO}{sSeE}r /egi;
0
votes

Check character by character. If a character's ASCII value falls in uppercase ASCII values, replace with uppercase else lowercase.