Matching two strings which differ only in tags and spaces in Perl

Question

I want to match two strings which differ only in tags (elements), spaces and newlines:

$string1 = "perl is <match>scripting language</match>";
$string2 = "perl<TAG> is<TAG> scr<TAG>ipt<TAG>inglanguage";

Note: spaces, <TAG> and newlines can appear anywhere in string2, and a space may or may not be present in string2. For example, in $string2 above, the space between the words "scripting" and "language" is missing. We have to ignore spaces, tags and newlines while matching string1 against string2. The <match> tag in string1 marks the data to be matched against string2.

Required output:
the whole content of string2, with the <match> tag added:
perl<TAG> is<TAG> <match>scr<TAG>ipt<TAG>inglanguage</match>

Code I tried:

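# For each <match>...</match> section in string1 ...
while($string1 =~ /<match>(.*?)<\/match>/gs)
{
   my $data_to_match = $1;
   # ... build a pattern that tolerates <TAG>/whitespace around each character
   $data_to_match = add_pat($data_to_match);

   # ... and wrap the matching text in string2 with <match> tags
   $string2 =~ s{($data_to_match)}
   {
      "<match>$&<\/match>"
   }esi;
}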

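sub add_pat
{
   my ($data) = (@_);
   my @array = split//,$data;

   foreach my $each(@array)
   {
       # quotemeta escapes each character (a space becomes a literal "\ ")
       $each = quotemeta $each;
       # optional run of <TAG> or whitespace before and after every character
       $each = '(?:(<TAG>|\s)+)?'.$each.'(?:(<TAG>|\s)+)?';
   }

   $data = join '',@array;
   return $data;
}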

Problem: since the space is missing in string2, the pattern does not match. I tried making the space optional when appending the pattern to each character, but with the space optional, the pattern match runs seemingly forever.

In reality, I have large strings to match, and these spaces are causing the problem. Please suggest a solution.
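A possible diagnosis, offered as a hedged suggestion rather than a definitive answer: the pattern built by add_pat places two optional `(?:(<TAG>|\s)+)?` groups back to back between every pair of characters, so a run of spaces or tags can be split between them in many ways, and on a failed match the regex engine tries all of them (catastrophic backtracking). In addition, quotemeta turns each space in the <match> text into a mandatory literal space, so "scripting language" can never match "scr...inglanguage". The sketch below uses a hypothetical helper, build_pattern, which drops whitespace from the text to be matched and puts a single `(?:<TAG>|\s)*` between consecutive characters. Because `<TAG>` always starts with `<` and `\s` never does, each ignorable run can be consumed essentially one way, which should avoid the exponential blow-up. It assumes `<TAG>` is the only tag that can appear in string2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turns the text inside <match>...</match> into a regex
# that tolerates any mix of <TAG> and whitespace between its characters.
# Assumes <TAG> is the only tag that can appear in the target string.
sub build_pattern {
    my ($data) = @_;
    # Whitespace in the needle is ignored, since it may be missing in string2.
    my @chars = grep { /\S/ } split //, $data;
    # A single filler between each pair of characters (not two in a row).
    my $filler = '(?:<TAG>|\s)*';
    return join $filler, map { quotemeta($_) } @chars;
}

my $string1 = "perl is <match>scripting language</match>";
my $string2 = "perl<TAG> is<TAG> scr<TAG>ipt<TAG>inglanguage";

while ($string1 =~ /<match>(.*?)<\/match>/gs) {
    my $pattern = build_pattern($1);
    # Wrap the first occurrence in string2 (case-insensitive, as in the question).
    $string2 =~ s{($pattern)}{<match>$1</match>}i;
}

print "$string2\n";
```

For the sample strings this should print `perl<TAG> is<TAG> <match>scr<TAG>ipt<TAG>inglanguage</match>`, which is the required output. Since the pattern starts and ends with a real character, leading or trailing spaces and tags stay outside the inserted `<match>` tags.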

Could you remove all the tags and spaces from the two strings and then just check if they are equal? s{</?.*?>}{}g; s/\s+//g; — hmatt1
@Matt I can't remove the tags, because we want to retain them in the final output. — vivek
@vivekpro In that case you can use Myforwik's answer and just copy the strings beforehand. If the copies with all the stuff removed match, then the original strings fulfill your requirement. — DeVadder
@DeVadder In Myforwik's answer all the tags are removed, but the tags cannot be removed; we want to retain them in the final output. — vivek

Mokky Miah · Accepted Answer · 2014-03-07T08:38:14

Use regular expressions to remove all the characters that you wish to ignore from both strings. Then compare what remains of the two strings.

You will end up with two strings such as:

'perlisscriptinglanguage' and 'perlisscriptinglanguage'

If you want, you can also lowercase (or uppercase) both strings so that the comparison ignores case.

If they match, just return the original string2.
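A minimal sketch of this approach, assuming tags always look like `<...>` and that case should be ignored. The normalization is applied to copies of the strings, so the original string2 (with its tags) is what gets returned. The function name normalize is illustrative, not from the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip everything that should be ignored from a *copy* of the string.
# Assumes tags look like <...>; lowercasing makes the comparison case-insensitive.
sub normalize {
    my ($text) = @_;          # copy of the caller's string
    $text =~ s{<[^>]*>}{}g;   # drop tags such as <match>, </match>, <TAG>
    $text =~ s/\s+//g;        # drop spaces and newlines
    return lc $text;
}

my $string1 = "perl is <match>scripting language</match>";
my $string2 = "perl<TAG> is<TAG> scr<TAG>ipt<TAG>inglanguage";

# Compare the normalized copies; on a match, return string2 unchanged.
my $result = normalize($string1) eq normalize($string2) ? $string2 : undef;
print defined $result ? "match: $result\n" : "no match\n";
```

Both strings normalize to `perlisscriptinglanguage`, so this reports a match. Note that it only tells you whether the strings are equivalent; it returns string2 as-is and does not insert the `<match>` tags.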
