0
votes

A recent Perl question on here about checking whether a value exists in an array left me thinking about how to do this. I see most people recommending grep, in the form

if (!grep { $input_day eq $_ } @days ) {
    say "Grep Invalid Day";
}

However, when I read this question my first thought jumped to the smart match operator

unless ( $input_day ~~ @days ) {
    say "Smart Invalid Day";
}

So it got me wondering whether there is any benefit to using grep over smart-match or vice versa. I know smart-match is only available in later versions of Perl, so it cannot be recommended for people with a Perl version before 5.10.1.

I have never really benchmarked Perl code before, so the code below was adapted from an example online. I ran the smart-match example 10 million times and the grep example 10 million times and recorded the timings.

use strict;
use warnings;
use v5.16.2;
use Benchmark;

my $input_day = shift;
my @days = qw /mon tue wed thu fri sat sun/;

# Time 10 million smart-match membership checks
my $smart_test_start = Benchmark->new;
for (my $x = 0; $x < 10000000; $x++) {
    unless ( $input_day ~~ @days ) {
        # here we would execute some code
    }
}
my $smart_test_end = Benchmark->new;

# Time 10 million grep membership checks
my $grep_test_start = Benchmark->new;
for (my $y = 0; $y < 10000000; $y++) {
    if (!grep { $input_day eq $_ } @days ) {
        # here we would execute some code
    }
}
my $grep_test_end = Benchmark->new;

my $smart_diff = timediff($smart_test_end, $smart_test_start);
my $grep_diff = timediff($grep_test_end, $grep_test_start);

say "SMART: ", timestr($smart_diff,'all');
say "GREP: ", timestr($grep_diff,'all');

I used a few different inputs.

Input "mon"

SMART:  3 wallclock secs ( 2.75 usr  0.00 sys +  0.00 cusr  0.00 csys =  2.75 CPU)
GREP: 12 wallclock secs (12.02 usr  0.01 sys +  0.00 cusr  0.00 csys = 12.03 CPU)

Input "thu"

SMART:  6 wallclock secs ( 5.67 usr  0.00 sys +  0.00 cusr  0.00 csys =  5.67 CPU)
GREP: 11 wallclock secs (11.46 usr  0.01 sys +  0.00 cusr  0.00 csys = 11.47 CPU)

Input "sun"

SMART:  8 wallclock secs ( 8.87 usr  0.01 sys +  0.00 cusr  0.00 csys =  8.88 CPU)
GREP: 12 wallclock secs (11.62 usr  0.00 sys +  0.00 cusr  0.00 csys = 11.62 CPU)

Input "non"

SMART:  9 wallclock secs ( 8.46 usr  0.00 sys +  0.00 cusr  0.00 csys =  8.46 CPU)
GREP: 11 wallclock secs (11.58 usr  0.13 sys +  0.00 cusr  0.00 csys = 11.71 CPU)

In all of these cases the smart match operator performs better than grep. Looking at the results, I assume that in the early-match cases this is because smart-match stops as soon as it finds a match, whereas grep continues checking the rest of the array after the first match.
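
The early-exit difference can be made visible with a small sketch (not part of the benchmark above) that counts how many comparisons each approach makes. It assumes List::Util 1.33 or later for any, which stands in here as an example of a short-circuiting search; the counter variables and the 'tue' input are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use List::Util qw(any);    # assumption: List::Util 1.33+ (provides any)

my @days      = qw(mon tue wed thu fri sat sun);
my $input_day = 'tue';     # illustrative input, second element of @days

my ( $grep_checks, $any_checks ) = ( 0, 0 );

# grep evaluates its block for every element, even after a match
my $grep_found = grep { $grep_checks++; $_ eq $input_day } @days;

# any stops evaluating its block at the first true result
my $any_found = any { $any_checks++; $_ eq $input_day } @days;

print "grep made $grep_checks comparisons\n";    # 7, the whole array
print "any made $any_checks comparisons\n";      # 2, stops at 'tue'
```

With a value that is not in the list, both approaches have to look at all seven elements, which is consistent with the gap narrowing for the "non" input.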

I then see other people recommending to use certain modules to find the first instance etc.

Is there some reason people don't recommend the smart-match operator? Is there some limitation or unreliability in smart-match?

3
You should also consider any from List::MoreUtils - Borodin

3 Answers

2
votes

Do not, repeat DO NOT, use the smartmatch operator in production code. According to the Perl 5.18 perldelta, smartmatch has been marked experimental:

Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been a regular point of complaint. Although there are a number of ways in which it is useful, it has also proven problematic and confusing for both users and implementors of Perl. There have been a number of proposals on how to best address the problem. It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended.

Warnings will now be issued when the parser sees ~~, given, or when. To disable these warnings, you can add this line to the appropriate scope:

no if $] >= 5.018, warnings => "experimental::smartmatch";

Consider, though, replacing the use of these features, as they may change behavior again before becoming stable.

This means that code depending on this feature cannot be considered stable until these issues have been resolved.

2
votes

The proper solution to this uses a hash instead of an array:

my %days = map { $_ => 1 } @days;

then you can write

unless ($days{$input_day}) {
  say "Hash Invalid Day";
}

and the performance will far exceed any other solution.

(I hope it's obvious, but you should set up the hash only once and keep using it thereafter for all tests.)
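
A minimal sketch of the "build once, reuse" idea, assuming the same list of days as in the question. The names %is_day and is_valid_day are hypothetical, and the hash-slice assignment is just one alternative way to populate the keys.

```perl
use strict;
use warnings;

my @days = qw(mon tue wed thu fri sat sun);

# Build the lookup hash once, up front. The hash slice creates one key
# per day with an undef value, so membership is tested with exists.
my %is_day;
@is_day{@days} = ();

# Hypothetical helper: each call is a single hash lookup, no list scan.
sub is_valid_day {
    my ($day) = @_;
    return exists $is_day{$day};
}

for my $input_day (qw(mon thu non)) {
    printf "%s is %s\n", $input_day,
        is_valid_day($input_day) ? "a valid day" : "an invalid day";
}
```

Testing with exists rather than truthiness means the stored values do not matter, so the keys can be created without assigning 1 to each.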

0
votes

I would like to add my experience with this, as I did a fair bit of testing. I have always used smart match and recently got tired of the warnings it generates.

I have a text file of 100 million 10-character strings.

The Perl script reads STDIN into an array and tries three common ways of finding whether a string exists in the array. I experimented with using a hash as suggested above; however, the hash takes 3x as long to generate as an array. If you are doing extensive testing for existing values, this trade-off may be OK at some point, because the exists check on a hash is basically instant. It will also depend on your data source.

In the future I plan on using mostly List::Util (any) because it's future-proof, it's a core module, and the performance is solid.

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(any);

my @arr = qw(a b c d e);
# any returns true as soon as the block matches one element
if ( any { $_ eq 'd' } @arr ) {
    print "Found.\n";
}

Methods:

List::Util (any): if ( any { $_ eq $needle } @arr ) { ... }
Perl Smartmatch: if ( $needle ~~ @arr ) { ... }
Grep: if ( grep { $needle eq $_ } @arr ) { ... }

I searched for values I know exist at positions 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000 and 100000000. Timing was done with the Time::HiRes module.

What I found is that if most of your values are near the beginning of the array, smartmatch will outperform List::Util's any. However, if most of your values are in the middle or at the end, or do not exist in the array, List::Util will outperform it. It seems grep does an exhaustive search regardless of whether it finds a value or not.
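
For anyone wanting to repeat this kind of comparison on a smaller scale, here is a rough sketch using cmpthese from the core Benchmark module. It is not the script that produced the numbers in this answer: the seven-element array, the %candidate table and the $needle value are assumptions for illustration, and smartmatch is left out because it warns or is unavailable depending on the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(any);    # assumption: List::Util 1.33+ (provides any)

my @days   = qw(mon tue wed thu fri sat sun);
my %is_day = map { $_ => 1 } @days;    # built once, outside the timed code

# Each candidate takes the value to look for and returns true or false.
my %candidate = (
    grep => sub { my $d = shift; scalar grep { $_ eq $d } @days },
    any  => sub { my $d = shift; any { $_ eq $d } @days },
    hash => sub { my $d = shift; exists $is_day{$d} },
);

# Try 'mon' (first), 'sun' (last) and 'non' (missing) to see the effect
# of position on the short-circuiting candidates.
my $needle = 'sun';

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese( -1, {
    map {
        my $code = $candidate{$_};
        ( $_ => sub { $code->($needle) } );
    } keys %candidate
} );
```

The output is a table of rates and relative percentages; absolute numbers will vary by machine and Perl version, so only the ordering between candidates is worth reading from it.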

More Output details:

Smart Match total: 5.939 seconds.
List::Util::any: 7.332 seconds.
Grep total: 39.553 seconds.

Array Generation Time: 30.315 seconds. Searching 100000000 arr elements.

any Searching eavTa2eWr1 any Found - eavTa2eWr1. Time: 0.540 seconds.
any Searching mhEusMj5E7 any Found - mhEusMj5E7. Time: 0.358 seconds.
any Searching WGwHfJICK6 any Found - WGwHfJICK6. Time: 0.364 seconds.
any Searching I48fNDYNKF any Found - I48fNDYNKF. Time: 0.359 seconds.
any Searching q3YVBTmX9J any Found - q3YVBTmX9J. Time: 0.357 seconds.
any Searching pw0J5vRCnW any Found - pw0J5vRCnW. Time: 0.358 seconds.
any Searching GNJP5flX5z any Found - GNJP5flX5z. Time: 0.392 seconds.
any Searching 3Mh0x0R3OC any Found - 3Mh0x0R3OC. Time: 0.649 seconds.
any Searching H5yxSA7eDx any Found - H5yxSA7eDx. Time: 3.473 seconds.
List::Util::any: 6.850 seconds.

###############################################################

SM Searching eavTa2eWr1 SM Found eavTa2eWr1. Time: 0.000 seconds.
SM Searching mhEusMj5E7 SM Found mhEusMj5E7. Time: 0.000 seconds.
SM Searching WGwHfJICK6 SM Found WGwHfJICK6. Time: 0.000 seconds.
SM Searching I48fNDYNKF SM Found I48fNDYNKF. Time: 0.000 seconds.
SM Searching q3YVBTmX9J SM Found q3YVBTmX9J. Time: 0.001 seconds.
SM Searching pw0J5vRCnW SM Found pw0J5vRCnW. Time: 0.005 seconds.
SM Searching GNJP5flX5z SM Found GNJP5flX5z. Time: 0.054 seconds.
SM Searching 3Mh0x0R3OC SM Found 3Mh0x0R3OC. Time: 0.519 seconds.
SM Searching H5yxSA7eDx SM Found H5yxSA7eDx. Time: 5.083 seconds.
Smart Match total: 5.662 seconds.

###############################################################

Grep Searching eavTa2eWr1 Grep Found eavTa2eWr1. Time: 4.648 seconds.
Grep Searching mhEusMj5E7 Grep Found mhEusMj5E7. Time: 4.546 seconds.
Grep Searching WGwHfJICK6 Grep Found WGwHfJICK6. Time: 4.295 seconds.
Grep Searching I48fNDYNKF Grep Found I48fNDYNKF. Time: 4.262 seconds.
Grep Searching q3YVBTmX9J Grep Found q3YVBTmX9J. Time: 4.282 seconds.
Grep Searching pw0J5vRCnW Grep Found pw0J5vRCnW. Time: 4.462 seconds.
Grep Searching GNJP5flX5z Grep Found GNJP5flX5z. Time: 4.420 seconds.
Grep Searching 3Mh0x0R3OC Grep Found 3Mh0x0R3OC. Time: 4.185 seconds.
Grep Searching H5yxSA7eDx Grep Found H5yxSA7eDx. Time: 4.112 seconds.
Grep total: 39.214 seconds.
Done.

Checking for values that don't exist:

List::Util::any: 28.980 seconds.
Grep total: 34.790 seconds.
Smart Match total: 42.913 seconds.

Array Generation Time: 30.909 seconds. Searching 100000000 arr elements.

any Searching eavTa2eWr1l Time: 3.264 seconds.
any Searching mhEusMj5E7l Time: 3.404 seconds.
any Searching WGwHfJICK6l Time: 3.291 seconds.
any Searching I48fNDYNKFl Time: 3.240 seconds.
any Searching q3YVBTmX9Jl Time: 3.083 seconds.
any Searching pw0J5vRCnWl Time: 3.247 seconds.
any Searching GNJP5flX5zl Time: 3.180 seconds.
any Searching 3Mh0x0R3OCl Time: 3.028 seconds.
any Searching H5yxSA7eDxl Time: 3.243 seconds.
List::Util::any: 28.980 seconds.

###############################################################

SM Searching eavTa2eWr1l Time: 4.620 seconds.
SM Searching mhEusMj5E7l Time: 4.783 seconds.
SM Searching WGwHfJICK6l Time: 4.899 seconds.
SM Searching I48fNDYNKFl Time: 4.902 seconds.
SM Searching q3YVBTmX9Jl Time: 4.863 seconds.
SM Searching pw0J5vRCnWl Time: 4.646 seconds.
SM Searching GNJP5flX5zl Time: 4.751 seconds.
SM Searching 3Mh0x0R3OCl Time: 4.666 seconds.
SM Searching H5yxSA7eDxl Time: 4.782 seconds.
Smart Match total: 42.913 seconds.

###############################################################

Grep Searching eavTa2eWr1l Time: 4.034 seconds.
Grep Searching mhEusMj5E7l Time: 3.849 seconds.
Grep Searching WGwHfJICK6l Time: 3.837 seconds.
Grep Searching I48fNDYNKFl Time: 3.822 seconds.
Grep Searching q3YVBTmX9Jl Time: 3.923 seconds.
Grep Searching pw0J5vRCnWl Time: 3.825 seconds.
Grep Searching GNJP5flX5zl Time: 3.994 seconds.
Grep Searching 3Mh0x0R3OCl Time: 3.846 seconds.
Grep Searching H5yxSA7eDxl Time: 4.174 seconds.
Grep total: 35.303 seconds.
Done.