
I am using Text.Parsec.Text and Text.Parsec.Char to parse some data that includes integers. I use the following code to parse integers:

p_int :: Parser Int
p_int = read <$> ((++) <$> option "" (string "-") <*> many1 digit)

I profiled my program, and it turns out that the above snippet takes >90% of the execution time. How do I optimize it?
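One commonly suggested improvement, offered here as an unmeasured sketch: skip the intermediate String concatenation and the call to read, and fold the digit values into an Int directly. pIntFast is a hypothetical name; the code assumes the parsec and text packages and still runs on Parsec's Text parser, so it does not remove Parsec's own overhead.

```haskell
import Data.Char (digitToInt)
import Data.List (foldl')
import qualified Data.Text as T
import Text.Parsec (char, digit, many1, option, parse)
import Text.Parsec.Text (Parser)

-- Hypothetical replacement for p_int: an optional '-' followed by one or
-- more digits, accumulated into an Int without building a String for read.
pIntFast :: Parser Int
pIntFast = do
  negative <- option False (True <$ char '-')
  digits   <- many1 digit
  -- Strict left fold: "123" becomes ((0*10+1)*10+2)*10+3.
  let n = foldl' (\acc d -> acc * 10 + digitToInt d) 0 digits
  return (if negative then negate n else n)
```

For example, parse pIntFast "" (T.pack "-42") evaluates to Right (-42). Like the original, it accepts a leading '-' but not '+', and on very long digit strings the Int result overflows silently.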

I came across the Text.ParserCombinators.Parsec.Number module, which contains an int function for parsing integers. However, its type is int :: Integral i => CharParser st i, which is not compatible with the Text-based parser I am using, as the error below shows:

   • Couldn't match type ‘[Char]’ with ‘Text’
      Expected type: Parser Int
        Actual type: Text.ParserCombinators.Parsec.Char.CharParser () Int
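The mismatch arises because CharParser st is a synonym for GenParser Char st, i.e. a Parsec parser over String ([Char]) input, while Text.Parsec.Text's Parser runs over Text. One way around it, sketched below under the assumption that you write the integer parser yourself instead of reusing that module, is to give it a stream-polymorphic type with a Stream s m Char constraint, so the same definition runs on both String and Text. intP is a hypothetical name; the sketch assumes the parsec and text packages.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Char (digitToInt)
import Data.List (foldl')
import qualified Data.Text as T
import Text.Parsec (Parsec, ParsecT, Stream, char, digit, many1, option, parse)

-- Hypothetical stream-polymorphic integer parser: works for any input
-- stream s whose tokens are Char (String, strict Text, lazy Text, ...).
intP :: (Stream s m Char, Integral i) => ParsecT s u m i
intP = do
  -- Optional leading '-': yields negate if present, id otherwise.
  sign   <- option id (negate <$ char '-')
  digits <- many1 digit
  return (sign (foldl' (\acc d -> acc * 10 + fromIntegral (digitToInt d)) 0 digits))
```

A type annotation picks the stream and result type, e.g. parse (intP :: Parsec String () Int) "" "-7" for String input, or parse (intP :: Parsec T.Text () Integer) "" (T.pack "250") for Text. This only resolves the type mismatch; it is not in itself a performance fix.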

UPDATE: I replaced Text.Parsec.Text with Text.Parsec.String and replaced my integer parsing function with int from Text.ParserCombinators.Parsec.Number. This improved execution time by ~40%, but performance is still worse than Python. Profiling shows that ~80% of the time is spent in integer parsing. Does this mean Parsec is just slow?

COST CENTRE    MODULE                               SRC                                                       %time %alloc

sign           Text.ParserCombinators.Parsec.Number Text/ParserCombinators/Parsec/Number.hs:277:1-73           34.4   39.8
number         Text.ParserCombinators.Parsec.Number Text/ParserCombinators/Parsec/Number.hs:(321,1)-(323,18)   26.7   27.5
numberValue    Text.ParserCombinators.Parsec.Number Text/ParserCombinators/Parsec/Number.hs:(327,1)-(328,74)   10.2    6.7
zeroNumber     Text.ParserCombinators.Parsec.Number Text/ParserCombinators/Parsec/Number.hs:(300,1)-(301,56)    6.0   10.0
...

....

int                Text.ParserCombinators.Parsec.Number Text/ParserCombinators/Parsec/Number.hs:273:1-17         499          0    1.4    1.6    79.5   86.5
You should import Text.Parsec.String instead of Text.Parsec.Text here. – Willem Van Onsem
Using read isn't going to help you with efficiency. I would check stackoverflow.com/a/10726784/1248563 – cornuz
@WillemVanOnsem I have updated the question with my experience with Text.Parsec.String. Do you have any more advice, please? – Random dude
I think Parsec is just slow. I replaced it with Attoparsec and am seeing another 60% speed-up (80% over the custom int function from my original question) without any optimization. – Random dude

1 Answer


I replaced Parsec with Attoparsec, and without any optimization it is now 80% faster. Also, the "total alloc" is down from over 3 GB to 507 MB.

The API is very similar between the two libraries so it was not at all difficult to migrate. I will try to optimize it further if possible and see how fast it can get.
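For comparison, a minimal sketch of the integer parser after such a migration, assuming the attoparsec and text packages: Data.Attoparsec.Text already provides signed and decimal, so no hand-written digit loop is needed. intA is a hypothetical name, not the answerer's actual code.

```haskell
import qualified Data.Text as T
import Data.Attoparsec.Text (Parser, decimal, parseOnly, signed)

-- Hypothetical Attoparsec counterpart of p_int: 'signed' handles an optional
-- leading '-' or '+', and 'decimal' reads one or more ASCII digits.
intA :: Parser Int
intA = signed decimal
```

For example, parseOnly intA (T.pack "-42") evaluates to Right (-42). Unlike the original p_int, signed also accepts a leading '+'.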