1
votes
{-# LANGUAGE ScopedTypeVariables,BangPatterns #-}

import qualified Data.Attoparsec.Internal as I
import qualified Data.Attoparsec.Internal.Types as T
import qualified Data.Vector.Unboxed as UVec
import qualified Data.Vector.Unboxed.Mutable as UMVec
import qualified Data.Vector as Vec
import qualified Data.Vector.Mutable as MVec
import qualified Data.Text as Text
import qualified System.IO.Unsafe as Unsafe

import Control.Monad.ST
import Control.Monad.Primitive

type Parser = T.Parser

manyCPSVec :: Parser Text.Text Char -> Parser Text.Text (Vec.Vector Char)
manyCPSVec parser = T.Parser $ \t pos more lose_fin win_fin ->
      let arr = Unsafe.unsafePerformIO (MVec.new 1024) in
      loop 0 arr t pos more lose_fin win_fin where
          loop i (arr :: MVec.MVector RealWorld Char) t pos more lose_fin win_fin =
              T.runParser parser t pos more lose win where
                  win t !pos more (a :: Char) =
                    Unsafe.unsafePerformIO (MVec.write arr i a) -- Here is the problem
                    loop (i+1) arr t pos more lose_fin win_fin
                  lose t pos more _ _ =
                      --x <- Vec.freeze arr
                      win_fin t pos more (Vec.empty)

main = print "Hello"

I am trying to put in some Vector functionality in Attoparsec for efficiency, but I've run into a wall.

If Attoparsec was not written using CPS, I could have used a functional unfoldr, but that is not an option here. The problem is that all calls have to be in the tail position here and no function returns in the standard sense of the thing therefore arrays have to be passed along as accumulators.

I tried to do this using the ST monad, but when that did not work I tried the above, but even so it fails to work. Haskell's type system is really killing me here. Is it in anyway possible to edit mutable arrays when programming with CPS?

If it is possible to do this within the ST monad I would doubly appreciate this.

Posts telling me to use data structures other than arrays though, will be downvoted.

1
Just wondering, does this turn out to be any faster than collecting a list and using Vector.fromList? - Michael
I think you should be able to find a way to do this in ST. The challenge will be dealing efficiently with partial results, but I suspect there is no entirely efficient way to handle those. In particular, remember that a partial result may be resumed multiple times with different inputs, so too much unsafePerformIO could break your code. - dfeuer
@Michael Yeah, quite a bit. For parsing 10M integers, it makes a difference what data structure you use. Using the completed parser with boxed vectors it deals with 10M ints in 7.6s. With the unboxed parser it does it in 4s. Just using an extended functional unfoldr with a boxed Vector does it in 2s. ResizeArray in F# does it in 1.2s. The persistent vector does it in 9.8. Apart from the F# solution, they all blow the heap on 100M. Lists do that long before 10M. - Marko Grdinić
@dfeuer To be honest, I've completely lost interest in this now. Literally nothing I've tried in Haskell lets me parse 100M ints without blowing the heap and I've been trying various approaches for 10 days now. I see it as a failure of the language. I ended up figuring out CPS and monads, but not how to do a task that would take me 10m in an imperative language. - Marko Grdinić
If you are really just parsing whitespace separated Ints, I think a parsing library might be too fancy. Data.ByteString.Char8.readInt is very fast. - Michael

1 Answers

0
votes

A few minutes after I made this post, it occurred to me that I made a syntactic error.

manyCPSVec :: Parser Text.Text Char -> Parser Text.Text (Vec.Vector Char)
manyCPSVec parser = T.Parser $ \t pos more lose_fin win_fin ->
      let arr = Unsafe.unsafePerformIO (MVec.new 1024) in
      loop 0 arr t pos more lose_fin win_fin where
          loop i (arr :: MVec.MVector RealWorld Char) t pos more lose_fin win_fin =
              T.runParser parser t pos more lose win where
                  win t !pos more (a :: Char) =
                    Unsafe.unsafePerformIO $ do
                      MVec.write arr i a
                      return $ loop (i+1) arr t pos more lose_fin win_fin
                  lose t pos more _ _ =
                    Unsafe.unsafePerformIO $ do
                      x <- Vec.freeze arr
                      return $ win_fin t pos more x

The above type checks ok. I forgot that I can't sequence statements outside the do blocks in Haskell.