First of all, I'm fairly new to Haskell and I'm trying to understand how parsers work in Haskell. I'm trying to parse this e-book from http://www.gutenberg.org/files/57071/57071-0.txt and analyze the text: for example, output the number of English words, sentences, and paragraphs. Here's my code:

{-# LANGUAGE OverloadedStrings #-}

import Control.Exception (catch, SomeException)
import System.Environment (getArgs)
import Data.Attoparsec.Text
import Data.Char
import Control.Applicative ((<*>), (*>), (<$>), (<|>), pure)


data Prose = Prose {
  word :: String
} deriving Show

prose :: Parser Prose
prose = do
  word <- many' $ satisfy isAlphaNum
  return $ Prose word

main :: IO()
main = do
  input <- readFile "small.txt"
  print $ parse prose input

This is my error message:

  • Couldn't match type ‘[Char]’ with ‘Data.Text.Internal.Text’
    Expected type: Data.Text.Internal.Text
      Actual type: String
  • In the second argument of ‘parse’, namely ‘input’
    In the second argument of ‘($)’, namely ‘parse prose input’
    In a stmt of a 'do' block: print $ parse prose input

I have used "OverloadedStrings" to try to fix this issue, but it doesn't seem to work. Also, any guidance on examples or tutorials for getting started with attoparsec would be greatly appreciated!

1 Answer

-XOverloadedStrings only changes the type of string literals from String to the more general IsString a => a (which can be unified with String, Text, ByteString and more). In your code, there's just one literal: the file name "small.txt".

But file names are always String anyway! Well, FilePath, but that's just a synonym for String. (Even the Data.Text.IO functions take filenames as such plain-old-list strings.) So the overloaded string literal actually makes no difference here at all.
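
To make that concrete, here is a small sketch of what the extension does and does not affect. The names asString and asText are made up for this illustration; they are not part of your program.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- The same literal, typed two ways; the extension makes both legal.
-- (asString and asText are hypothetical names for this example.)
asString :: String
asString = "small.txt"

asText :: T.Text
asText = "small.txt"

main :: IO ()
main = do
  putStrLn asString     -- the literal used as a String
  TIO.putStrLn asText   -- the same literal used as Text
  -- A value that is not a literal is untouched by the extension,
  -- so an existing String still has to be converted explicitly:
  TIO.putStrLn (T.pack asString)
```

The result of readFile is in the same position as asString in the last line: it is an ordinary String value, not a literal, so the extension never gets a chance to change its type.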

The parser, however, processes file contents, not file names, so what you need is an IO routine that reads the contents as Text:

import qualified Data.Text.IO as Txt

main :: IO()
main = do
  input <- Txt.readFile "small.txt"
  print $ parse prose input
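
As for getting started with attoparsec on the word-counting part, the following is only a rough sketch, not the one right way to do it. It assumes a "word" is simply a non-empty run of alphanumeric characters, and the names wordP, junk, wordsP and countWords are invented for this example. It uses parseOnly rather than parse, since parse can return a Partial result that is still waiting for more input.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, many', parseOnly, skipWhile, takeWhile1)
import Data.Char (isAlphaNum)
import Data.Text (Text)

-- One word: a non-empty run of alphanumeric characters
-- (an assumption for this sketch; real prose needs more care,
-- e.g. apostrophes and hyphens).
wordP :: Parser Text
wordP = takeWhile1 isAlphaNum

-- Skip anything that cannot be part of a word (spaces, punctuation, newlines).
junk :: Parser ()
junk = skipWhile (not . isAlphaNum)

-- All words in the input, ignoring whatever separates them.
wordsP :: Parser [Text]
wordsP = junk *> many' (wordP <* junk)

-- Count the words; parseOnly treats the given Text as the complete input.
countWords :: Text -> Either String Int
countWords = fmap length . parseOnly wordsP

main :: IO ()
main = print (countWords "It was a dark, stormy night.")  -- prints: Right 6
```

To run it on the file, read the contents with Txt.readFile as above and pass them to countWords. Using takeWhile1 instead of many' (satisfy isAlphaNum) keeps each word as Text rather than building a String one character at a time.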