Warning: long and complicated question incoming.
Some believe total functional programming is a valuable idea, and so is finding techniques for doing it. Minding that, how can I write a parser for a mutually recursive ADT without recursion and side effects? Here, I'm defining as "recursive" any term that isn't strongly normalizing.
What I have tried:
Mind this following mutually recursive ADT:
data Tree = Node Int [Tree]
tree = Node 10 [Node 20 [], Node 30 [], Node 40 []]
The value, tree
, could be serialized as:
tree_serial = [0,10,0,0,20,1,0,0,30,1,0,0,40,1,1] :: [Int]
Using ints for simplicity, here, 0
denotes the beginning of a Node
or a Cons
cell (depending on the state of the parser), 1 denotes Nil
and the remaining denotes data. We could easily write a parser for it using side effects:
var string = [0,10,0,0,20,1,0,0,30,1,0,0,40,1,1];
function parse(string){
function getInt(){
return string.shift();
};
function parseTree(){
var chr = getInt();
if (chr === 0)
return ["Node",getInt(),parseList()];
};
function parseList(){
var chr = getInt();
if (chr === 0)
return ["Cons",parseTree(),parseList()];
if (chr === 1)
return "Nil";
};
return parseTree();
};
console.log(JSON.stringify(parse(string)));
Here, getInt
is side-effective: it gets the next int from the string. We could easily and elegantly translate this to Haskell using Parsec or similar - but for better understanding, I skipped those and defined a stripped down parser type:
data Parser res = GetInt (Int -> Parser res) | Return res
runParser (GetInt fn) (c:cs) = runParser (fn c) cs
runParser (Return res) c = res
This works similar to a monadic parser, except more explicitly:
main = do
let parsePair = (GetInt (\a -> (GetInt (\b -> Return (a,b)))))
print $ runParser parsePair [1,2,3,4,5]
Using this, we could define our parser without side effects:
data Tree = Node Int [Tree] deriving Show
parseTree = treeParser Return where
treeParser = (\ cont ->
GetInt (\ _ ->
GetInt (\ tag ->
listParser (\ listParsingResult ->
(cont (Node tag listParsingResult))))))
listParser = (\ cont ->
GetInt (\ a ->
if a == 0
then treeParser (\x -> listParser (\y -> cont (x : y)))
else cont []))
main = do
let treeData = [0,10,0,0,20,1,0,0,30,1,0,0,40,1,1]
print $ runParser parseTree treeData
This outputs Node 10 [Node 20 [],Node 30 [],Node 40 []]
, as expected. Notice this still uses recursion, and I had to use cont
to pass the control between the two recursive functions. Now, there are 2 strategies to get rid of recursion I'm aware of:
1. Use folds.
2. Use church numbers for bounded recursion.
Using folds is obviously not viable here, since there is no structure to fold on (we are building it!). If we were parsing a list instead of a Tree, using church numbers would be perfect since they work exactly like the Y-combinator for bounded recursion - and, knowing the length of the list, we could just write toChurch listLength listParser init
. The problem with this case, though, is that there is mutual recursion going on, and it is not obvious which church number to use. We have many layers of lists and trees of unpredictable lengths. Indeed, if we use a big enough church number, it works without recursion, but at the cost of added work. This is one of the last examples of an actually useful program I couldn't replicate "correctly" without recursion. Can it be done?
For completeness, here is a JavaScript program which parses that tree without recursion, but using made-up church numbers:
function runParser(f){return function(str){
var a = f(str[0]);
return a(str.slice(1));
}};
function Const(a){return function(b){return a}};
function toChurch(n){return (function(f){return (function(a){
for (var i=0; i<n; ++i)
a = f(a);
return a;
}) }) };
function parser(get){
return toChurch(50)(function(rec){
return function (res){
return get(function(a){
return [get(function(b){
return toChurch(50)(function(recl){
return function(res){
return get(function(a){
return [
rec(function(a){
return recl(function(b){
return res(["Cons",a,b])
})
}),
res("Nil")][a];
});
};
})(0)(function(x){return res(["Node",b,x])});
})][a];
});
};
})(0)(Const);
};
var string = [0,200,0,0,300,0,0,400,1,0,0,500,1,0,0,500,1,1,0,0,600,0,0,700,1,0,0,800,1,0,0,900,1,1,1];
console.log(JSON.stringify(parser(runParser)(string)));
Notice that 50
constant inside the parser
function: it is completely arbitrary as a bound. I'm not sure there are "right" choices for those which would "fit perfectly" a specific parseable value.