Options for parallelizing functions on F# tree

Question

In my project, I have a data structure represented as follows:

type 'a Tree =
| Leaf of 'a
| Node of 'a Tree array

Because traversing large trees is expensive, I need to parallelize the following functions on this data structure:

map
fold
exists (does any node satisfy a predicate)
reduce / transform (optional)
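
For reference, sequential versions of the first three functions might look like the sketch below. It assumes the predicate and the mapping function apply to leaf values only, since the Tree type stores data only in leaves; the names mapTree, foldTree, and existsTree are made up for illustration.

```fsharp
type 'a Tree =
    | Leaf of 'a
    | Node of 'a Tree array

// Apply f to every leaf, preserving the shape of the tree.
let rec mapTree f tree =
    match tree with
    | Leaf x -> Leaf (f x)
    | Node children -> Node (Array.map (mapTree f) children)

// Thread an accumulator through the leaves, left to right.
let rec foldTree f acc tree =
    match tree with
    | Leaf x -> f acc x
    | Node children -> Array.fold (foldTree f) acc children

// True if any leaf satisfies pred; Array.exists stops at the first match.
let rec existsTree pred tree =
    match tree with
    | Leaf x -> pred x
    | Node children -> Array.exists (existsTree pred) children

let sample = Node [| Leaf 1; Node [| Leaf 2; Leaf 3 |]; Leaf 4 |]
let doubled = mapTree (fun x -> x * 2) sample
let total = foldTree (+) 0 sample
let hasThree = existsTree (fun x -> x = 3) sample
```

These serve as the baseline that any parallel version has to beat.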

Because of the nature of the problem I'm working on, the number of branches at each node varies, and the workload at the leaf level is quite small. My first question is which options I should consider for parallel execution on a tree. I tried using functions from the Array.Parallel module at every node, but the overhead of parallelism is so large that the parallel version is even slower than the sequential one. I can change the array representation to List or PSeq if necessary.

My second question is how to control the degree of parallelism in those functions. I'm thinking about controlling it by the depth of the tree, the number of branches at each node, the workload complexity at the leaf level, and the number of leaves in the tree; however, combining these seems complex and unpredictable.
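
To make the depth-based idea concrete, here is a minimal sketch that uses Array.Parallel.map only for the top few levels and falls back to sequential code below them. The names mapWithCutoff and reduceWithCutoff and the cutoff parameter are hypothetical, and a good cutoff value would have to be found by measurement. The reduce variant assumes combine is associative and zero is its identity.

```fsharp
type 'a Tree =
    | Leaf of 'a
    | Node of 'a Tree array

// Parallelize only the top `cutoff` levels; deeper subtrees are mapped
// sequentially so that small pieces of work do not pay scheduling overhead.
let rec mapWithCutoff cutoff f tree =
    match tree with
    | Leaf x -> Leaf (f x)
    | Node children when cutoff > 0 ->
        Node (Array.Parallel.map (mapWithCutoff (cutoff - 1) f) children)
    | Node children ->
        Node (Array.map (mapWithCutoff 0 f) children)

// Reduce the leaves with `combine`. Assumes `combine` is associative and
// `zero` is its identity; otherwise results may differ from a plain fold.
let rec reduceWithCutoff cutoff zero combine tree =
    match tree with
    | Leaf x -> x
    | Node children when cutoff > 0 ->
        children
        |> Array.Parallel.map (reduceWithCutoff (cutoff - 1) zero combine)
        |> Array.fold combine zero
    | Node children ->
        Array.fold
            (fun acc child -> combine acc (reduceWithCutoff 0 zero combine child))
            zero
            children

let sample = Node [| Leaf 1; Node [| Leaf 2; Leaf 3 |]; Leaf 4 |]
let incremented = mapWithCutoff 1 (fun x -> x + 1) sample
let sum = reduceWithCutoff 2 0 (+) sample
```

Depth is only one of the criteria listed above; this sketch ignores branch count and leaf workload.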

Daniel · Accepted Answer · 2011-03-16T04:24:56

How about separating traversal from any other processing? Perhaps create a work queue (MailboxProcessor is a good starting point) and, as the tree is traversed, enqueue additional work for background processing. It doesn't solve the parallel traversal problem (which seems tricky to get right for all cases) but with additional processing relegated to the background, it should go pretty quickly. You can experiment with the number of background workers until you find a good degree of parallelism. This all assumes the amount of work to be done for each node is non-trivial.

EDIT

Here's some code. I'm sure it can be improved. I had to hammer it out pretty quickly. But this shows the basic concept. It only has one "background worker," i.e., the MailboxProcessor. I'll leave updating it to use multiple workers to the imagination.

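type Msg<'a, 'b> =
    | Work of 'a   // an item to transform
    | Done of 'b   // a request for the results; carries the reply channel

type MapTransformer<'a, 'b>(f : 'a -> 'b) =
    let results = ResizeArray<'b>()
    // The agent handles one message at a time, so only it mutates results.
    let m = MailboxProcessor.Start(fun inbox ->
        let rec loop() =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Work work ->
                    results.Add(f work)
                    return! loop()
                | Done (channel : AsyncReplyChannel<seq<'b>>) ->
                    // Reply with everything processed so far, then stop.
                    channel.Reply(results :> seq<'b>)
            }
        loop())
    member this.Enqueue(item : 'a) = m.Post(Work item)
    member this.Results = m.PostAndReply(fun c -> Done c)

// For brevity, a flat list stands in for the tree here.
let uberMap items =
    let m = MapTransformer(fun x -> x + 1)
    items |> List.iter (fun x -> m.Enqueue(x))
    m.Results

uberMap [1; 2; 3]
// evaluates to seq [2; 3; 4]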
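
The example above runs on a list rather than on the Tree type from the question. One way it might be adapted, as a sketch only: walk the tree on the calling thread and post each leaf to the agent. The helper name mapLeaves is made up, and the result is a flat list of transformed leaf values, so the tree shape is not preserved.

```fsharp
type 'a Tree =
    | Leaf of 'a
    | Node of 'a Tree array

type Msg<'a, 'b> =
    | Work of 'a
    | Done of 'b

// Same single-agent design as above, with explicit type parameters.
type MapTransformer<'a, 'b>(f : 'a -> 'b) =
    let results = ResizeArray<'b>()
    let agent = MailboxProcessor.Start(fun inbox ->
        let rec loop () =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Work item ->
                    results.Add(f item)
                    return! loop ()
                | Done (channel : AsyncReplyChannel<seq<'b>>) ->
                    channel.Reply(results :> seq<'b>)
            }
        loop ())
    member this.Enqueue(item : 'a) = agent.Post(Work item)
    member this.Results = agent.PostAndReply(fun c -> Done c)

// Hypothetical helper: traverse on the calling thread, process in the agent.
let mapLeaves f tree =
    let m = MapTransformer(f)
    let rec walk t =
        match t with
        | Leaf x -> m.Enqueue x
        | Node children -> Array.iter walk children
    walk tree
    // Blocks until the agent has drained every Work message posted before Done.
    m.Results |> List.ofSeq

let sample = Node [| Leaf 1; Node [| Leaf 2; Leaf 3 |]; Leaf 4 |]
let mapped = mapLeaves (fun x -> x + 1) sample
```

Because a single agent processes messages in the order they were posted, the results come back in left-to-right leaf order. With several workers that ordering would no longer hold without extra bookkeeping.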
