Answered by:
Flatten a tree into a list.

What is the best way to flatten a tree into a list with f#?
type 'a Tree =  Empty  Node of 'a Tree * 'a * 'a Tree //from Haskell version. let inline add x l = x :: l let inline flattenTree1 t= let rec help t = match t with  Empty > id  Node (l, n, r) > help l << add n << help r help t []
Take the f# version the same constant time as the haskell version?
Or is the following version better.
let flattenTree2 t = let rec loop tree acc = match tree with  Empty > acc  Node(l, n, r) > loop l (loop r (n :: acc)) loop t []
Is there a better suggestion?
Question
Answers

While all these algorithms are the same time complexitywise (linear time) since loop hits every node exactly once, they will have substantial constantfactor differences because of different patterns of using the stack and the heap.
Think of what happens in version #1. You pass a big tree, it hits the Node case of help, then it recurses on both branches immediately. This means that the whole tree will have to fit on the stack > you should not process trees bigger than a few megabytes in size. Moreover, a lot of closures get allocated, so the heap consumption is far from optimal. Note the same probably does not hold for Haskell, since it has a different evaluation order.
I actually like your version #2 code:
let flattenTree2 t = let rec loop tree acc = match tree with  Empty > acc  Node(l, n, r) > loop l (loop r (n :: acc)) loop t []
The only problem, if any, with it, is that one of the calls to loop is not tailrecursive. This means the computation happens on the stack. This is probably fine for 99% of the cases, but if you want to process trees deep enough to not fit onto the stack, this algorithm will overflow. Note how it is different from #1: in #1 the whole tree had to fit the stack, in #2 only the paths from root to leaf. A simple FP trick to completely eliminate the stack size limit is to use a continuationpassing style (CPS) to shift the accumulation to the heap:let flattenTree3 t = let rec loop tree k acc = match tree with  Empty > k acc  Node (l, n, r) > loop r (loop l k) (n :: acc) loop t id []
Sometimes you can improve heap use efficiency of this by designing a data structure to reify the continuations, instead of just using closures.The seq version proposed above is probably just as good. Since it is lazy, computations are delayed until the resulting sequence will be enumerated, and there is no danger of stack overflow. It might be significantly slower though then your #2 version for trees that do fit, because of all the extra layers it introduces.
WebSharper  F#toJavaScript and more at http://websharper.com Marked as answer by ffogd Thursday, August 04, 2011 6:36 PM
All replies

I think idiomatic F# would be to have a function which converts the Tree into a seq (which can then be saved into a list if desired):
module Tree = let rec toSeq tree = seq { match tree with  Empty > ()  Node (left, x, right) > yield! toSeq left yield x yield! toSeq right }


If all you want to do is flatten a tree "best" can be whatever algorithm works fastest, but in most situations your requirements go beyond that. Are you trying to serialize a tree? In that case a simple flattening may not save enough information to deserialize back into the original tree. Or a "flattening" may be intended as a first step to processing the nodes. In that case how you flatten is very important. There are many ways to cycle through a binary tree, but the three major ways are usually referred to as preorder, inorder, and postorder. (TAOCP 2.3.1)
And you don't even have to flatten the tree first to do whatever serial processing you want to do on it. Here is the first in an excellent series of blogposts on catamorphisms. http://lorgonblog.wordpress.com/2008/04/05/catamorphismspartone/ (8 articles in all, well worth studying.)
Here is some code I experimented with to illustrate these points (before I read about catamorphisms).
Notes:
1) This type of binary tree is specifically constructed to support unbalanced (missing) end nodes. In other words a node may only have one child, and it makes a difference whether it is a left or right child.
2) Because it supports missing end nodes serializing generic type 'a is tricky. (You can still see the remains of my lame attempt to overcome the absence of nulls in F#. There really is no alternative except the Option type.)
3) Examples of pre, in, and postorder flattening, as well as flattening saving additional information for reconstruction of the tree (e.g. member x.postOrderLevel).
type Tree<'a> =
 Tree of 'a * 'a Tree * Tree<'a> //note OCaml and .Net syntax, either works
 Node of 'a
 BinTreeNil
///indented tree, top node is left
override this.ToString() =
let rec loop indent string tree =
match tree with
 Tree(node, left, right) >
loop (indent + " ")
(loop (indent + " ") (string + "\n" + indent + node.ToString()) left)
right
 Node node >
string + "\n" + indent + node.ToString()
 BinTreeNil >
string + "\n" + indent + "None"
loop "" "" this
static member pseudoNull = "nULL" :> System.Object :?> 'a //note this only works when 'a is type string
member x.preOrder =
let rec loop tree =
seq {
match tree with
 Tree(x, left, right) >
yield x
yield! loop left
yield! loop right
 Node x > yield x
 BinTreeNil > ()
}
loop x
member x.preOrderLevel =
let rec loop tree level =
seq {
match tree with
 Tree(x, left, right) >
yield (x, level)
yield! loop left (level + 1)
yield! loop right (level + 1)
 Node x > yield (x, level)
 BinTreeNil > yield (Tree.pseudoNull, level)
}
loop x 0
member x.inOrder =
let rec loop tree =
seq {
match tree with
 Tree(x, left, right) >
yield! loop left
yield x
yield! loop right
 Node x > yield x
 BinTreeNil > ()
}
loop x
member x.inOrderLevel =
let rec loop tree level =
seq {
match tree with
 Tree(x, left, right) >
yield! loop left (level + 1)
yield (x, level)
yield! loop right (level + 1)
 Node x > yield (x, level)
 BinTreeNil > yield (Tree.pseudoNull, level)
}
loop x 0
member x.postOrder =
let rec loop tree =
seq {
match tree with
 Tree(x, left, right) >
yield! loop left
yield! loop right
yield x
 Node x > yield x
 BinTreeNil > ()
}
loop x
member x.postOrderLevel =
let rec loop tree level =
seq {
match tree with
 Tree(x, left, right) >
yield! loop left (level + 1)
yield! loop right (level + 1)
yield (x, level)
 Node x >
yield (x, level)
 _ > yield (Tree.pseudoNull, level)
}
loop x 0
Jack Fox facster.com



Both the functions you presented are linear time. In each of them you only visit each Node once and you only do a constant amount of work each time you visit a Node: namely a cons and, possibly (depending on optimisations), two sequential compositions, neither of which depend on the size of their input.
As I understand it, the only inefficiency problem you might be likely to encounter whilst trying to write a flatten function is if you write the "obvious" recursive implementation which doesn't use an accumulator, i.e.
let rec flattenTree3 t = match t with  Empty > []  Node (l, n, r) > flattenTree3 l @ [n] @ flattenTree3 r
because here, although you only ever visit each node once, you are doing an amount of work which is not constant: append (@) takes an amount of time which depends on the size of its input. Both of your functions avoid the call to append by using an accumulating parameter.Although your functions are going to be more or less the same in terms of efficiency, you might have noticed that they will not give you exactly the same result because, in the Node case of your recursive definition, the two functions differ as to when to add the label n to the list. The first one recursively adds all the labels from the right subtree followed by n followed by all the labels from the left subtree whereas the second version adds the n as the first thing that it does. Hence, you should see something like:
> flattenTree1 (Node (Node (Empty, 1, Empty), 2, Node (Empty, 3, Empty)));; val it : int list = [1; 2; 3] > flattenTree2 (Node (Node (Empty, 1, Empty), 2, Node (Empty, 3, Empty)));; val it : int list = [1; 3; 2]

While all these algorithms are the same time complexitywise (linear time) since loop hits every node exactly once, they will have substantial constantfactor differences because of different patterns of using the stack and the heap.
Think of what happens in version #1. You pass a big tree, it hits the Node case of help, then it recurses on both branches immediately. This means that the whole tree will have to fit on the stack > you should not process trees bigger than a few megabytes in size. Moreover, a lot of closures get allocated, so the heap consumption is far from optimal. Note the same probably does not hold for Haskell, since it has a different evaluation order.
I actually like your version #2 code:
let flattenTree2 t = let rec loop tree acc = match tree with  Empty > acc  Node(l, n, r) > loop l (loop r (n :: acc)) loop t []
The only problem, if any, with it, is that one of the calls to loop is not tailrecursive. This means the computation happens on the stack. This is probably fine for 99% of the cases, but if you want to process trees deep enough to not fit onto the stack, this algorithm will overflow. Note how it is different from #1: in #1 the whole tree had to fit the stack, in #2 only the paths from root to leaf. A simple FP trick to completely eliminate the stack size limit is to use a continuationpassing style (CPS) to shift the accumulation to the heap:let flattenTree3 t = let rec loop tree k acc = match tree with  Empty > k acc  Node (l, n, r) > loop r (loop l k) (n :: acc) loop t id []
Sometimes you can improve heap use efficiency of this by designing a data structure to reify the continuations, instead of just using closures.The seq version proposed above is probably just as good. Since it is lazy, computations are delayed until the resulting sequence will be enumerated, and there is no danger of stack overflow. It might be significantly slower though then your #2 version for trees that do fit, because of all the extra layers it introduces.
WebSharper  F#toJavaScript and more at http://websharper.com Marked as answer by ffogd Thursday, August 04, 2011 6:36 PM

Thanks for such a detailed explanation. In addition I take a function with an explicit stack.
Here is my test run.
open System type KDTreeNode<'a> =  Empty  Node of KDTreeNode<'a> * 'a * KDTreeNode<'a> let inline test f t= printfn "Test Start" let sw = new System.Diagnostics.Stopwatch() sw.Start() let res=List.length (f t) sw.Stop() let time = sw.ElapsedMilliseconds printfn "Time Duration : %A" (time, res) time let inline insert l = let rec loop l acc = match l,acc with  [],_ > acc  data::xs, Empty > loop xs (Node (Empty, data, Empty))  data::xs, Node (left, ndata, right) > if data % 2 > 0 then loop xs (Node (Node(left,ndata,Empty), data, right )) else loop xs (Node (left, data, Node(Empty,ndata,right))) loop l Empty let inline flatten4 tree = let s = System.Collections.Generic.Stack[tree] let rec loop (stack:Collections.Generic.Stack<KDTreeNode<'a>>) acc = match stack.Count>0 with  false > acc  true > match stack.Pop() with  Empty > loop stack acc  Node(left, a, right) > stack.Push left stack.Push right loop stack (a::acc) loop s [] let inline flatten3 t = let rec loop tree k acc = match tree with  Empty > k acc  Node (l, n, r) > loop r (loop l k) (n :: acc) loop t id [] let inline flatten2 t = let rec loop tree acc = match tree with  Empty > acc  Node(l, n, r) > loop l (loop r (n :: acc)) loop t [] let testN f t = printfn "Total Time: %A" (List.fold (fun acc _> acc + (test f t)) (int64 0) [0..10]) // TEST with 50000 let tree =insert [0..50000] testN flatten2 tree testN flatten3 tree testN flatten4 tree // TEST with 100000 let tree1 =insert [0..100000] //flatten2 stack overflow testN flatten3 tree1 testN flatten4 tree1;; // TEST with 50000 Test Start flatten2 Time Duration : (36L, 50001) Test Start Time Duration : (4L, 50001) Test Start Time Duration : (4L, 50001) Test Start Time Duration : (5L, 50001) Test Start Time Duration : (5L, 50001) Test Start Time Duration : (35L, 50001) Test Start Time Duration : (11L, 50001) Test Start Time Duration : (4L, 50001) Test Start Time Duration : (4L, 50001) Test Start Time Duration : (5L, 50001) Test Start Time Duration : (5L, 50001) Total Time: 118L Test Start flatten3 Time Duration : (51L, 50001) Test Start Time Duration : (32L, 50001) Test Start Time Duration : (36L, 50001) Test Start Time Duration : (39L, 50001) Test Start Time Duration : (30L, 50001) Test Start Time Duration : (35L, 50001) Test Start Time Duration : (30L, 50001) Test Start Time Duration : (33L, 50001) Test Start Time Duration : (39L, 50001) Test Start Time Duration : (30L, 50001) Test Start Time Duration : (39L, 50001) Total Time: 394L Test Start flatten4 Time Duration : (11L, 50001) Test Start Time Duration : (7L, 50001) Test Start Time Duration : (6L, 50001) Test Start Time Duration : (7L, 50001) Test Start Time Duration : (10L, 50001) Test Start Time Duration : (6L, 50001) Test Start Time Duration : (6L, 50001) Test Start Time Duration : (6L, 50001) Test Start Time Duration : (6L, 50001) Test Start Time Duration : (7L, 50001) Test Start Time Duration : (6L, 50001) Total Time: 78L // TEST with 100000 Test Start flatten3 Time Duration : (123L, 100001) Test Start Time Duration : (74L, 100001) Test Start Time Duration : (93L, 100001) Test Start Time Duration : (80L, 100001) Test Start Time Duration : (74L, 100001) Test Start Time Duration : (70L, 100001) Test Start Time Duration : (72L, 100001) Test Start Time Duration : (89L, 100001) Test Start Time Duration : (75L, 100001) Test Start Time Duration : (73L, 100001) Test Start Time Duration : (78L, 100001) Total Time: 901L Test Start flatten4 Time Duration : (24L, 100001) Test Start Time Duration : (13L, 100001) Test Start Time Duration : (13L, 100001) Test Start Time Duration : (14L, 100001) Test Start Time Duration : (18L, 100001) Test Start Time Duration : (14L, 100001) Test Start Time Duration : (14L, 100001) Test Start Time Duration : (14L, 100001) Test Start Time Duration : (18L, 100001) Test Start Time Duration : (17L, 100001) Test Start Time Duration : (13L, 100001) Total Time: 172L