none
Flatten a tree into a list.

    Question

  • What is the best way to flatten a tree into a list with f#?

    type 'a Tree =
        | Empty
        | Node of 'a Tree * 'a * 'a Tree
     
    //from Haskell version.
    let inline add x l = x :: l
    
    let inline flattenTree1 t=
        let rec help t =
          match t with
          | Empty -> id
          | Node (l, n, r) ->
            help l << add n << help r
        help t []

    Take the f# version the same constant time as the haskell version? 

    Or is the following version better.

     

    let flattenTree2 t =
      let rec loop tree acc =
        match tree with
        | Empty -> acc
        | Node(l, n, r) -> loop l (loop r (n :: acc)) 
      loop t []
     Is there a better suggestion?

     

     

    Friday, July 29, 2011 8:54 PM

Answers

  • While all these algorithms are the same time complexity-wise (linear time) since loop hits every node exactly once, they will have substantial constant-factor differences because of different patterns of using the stack and the heap.

    Think of what happens in version #1. You pass a big tree, it hits the Node case of help, then it recurses on both branches immediately. This means that the whole tree will have to fit on the stack -> you should not process trees bigger than a few megabytes in size. Moreover, a lot of closures get allocated, so the heap consumption is far from optimal. Note the same probably does not hold for Haskell, since it has a different evaluation order. 

    I actually like your version #2 code:

    let flattenTree2 t =
     let rec loop tree acc =
      match tree with
      | Empty -> acc
      | Node(l, n, r) -> loop l (loop r (n :: acc)) 
     loop t []
    
    The only problem, if any, with it, is that one of the calls to loop is not tail-recursive. This means the computation happens on the stack. This is probably fine for 99% of the cases, but if you want to process trees deep enough to not fit onto the stack, this algorithm will overflow. Note how it is different from #1: in #1 the whole tree had to fit the stack, in #2 only the paths from root to leaf. A simple FP trick to completely eliminate the stack size limit is to use a continuation-passing style (CPS) to shift the accumulation to the heap:
      let flattenTree3 t =
       let rec loop tree k acc =
        match tree with
        | Empty     -> k acc
        | Node (l, n, r) -> loop r (loop l k) (n :: acc)
       loop t id []
    
    Sometimes you can improve heap use efficiency of this by designing a data structure to reify the continuations, instead of just using closures.
    The seq version proposed above is probably just as good. Since it is lazy, computations are delayed until the resulting sequence will be enumerated, and there is no danger of stack overflow. It might be significantly slower though then your #2 version for trees that do fit, because of all the extra layers it introduces.
     

    WebSharper - F#-to-JavaScript and more at http://websharper.com
    • Marked as answer by ffogd Thursday, August 04, 2011 6:36 PM
    Tuesday, August 02, 2011 2:19 PM

All replies

  • I think idiomatic F# would be to have a function which converts the Tree into a seq (which can then be saved into a list if desired):

    module Tree =
      let rec toSeq tree = seq {
        match tree with
        | Empty -> ()
        | Node (left, x, right) ->
          yield! toSeq left
          yield x
          yield! toSeq right
      }
    


    Saturday, July 30, 2011 2:43 PM
  • Is toSeq constant time function?
    Saturday, July 30, 2011 3:09 PM
  • If all you want to do is flatten a tree "best" can be whatever algorithm works fastest, but in most situations your requirements go beyond that. Are you trying to serialize a tree? In that case a simple flattening may not save enough information to deserialize back into the original tree. Or a "flattening" may be intended as a first step to processing the nodes. In that case how you flatten is very important. There are many ways to cycle through a binary tree, but the three major ways are usually referred to as pre-order, in-order, and post-order. (TAOCP 2.3.1)

    And you don't even have to flatten the tree first to do whatever serial processing you want to do on it. Here is the first in an excellent series of blogposts on catamorphisms. http://lorgonblog.wordpress.com/2008/04/05/catamorphisms-part-one/ (8 articles in all, well worth studying.)

    Here is some code I experimented with to illustrate these points (before I read about catamorphisms).

    Notes:

    1) This type of binary tree is specifically constructed to support unbalanced (missing) end nodes. In other words a node may only have one child, and it makes a difference whether it is a left or right child.

    2) Because it supports missing end nodes serializing generic type 'a is tricky. (You can still see the remains of my lame attempt to overcome the absence of nulls in F#. There really is no alternative except the Option type.)

    3) Examples of pre-, in-, and post-order flattening, as well as flattening saving additional information for reconstruction of the tree (e.g. member x.postOrderLevel).

     

    type Tree<'a> =

        | Tree of 'a * 'a Tree * Tree<'a>       //note OCaml and .Net syntax, either works

        | Node of 'a

        | BinTreeNil 

        ///indented tree, top node is left

        override this.ToString() = 

            let rec loop indent string tree = 

                match tree with

                | Tree(node, left, right) ->

                    loop (indent + "  ") 

                        (loop (indent + "  ") (string + "\n" + indent + node.ToString()) left) 

                        right

                | Node node -> 

                    string + "\n" + indent + node.ToString()

                | BinTreeNil -> 

                    string + "\n" + indent + "None"

     

            loop "" "" this

     

        static member pseudoNull = "nULL" :> System.Object :?> 'a  //note this only works when 'a is type string

     

        member x.preOrder =

            let rec loop tree =

                seq {

                    match tree with

                        | Tree(x, left, right) ->

                            yield x

                            yield! loop left

                            yield! loop right

                        | Node x -> yield x

                        | BinTreeNil -> ()

                }

            loop x

     

        member x.preOrderLevel =

            let rec loop tree level =

                seq {

                    match tree with

                        | Tree(x, left, right) ->

                            yield (x, level)

                            yield! loop left (level + 1)

                            yield! loop right (level + 1)

                        | Node x -> yield (x, level)

                        | BinTreeNil -> yield (Tree.pseudoNull, level)

                }

            loop x 0

     

        member x.inOrder =

            let rec loop tree = 

                seq {

                match tree with

                    | Tree(x, left, right) ->

                       yield! loop left

                       yield x

                       yield! loop right

                    | Node x -> yield x

                    | BinTreeNil -> ()

                } 

            loop x

     

        member x.inOrderLevel =

            let rec loop tree level = 

                seq {

                match tree with

                    | Tree(x, left, right) ->

                       yield! loop left (level + 1)

                       yield (x, level)

                       yield! loop right (level + 1)

                    | Node x -> yield (x, level)

                    | BinTreeNil -> yield (Tree.pseudoNull, level)

                } 

            loop x 0

     

        member x.postOrder =   

            let rec loop tree =     

                seq {

                  match tree with

                      | Tree(x, left, right) ->

                           yield! loop left

                           yield! loop right

                           yield x

                      | Node x -> yield x

                      | BinTreeNil -> ()

                } 

            loop x

     

        member x.postOrderLevel =            

            let rec loop tree level = 

                seq {

                match tree with

                    | Tree(x, left, right) ->

                        yield! loop left (level + 1)

                        yield! loop right (level + 1)

                        yield (x, level)

                    | Node x -> 

                        yield (x, level)

                    | _ -> yield (Tree.pseudoNull, level)

                } 

            loop x 0

     


    Jack Fox facster.com


    Saturday, July 30, 2011 3:50 PM
  • Do you really mean "constant time" and not linear time?  I find it hard to imagine a function written in Haskell that can flatten a tree of arbitrary size and whose worst case time complexity does not depend upon the size of the tree.
    Sunday, July 31, 2011 1:37 PM
  • sorry, i mean linear time.
    Sunday, July 31, 2011 1:52 PM
  • Both the functions you presented are linear time.  In each of them you only visit each Node once and you only do a constant amount of work each time you visit a Node: namely a cons and, possibly (depending on optimisations), two sequential compositions, neither of which depend on the size of their input. 

     

    As I understand it, the only inefficiency problem you might be likely to encounter whilst trying to write a flatten function is if you write the "obvious" recursive implementation which doesn't use an accumulator, i.e.

     

    let rec flattenTree3 t =
     match t with
     | Empty -> []
     | Node (l, n, r) -> flattenTree3 l @ [n] @ flattenTree3 r


    because here, although you only ever visit each node once, you are doing an amount of work which is not constant: append (@) takes an amount of time which depends on the size of its input.  Both of your functions avoid the call to append by using an accumulating parameter.

     

    Although your functions are going to be more or less the same in terms of efficiency, you might have noticed that they will not give you exactly the same result because, in the Node case of your recursive definition, the two functions differ as to when to add the label n to the list.  The first one recursively adds all the labels from the right sub-tree followed by n followed by all the labels from the left sub-tree whereas the second version adds the n as the first thing that it does.  Hence, you should see something like:

    > flattenTree1 (Node (Node (Empty, 1, Empty), 2, Node (Empty, 3, Empty)));;
    val it : int list = [1; 2; 3]
    
    > flattenTree2 (Node (Node (Empty, 1, Empty), 2, Node (Empty, 3, Empty)));;
    val it : int list = [1; 3; 2]

    Monday, August 01, 2011 11:39 AM
  • While all these algorithms are the same time complexity-wise (linear time) since loop hits every node exactly once, they will have substantial constant-factor differences because of different patterns of using the stack and the heap.

    Think of what happens in version #1. You pass a big tree, it hits the Node case of help, then it recurses on both branches immediately. This means that the whole tree will have to fit on the stack -> you should not process trees bigger than a few megabytes in size. Moreover, a lot of closures get allocated, so the heap consumption is far from optimal. Note the same probably does not hold for Haskell, since it has a different evaluation order. 

    I actually like your version #2 code:

    let flattenTree2 t =
     let rec loop tree acc =
      match tree with
      | Empty -> acc
      | Node(l, n, r) -> loop l (loop r (n :: acc)) 
     loop t []
    
    The only problem, if any, with it, is that one of the calls to loop is not tail-recursive. This means the computation happens on the stack. This is probably fine for 99% of the cases, but if you want to process trees deep enough to not fit onto the stack, this algorithm will overflow. Note how it is different from #1: in #1 the whole tree had to fit the stack, in #2 only the paths from root to leaf. A simple FP trick to completely eliminate the stack size limit is to use a continuation-passing style (CPS) to shift the accumulation to the heap:
      let flattenTree3 t =
       let rec loop tree k acc =
        match tree with
        | Empty     -> k acc
        | Node (l, n, r) -> loop r (loop l k) (n :: acc)
       loop t id []
    
    Sometimes you can improve heap use efficiency of this by designing a data structure to reify the continuations, instead of just using closures.
    The seq version proposed above is probably just as good. Since it is lazy, computations are delayed until the resulting sequence will be enumerated, and there is no danger of stack overflow. It might be significantly slower though then your #2 version for trees that do fit, because of all the extra layers it introduces.
     

    WebSharper - F#-to-JavaScript and more at http://websharper.com
    • Marked as answer by ffogd Thursday, August 04, 2011 6:36 PM
    Tuesday, August 02, 2011 2:19 PM
  • Thanks for such a detailed explanation. In addition I take a function with an explicit stack.

    Here is my test run.

    open System
    
    type KDTreeNode<'a> =
        | Empty
        | Node of KDTreeNode<'a> * 'a * KDTreeNode<'a>
        
    let inline test f t=
      printfn "Test Start"
      let sw = new System.Diagnostics.Stopwatch()
      sw.Start()
      let res=List.length (f t)
      sw.Stop()
      let time = sw.ElapsedMilliseconds
      printfn "Time Duration : %A" (time, res)
      time
      
    
    let inline insert l =
      let rec loop l acc =
        match l,acc with
        | [],_ -> acc
        | data::xs, Empty -> loop xs (Node (Empty, data, Empty))
        | data::xs, Node (left, ndata, right) ->
          if data % 2 > 0 then
            loop xs (Node (Node(left,ndata,Empty), data, right ))
          else
            loop xs (Node (left, data, Node(Empty,ndata,right)))
      loop l Empty
      
    let inline flatten4 tree =
     let s = System.Collections.Generic.Stack[tree]
     let rec loop (stack:Collections.Generic.Stack<KDTreeNode<'a>>) acc =
       match stack.Count>0 with
       | false -> acc
       | true ->
        match stack.Pop() with
        | Empty -> loop stack acc
        | Node(left, a, right) ->
          stack.Push left
          stack.Push right
          loop stack (a::acc)
     loop s []
     
    let inline flatten3 t =
      let rec loop tree k acc =
      match tree with
      | Empty   -> k acc
      | Node (l, n, r) -> loop r (loop l k) (n :: acc)
      loop t id []
    
    let inline flatten2 t =
     let rec loop tree acc =
      match tree with
      | Empty -> acc
      | Node(l, n, r) -> loop l (loop r (n :: acc)) 
     loop t []
     
    
    
    let testN f t = printfn "Total Time: %A" (List.fold (fun acc _-> acc + (test f t)) (int64 0) [0..10])
    
    // TEST with 50000
    let tree =insert [0..50000]
    
    testN flatten2 tree
    testN flatten3 tree
    testN flatten4 tree
    
    
    // TEST with 100000
    let tree1 =insert [0..100000]
    
    
    //flatten2 stack overflow
    testN flatten3 tree1
    testN flatten4 tree1;;
    
    
    // TEST with 50000
    Test Start flatten2
    Time Duration : (36L, 50001)
    Test Start
    Time Duration : (4L, 50001)
    Test Start
    Time Duration : (4L, 50001)
    Test Start
    Time Duration : (5L, 50001)
    Test Start
    Time Duration : (5L, 50001)
    Test Start
    Time Duration : (35L, 50001)
    Test Start
    Time Duration : (11L, 50001)
    Test Start
    Time Duration : (4L, 50001)
    Test Start
    Time Duration : (4L, 50001)
    Test Start
    Time Duration : (5L, 50001)
    Test Start
    Time Duration : (5L, 50001)
    Total Time: 118L
    
    Test Start flatten3
    Time Duration : (51L, 50001)
    Test Start
    Time Duration : (32L, 50001)
    Test Start
    Time Duration : (36L, 50001)
    Test Start
    Time Duration : (39L, 50001)
    Test Start
    Time Duration : (30L, 50001)
    Test Start
    Time Duration : (35L, 50001)
    Test Start
    Time Duration : (30L, 50001)
    Test Start
    Time Duration : (33L, 50001)
    Test Start
    Time Duration : (39L, 50001)
    Test Start
    Time Duration : (30L, 50001)
    Test Start
    Time Duration : (39L, 50001)
    Total Time: 394L
    
    Test Start flatten4
    Time Duration : (11L, 50001)
    Test Start
    Time Duration : (7L, 50001)
    Test Start
    Time Duration : (6L, 50001)
    Test Start
    Time Duration : (7L, 50001)
    Test Start
    Time Duration : (10L, 50001)
    Test Start
    Time Duration : (6L, 50001)
    Test Start
    Time Duration : (6L, 50001)
    Test Start
    Time Duration : (6L, 50001)
    Test Start
    Time Duration : (6L, 50001)
    Test Start
    Time Duration : (7L, 50001)
    Test Start
    Time Duration : (6L, 50001)
    Total Time: 78L
    
    // TEST with 100000
    Test Start flatten3
    Time Duration : (123L, 100001)
    Test Start
    Time Duration : (74L, 100001)
    Test Start
    Time Duration : (93L, 100001)
    Test Start
    Time Duration : (80L, 100001)
    Test Start
    Time Duration : (74L, 100001)
    Test Start
    Time Duration : (70L, 100001)
    Test Start
    Time Duration : (72L, 100001)
    Test Start
    Time Duration : (89L, 100001)
    Test Start
    Time Duration : (75L, 100001)
    Test Start
    Time Duration : (73L, 100001)
    Test Start
    Time Duration : (78L, 100001)
    Total Time: 901L
    
    Test Start flatten4
    Time Duration : (24L, 100001)
    Test Start
    Time Duration : (13L, 100001)
    Test Start
    Time Duration : (13L, 100001)
    Test Start
    Time Duration : (14L, 100001)
    Test Start
    Time Duration : (18L, 100001)
    Test Start
    Time Duration : (14L, 100001)
    Test Start
    Time Duration : (14L, 100001)
    Test Start
    Time Duration : (14L, 100001)
    Test Start
    Time Duration : (18L, 100001)
    Test Start
    Time Duration : (17L, 100001)
    Test Start
    Time Duration : (13L, 100001)
    Total Time: 172L


    Wednesday, August 03, 2011 6:16 PM