Google Multi-Language Benchmark with F#

    General discussion

  • Google recently published a paper comparing performance of multiple languages implementing the same algorithm - finding loops in a code flow graph.

    The final best performance for the various languages was as follows:

    C++ : 23 sec
    Java : 89 sec
    Scala: 58 sec
    Go    : 126 sec

    I ported the scala code over to F# and then optimized it further using VS2010 profiler (nice tool!). Here are the results for F#:

    31 sec on Core Duo 2.00 GHz
    20 sec on Core i5

    I don't know what hardware Google used but I assume that it is relatively current. However until we know the exact specs of the hardware, the results are not directly comparable.

    Nonethless F# performance was a pleasant surprise.

    The main optimization was the use of 'for' loops in performance critical regions. The algorithm uses mutable 'platform' collections so 'for' loops were a relatively straightforward optimation.

    A surprise was that using Object.Equals vs "=" and "<>" operators yielded better performance. I suppose F# is biased towards "structural equality" to make data oriented code easier to write.

    I must mention that another team at Google further optimized Go and C++ to run in less than 4 seconds! Which shows that 'system' languages allow another level of tuning that is just not possible with VM based languages. The optimizations however are too specific to the benchmark and cannot be applied generally.

    Links to the F# and Scala code and the Google paper are at this blog entry:



    Tuesday, July 12, 2011 7:22 PM

All replies

  • I compiled and ran your F# code (release, .NET 4.0 SP1, x64) and Google's reference C++ code (release, VC++ 2010 SP1, x64, WPO enabled) on the same machine. I get ~23.5 seconds for the reference C++ implementation and ~18.4 seconds for your F# implementation. Nice job. :-] (Unfortunately I don't have a recent version of MinGW installed on this computer, so I can't see how GCC fares in comparison to VC++.)

    Do you happen to have a link to the source code for the sub-4 second C++ implementation?

    Tuesday, July 12, 2011 10:42 PM
  • Here is the blog post that talks about C++ and Go improvements :

    It has links to the C++ source which is here:

    Reading this post I get the impression that when we have tight loops then even a little bit of allocation makes a big difference.

    The main improvements to go and c++ are the reduction in garbage by reusing cached objects when inside tight loops.

    The way the benchmark is written it first constructs a graph (of about 20K nodes) and then finds loops in it 50 times.

    For each loop-find iteration most implementations generate some garbage. The benchmark is very sensitive to the amount of garbage generated in each iteration.

    I think a better benchmark would have been to generate the graph and find the loops in it once and then repeat this cycle 50 times without caching anything across cycles. Recreating the graph and other objects for each cycle is a more realistic benchmark.

    The only sensible conclusion we can draw is that F# on Windows seems to be faster than Scala on Linux because the F# code is a direct port of the Scala code (I believe Google used a Core i7 processor for the benchmarks as indicated here:


    Wednesday, July 13, 2011 1:44 PM