How can I force C# to use ymm (or zmm) registers and their instruction sets?

  • Question

  • This is what I get when I disassemble my application:

    But my processor supports Intel AVX (ymm = 256-bit registers), and 10th-gen Intel CPUs come with the Intel AVX-512 (zmm = 512-bit registers) instruction set for the first time since 2013.

    So why does the C# compiler still use the 128-bit registers and instruction set, i.e. Intel SSE?






    • Edited by Dilly0 Friday, January 17, 2020 12:03 AM
    Thursday, January 16, 2020 11:57 PM

Answers

  • There are 3 possible reasons that I can think of.

    1) The JIT compiler has not been updated to use these instruction sets. (This is not the case: according to sources online, at least the System.Numerics namespace uses them.)

    2) The relevant part of the CPU is normally powered down to save energy, so there is some "warm-up" time when these instructions are executed for the first time in a while. During the warm-up period, execution is about 4.5 times slower than normal, so the JIT may decide it is not worthwhile to generate these instructions.

    3) Maybe specific types are required to tell the JIT to generate such instructions.

    ======

    Silly me, there is a 4th possible reason:

    4) You're not building in "Release" mode, so the compiler is not optimizing.
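
    A quick way to check reasons 1 and 3 on a given machine is to ask System.Numerics what it maps Vector<T> to. This is only a sketch: the class name SimdInfo and the helper VectorBits are made up for illustration, and the values printed depend on the CPU and the runtime.

    ```csharp
    using System;
    using System.Numerics;

    static class SimdInfo
    {
        // Width in bits of the vector that Vector<float> maps to on this machine.
        public static int VectorBits() => Vector<float>.Count * sizeof(float) * 8;

        static void Main()
        {
            // False means the JIT is using a software fallback for Vector<T>.
            Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}");

            // Number of float lanes per vector; this is fixed by the runtime, not by your code.
            Console.WriteLine($"Vector<float>.Count:  {Vector<float>.Count}");
            Console.WriteLine($"Vector width (bits):  {VectorBits()}");
        }
    }
    ```

    If the reported width is 256 bits, Vector<float> operations can be compiled to ymm instructions. For reason 4, build and run in Release mode before inspecting the disassembly.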


    • Edited by cheong00Editor Friday, January 17, 2020 1:35 AM
    • Marked as answer by Dilly0 Friday, January 17, 2020 3:10 AM
    Friday, January 17, 2020 1:32 AM
    Answerer

All replies

  • The 3rd reason is the one, thanks.

    Here's the code:

    Single[] a = new Single[8] { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f };
    Single[] b = new Single[8] { 0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f };
    Single[] c = new Single[8];
    
    Vector<Single> res = new Vector<Single>(a) + new Vector<Single>(b);
    
    res.CopyTo(c, 0);


    That displays:

    vaddps is used, as we can see. Now I wonder whether it works with zmm registers. I don't own a 10th-gen Intel CPU; do you? If so, can you test whether C# can use zmm registers?

    The test code should look like this:

    Single[] a = new Single[16] { 1.0f , 2.0f , 3.0f , 4.0f , 5.0f , 6.0f , 7.0f , 8.0f , 9.0f , 10.0f, 11.0f, 12.0f, 13.0f, 14.0f, 15.0f, 16.0f};
    Single[] b = new Single[16] { 0.01f, 0.02f, 0.03f, 0.04f, 0.05f, 0.06f, 0.07f, 0.08f, 0.09f, 0.10f, 0.11f, 0.12f, 0.13f, 0.14f, 0.15f, 0.16f};
    Single[] c = new Single[16];
    
    Vector<Single> res = new Vector<Single>(a) + new Vector<Single>(b);
    
    res.CopyTo(c, 0);

    Also, whether it's built in Debug or not, the compiler outputs AVX instructions.
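
    One caveat about the 16-element test: new Vector<Single>(a) loads only the first Vector<Single>.Count elements, so on a machine with 256-bit vectors it adds just 8 of the 16 values. A width-independent version can be sketched as follows. The class name VectorAdd and the method Add are illustrative, not part of any library.

    ```csharp
    using System;
    using System.Numerics;

    static class VectorAdd
    {
        // Adds a and b element-wise, Vector<float>.Count lanes at a time.
        public static float[] Add(float[] a, float[] b)
        {
            if (a.Length != b.Length)
                throw new ArgumentException("Arrays must have the same length.");

            var c = new float[a.Length];
            int width = Vector<float>.Count;
            int i = 0;

            // Vectorized part: each iteration processes `width` elements.
            for (; i <= a.Length - width; i += width)
            {
                var sum = new Vector<float>(a, i) + new Vector<float>(b, i);
                sum.CopyTo(c, i);
            }

            // Scalar tail for elements that do not fill a whole vector.
            for (; i < a.Length; i++)
                c[i] = a[i] + b[i];

            return c;
        }

        static void Main()
        {
            var a = new float[16];
            var b = new float[16];
            for (int k = 0; k < 16; k++) { a[k] = k + 1; b[k] = (k + 1) * 0.5f; }

            Console.WriteLine(string.Join(", ", Add(a, b)));
        }
    }
    ```

    Because the loop stride comes from Vector<float>.Count, the same code covers all 16 elements whether the runtime picks 128-bit, 256-bit, or wider vectors.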

    • Edited by Dilly0 Friday, January 17, 2020 3:12 AM
    Friday, January 17, 2020 3:05 AM
  • I'm using a pretty old Win7 machine, so I can't test that.

    However, for now I think zmm is not used: while I can find references to ymm in the source code, there aren't any references to zmm.
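
    For ymm specifically, 256-bit instructions can also be requested explicitly instead of relying on Vector<T>. The sketch below assumes .NET Core 3.0 or later, where the System.Runtime.Intrinsics.X86 hardware intrinsics are available. The class name ExplicitAvx and the method Add8 are hypothetical.

    ```csharp
    using System;
    using System.Runtime.Intrinsics;
    using System.Runtime.Intrinsics.X86;

    static class ExplicitAvx
    {
        // Adds the first 8 floats of a and b. Uses the 256-bit AVX add when the
        // CPU and runtime support it, otherwise falls back to a scalar loop.
        public static float[] Add8(float[] a, float[] b)
        {
            var c = new float[8];

            if (Avx.IsSupported)
            {
                Vector256<float> va = Vector256.Create(a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]);
                Vector256<float> vb = Vector256.Create(b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]);

                // Avx.Add on Vector256<float> corresponds to vaddps on ymm registers.
                Vector256<float> sum = Avx.Add(va, vb);

                for (int i = 0; i < 8; i++)
                    c[i] = sum.GetElement(i);
            }
            else
            {
                // Scalar fallback for machines without AVX.
                for (int i = 0; i < 8; i++)
                    c[i] = a[i] + b[i];
            }

            return c;
        }

        static void Main()
        {
            Console.WriteLine($"AVX supported:  {Avx.IsSupported}");
            Console.WriteLine($"AVX2 supported: {Avx2.IsSupported}");

            var a = new float[] { 1, 2, 3, 4, 5, 6, 7, 8 };
            var b = new float[] { 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f };
            Console.WriteLine(string.Join(", ", Add8(a, b)));
        }
    }
    ```

    The IsSupported check matters: calling Avx.Add on a CPU without AVX throws PlatformNotSupportedException, so the fallback path keeps the code portable.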

    Friday, January 17, 2020 3:56 AM
    Answerer