Implementation of int_fast8_t and uint_fast8_t in MSVC

  • Question

  • MSVC (x86) defines the 'fast' integer types of <cstdint> as follows.

    typedef signed char        int_fast8_t;
    typedef int                int_fast16_t;
    typedef int                int_fast32_t;
    typedef long long          int_fast64_t;
    typedef unsigned char      uint_fast8_t;
    typedef unsigned int       uint_fast16_t;
    typedef unsigned int       uint_fast32_t;
    typedef unsigned long long uint_fast64_t;
    What surprises me is the definition of int_fast8_t and uint_fast8_t. These definitions imply that using a variable of type signed char or unsigned char is at least as fast as using int on the x86 platform. This differs from what I know. Can someone comment on this?
    Monday, June 8, 2020 12:51 AM
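
    For anyone who wants to check these definitions on their own installation, here is a minimal sketch; it only inspects what <cstdint> provides, and it assumes a C++17 compiler for std::is_same_v. The static_asserts encode the MSVC x86 typedefs quoted above and would need adjusting on other platforms.

    #include <cstdint>
    #include <cstdio>
    #include <type_traits>

    // Minimal sketch: print the widths the implementation chose for the "fast"
    // types and confirm the 8-bit mappings listed above. Assumes C++17.
    int main()
    {
        std::printf("int_fast8_t  : %zu bytes\n", sizeof(std::int_fast8_t));
        std::printf("int_fast16_t : %zu bytes\n", sizeof(std::int_fast16_t));
        std::printf("int_fast32_t : %zu bytes\n", sizeof(std::int_fast32_t));
        std::printf("int_fast64_t : %zu bytes\n", sizeof(std::int_fast64_t));

        // These hold for the MSVC x86 typedefs quoted in the question.
        static_assert(std::is_same_v<std::int_fast8_t, signed char>,
                      "matches the typedef listed above on MSVC x86");
        static_assert(std::is_same_v<std::uint_fast8_t, unsigned char>,
                      "matches the typedef listed above on MSVC x86");
        return 0;
    }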

All replies

  • Hi,

    Thank you for posting here.

    >>What surprises me is the definition of int_fast8_t and uint_fast8_t. The definitions imply that using a variable of type signed char or unsigned char is faster or equally fast to using int on x86 platform.

    int_fast8_t: the fastest signed integer type with a width of at least 8 bits.
    uint_fast8_t: the fastest unsigned integer type with a width of at least 8 bits.

    In my view, signed char, unsigned char and int should all be fast (probably equally fast); it depends on the operations in the instruction set as well as the compiler.

    I suggest you refer to this link: https://stackoverflow.com/questions/5069489/performance-of-built-in-types-char-vs-short-vs-int-vs-float-vs-double

    Best Regards,

    Jeanine Zhang


    MSDN Community Support. Please remember to click "Mark as Answer" on the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    • Marked as answer by Hani Deek Thursday, June 11, 2020 2:02 PM
    Monday, June 8, 2020 2:52 AM
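
    A small compile-time sketch of what "at least N bits" means in practice (illustration only, not from the thread; it uses nothing beyond the standard headers shown):

    #include <climits>
    #include <cstdint>

    // Compile-time sketch of the "at least N bits" guarantee: these must hold on
    // any conforming implementation, whatever concrete types the vendor picks.
    static_assert(sizeof(std::int_fast8_t)  * CHAR_BIT >= 8,  "at least 8 bits");
    static_assert(sizeof(std::int_fast16_t) * CHAR_BIT >= 16, "at least 16 bits");
    static_assert(sizeof(std::int_fast32_t) * CHAR_BIT >= 32, "at least 32 bits");
    static_assert(sizeof(std::int_fast64_t) * CHAR_BIT >= 64, "at least 64 bits");

    int main() {}   // nothing to run; the checks above happen at compile time
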
  • > This is different from what I know.

    Why?  I'm curious what makes you say this.  On an x86 architecture machine, there is no difference in speed between the integral data types.  Adding two 8-bit values takes exactly as much time as adding two 32-bit values.  Fetching a byte from memory takes exactly as much time as fetching a 32-bit value.

    Now, if you're clearing a block of memory, it's clearly faster to do that 4 bytes at a time rather than 1, but that's not what the "fast" types are about.


    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    • Marked as answer by Hani Deek Thursday, June 11, 2020 2:01 PM
    Monday, June 8, 2020 11:49 PM
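
    A rough way to see this for yourself is a naive micro-benchmark such as the sketch below. Nothing in it comes from the thread; the loop, iteration count and timing approach are just illustrative assumptions, and a serious measurement would need to defeat the optimizer and control for vectorization.

    #include <chrono>
    #include <cstdint>
    #include <cstdio>

    // Naive sketch: sum the integers 0..n-1 once in 8-bit arithmetic and once in
    // 32-bit arithmetic. On typical x86 hardware the per-element ALU cost is the
    // same; differences you do see usually come from how the compiler optimizes
    // each loop, not from "slow bytes".
    template <typename T>
    static volatile T sink;   // volatile sink so the loop result must be produced

    template <typename T>
    double time_sum(std::size_t n)
    {
        auto start = std::chrono::steady_clock::now();
        T total = 0;
        for (std::size_t i = 0; i < n; ++i)
            total = static_cast<T>(total + static_cast<T>(i));
        sink<T> = total;
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(stop - start).count();
    }

    int main()
    {
        const std::size_t n = 100'000'000;
        std::printf("uint8_t  sum: %.1f ms\n", time_sum<std::uint8_t>(n));
        std::printf("uint32_t sum: %.1f ms\n", time_sum<std::uint32_t>(n));
        return 0;
    }
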
    OK, but this raises another question: why is int_fast16_t defined as int instead of short int? If the 1-byte type is as fast as the 4-byte type, then I would guess that the 2-byte type should also be as fast.

    Tuesday, June 9, 2020 12:37 AM
  • Hi,

    >>why is int_fast16_t defined as int instead of short int?

    "int" and "short int" may have the same size, but "int" is guaranteed to be at least as large as "short int".

    Best Regards,

    Jeanine Zhang


    MSDN Community Support. Please remember to click "Mark as Answer" on the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.


    Tuesday, June 9, 2020 8:48 AM
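
    That ordering guarantee can be stated directly in code; this tiny compile-time sketch is only an illustration of the rule, not something from the thread:

    #include <climits>

    // Compile-time sketch: int must be able to represent every value a short can
    // hold. On MSVC x86, short happens to be 16 bits and int 32 bits.
    static_assert(sizeof(short) <= sizeof(int), "int is at least as large as short");
    static_assert(SHRT_MAX <= INT_MAX && SHRT_MIN >= INT_MIN, "int's range covers short's");

    int main() {}   // the checks above are evaluated at compile time
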
  • Hmm, I hadn't noticed that in your post.  Yes, that is odd.  int_fast16_t should be "short".  Perhaps we can get a compiler team member to respond on the justification.

    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Wednesday, June 10, 2020 6:49 AM
  • Yes, but that's the question that was asked.  It is certainly legal for an implementation to make "int" and "short" the same size, but here we are talking about one specific implementation, where we know the sizes that were chosen.

    The original poster is correct, these typedefs are unusual.  It would be nice to have a compiler team rep explain them.


    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Wednesday, June 10, 2020 6:52 AM
    It is possible that things like the operand-size override prefix come into this. Because of how the x86 series progressed, there are dedicated 8-bit forms of most instructions, but the 16-bit and 32-bit forms share an opcode and are distinguished by the 0x66 operand-size override prefix.

    Since code segments tend to run in 32-bit mode, this means that at the very least the 16-bit operations need an extra prefix. As an example, take the move-immediate-to-register instruction: mov al, 8h is encoded as b0 08.

    For 16-bit moves, mov ax, 8h is encoded as 66 b8 08 00, and for 32-bit moves mov eax, 8h is encoded as b8 08 00 00 00. Since both instructions share the same opcode (0xb8), the operand-size override prefix (0x66) is what differentiates them. Maybe this adds a bit of extra time in decoding that the 32-bit version of the instruction doesn't have?

    I also noticed from disassembly that 16-bit variables get less efficient codegen than 8-bit and 32-bit ones. For example:

    int16_t a = static_cast<int16_t>(10);

    generated two instructions, whereas:

    int8_t a = static_cast<int8_t>(10);
    int32_t b = 10;

    generated one each. For the 16-bit case the compiler loaded the 10 into eax and then moved it from ax to the memory location of a:

    mov eax, 0ah
    mov word ptr[a], ax

    So the compiler definitely generates slower code sequences for the 16-bit case.


    This is a signature. Any samples given are not meant to have error checking or show best practices. They are meant to just illustrate a point. I may also give inefficient code or introduce some problems to discourage copy/paste coding. This is because the major point of my posts is to aid in the learning process.

    • Marked as answer by Hani Deek Thursday, June 11, 2020 2:01 PM
    Thursday, June 11, 2020 3:42 AM
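
    If you want to reproduce this, a minimal sketch is below. The file name is made up, /FAs is just one way to get an assembly listing out of MSVC, and the exact codegen will of course depend on compiler version and options (compile without optimization so the stores are not removed).

    // fastwidth.cpp -- hypothetical file name, used only for illustration.
    // Compile with, for example:  cl /c /Od /FAs fastwidth.cpp
    // and inspect fastwidth.asm to compare the stores generated for each width.
    #include <cstdint>

    void stores()
    {
        int8_t  a = static_cast<int8_t>(10);   // typically a single byte store
        int16_t b = static_cast<int16_t>(10);  // observed above as two instructions (load eax, store ax)
        int32_t c = 10;                        // single dword store
        (void)a; (void)b; (void)c;             // silence unused-variable warnings
    }
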
  • Thanks. I now remember that in the Intel optimization manual they advise against using 16-bit values. They have the following rule:

    Favor generating code using imm8 or imm32 values instead of imm16 values.
    If imm16 is needed, load equivalent imm32 into a register and use the word value in the register instead.

    The reason is that the length-changing prefix slows down the instruction 'pre-decoder'.

    • Marked as answer by Hani Deek Thursday, June 11, 2020 2:17 PM
    Thursday, June 11, 2020 2:00 PM
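
    To make that rule a bit more concrete, here is a small illustration. The function names and the example encodings in the comments are my own, and MSVC may or may not actually emit different code for the two versions; the point is only to show the two instruction shapes the rule is talking about.

    #include <cstdint>

    void store_imm16(int16_t* p)
    {
        // A direct 16-bit immediate store typically encodes with both the 0x66
        // prefix and an imm16, e.g. 66 C7 00 34 12 -- the length-changing-prefix
        // (LCP) form the manual advises against.
        *p = static_cast<int16_t>(0x1234);
    }

    void store_via_register(int16_t* p)
    {
        // The suggested alternative: materialize the constant as a 32-bit
        // immediate in a register (e.g. B8 34 12 00 00), then store the low word
        // of that register (e.g. 66 89 01), so no imm16 is encoded.
        int32_t v = 0x1234;
        *p = static_cast<int16_t>(v);
    }
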
  • In the days of the 80386, we worried about things like this.  We often counted up instruction ticks, and prefixes and overrides were important.  Today, it's ridiculous.  With simultaneous execution, microcoding, speculative execution and parallel paths, there is no longer a penalty for prefixes and overrides.

    I'm going to go out on a limb here and state this as a fact.  There is no point in using the *fast* types for x86 coding.  Those types are designed for microprocessors and primitive CPUs, not for the near-sentient behemoth computing monsters on which Windows operates.


    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Sunday, June 14, 2020 1:20 AM