none
Want to make a bigint. Carry flag? int32*int32 -> int64? RRS feed

  • Question

  • Hello. I am interested in making a bigint that can run on the gpu.

    In order to do this, I need to handle overflow conditions, like the + operator overflowing 1 bit to the carry flag on a cpu and the * operator having a result that is twice the size as the operands.

    Is there any way to do this with C++AMP? I believe CUDA has a few intrinsics for things like this ( http://www.mpi-inf.mpg.de/~emeliyan/cuda-compiler/ ).

    The C++ AMP team has done an amazing job allowing standard C++ to be used. And standard C++ does not include any sort of carry flag or int32*int32->int64. So I suspect that the answer will be "no" and that this is an uphill battle. But it sure would be handy.

    Thanks!

    Friday, July 13, 2012 12:44 PM

Answers

  • Hi Chris, there is no intrinsic in C++ AMP that exposes the state of the carry flag. You would have to use other methods such as using 16 bit arithmetic stored in 32 bit quantities to simulate larger numbers.

    Pooja Nagpal



    Monday, July 16, 2012 10:42 PM

All replies

  • I don't think there is compiler build-in support for such thing. However, you can easily overcome this problem with simple macro

    #define MULTIPLY_I64( x, y ) (__int64)x * (__int64)y
    
    __int32 val1 = _I32_MAX;
    __int32 val2 = _I32_MAX;
    __int64 res = MULTIPLY_I64( val1, val2 );
    

    The better solution is to create template method(s) for such kind of operations.


    Best regards, Sergey

    Monday, July 16, 2012 10:07 AM
  • Sergey, I appreciate your reply.
    I am aware that I can promote the parameters before multiplying. And I think that is exactly what I will have to do.

    I was hoping for a better way.
    For example, a regular old x86 mul instruction of two 32-bit numbers stores the result as a 64-bit number (spread across two registers).

    If I promote the operands to 64-bit first then I have to 0-fill, then do a 64*64 (which, if we're on a 32-bit machine, won't be as fast). The result is capped at 64-bit, so the 0-filling was useless...trimmed off anyway. Now, maybe the optimizer will figure that out. And it might even realize that it can use the mul instruction. Optimizers can be very smart and so all this very well might happen. But I believe it would be better to be explicit about intentions than to be hopeful of the optimizer.

    Which is why I would prefer some sort of intrinsic.

    But I think you are right - that without an intrinsic, promoting the operands might be the way to go.
    I appreciate your help, Sergey.
    Monday, July 16, 2012 2:28 PM
  • If I promote the operands to 64-bit first then I have to 0-fill, then do a 64*64 (which, if we're on a 32-bit machine, won't be as fast). The result is capped at 64-bit, so the 0-filling was useless...trimmed off anyway. Now, maybe the optimizer will figure that out. And it might even realize that it can use the mul instruction. Optimizers can be very smart and so all this very well might happen. But I believe it would be better to be explicit about intentions than to be hopeful of the optimizer.

    Which is why I would prefer some sort of intrinsic.

    But I think you are right - that without an intrinsic, promoting the operands might be the way to go.
    I appreciate your help, Sergey.

    Chris,

    You're right, promoting the operands could harm performance, for sure on 32-bit machine. The only way (without an intrinsic) I sees is to handle the issue with pre-compiled definitions. Something like 

    #if defined(_WIN64) //64 bit processor
      #pragma message( "64-bit compilation type" )
      //Do 64-bit case work
    #elif defined(_WIN32) //32 bit processor
      #pragma message( "32-bit compilation type" )
      //Do 32-bit case work
    #else //other processors
      #pragma message( "Unknown compilation type" )
      //Error?
    #endif
    


    Best regards, Sergey

    Monday, July 16, 2012 5:52 PM
  • Hi Chris, there is no intrinsic in C++ AMP that exposes the state of the carry flag. You would have to use other methods such as using 16 bit arithmetic stored in 32 bit quantities to simulate larger numbers.

    Pooja Nagpal



    Monday, July 16, 2012 10:42 PM