Bit Shifting vs BitConverter.ToInt64

  • Question

  • I'm porting some FORTRAN code to C#. It uses EQUIVALENCE between an array of two 4-byte integers and an 8-byte long. For those unfamiliar with EQUIVALENCE, the array occupies the same location in memory as the long, so setting the value of the long sets the values of the two integers and vice versa. To simulate this in C# (without the shared-memory part) I'm trying to combine two integers into one long. I was able to replicate the long value that FORTRAN produced by using BitConverter.GetBytes to combine the bytes from both integers and then BitConverter.ToInt64 to convert the resulting byte array into a long. This matches, but I don't want to use this method since it needs to be fast. I figured I could use bit shifting to combine the two integers instead, but when I tried

    long result1 = ((long)int1 << 32) | int2;

    I got the wrong answer. I finally tried shifting the second integer and then was able to get the correct answer. This is where I am very confused. Why would the second integer need to be bit shifted and not the first?!

    int int1 = 10;
    int int2 = 20;
    
    long result1 = (int1 | ((long)int2) << 32);
    Console.WriteLine(result1);
    
    byte[] bytes = 
    { 
      BitConverter.GetBytes(int1)[0], 
      BitConverter.GetBytes(int1)[1],
      BitConverter.GetBytes(int1)[2], 
      BitConverter.GetBytes(int1)[3],
      BitConverter.GetBytes(int2)[0], 
      BitConverter.GetBytes(int2)[1],
      BitConverter.GetBytes(int2)[2], 
      BitConverter.GetBytes(int2)[3]
    };
    
    long result2 = BitConverter.ToInt64(bytes, 0); 
    Console.WriteLine(result2);




    Tuesday, June 26, 2018 4:55 PM

All replies

  • My thought is endianness. Intel processors are little endian, which means the least significant byte is stored first. In your case you're laying out the values (based on your Fortran code) so that int1 occupies the first four bytes, which makes it the least significant half, and int2 becomes the most significant half. That is why int2 is the one that has to be shifted. I suspect that your old Fortran code ran on big endian.

    Your shift logic is implementing the little endian layout by putting int2 in the upper 32 bits. You can see that by converting it to hex.

    Console.WriteLine("{0:x}", result1);

    The BitConverter documentation has a discussion on this as well. In your case you're creating a byte array by converting each integer to an array of bytes and then copying each byte into a larger array. Since you're using int1 and then int2, you end up with the bytes 0A 00 00 00 followed by 14 00 00 00. When you pass that to the converter, it reads the array in the machine's byte order. On little endian the first byte is the least significant, so the value comes out looking swapped relative to the array order: 0x000000140000000A.
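
    To make the byte order concrete, here is a minimal sketch that prints the intermediate byte array and the converted value. ByteOrderDemo and Concat are hypothetical names used only for illustration, and the outputs noted in the comments assume a little-endian machine.

    ```csharp
    using System;

    static class ByteOrderDemo
    {
        // Lays out int1 and then int2 as eight bytes in the machine's native byte order.
        public static byte[] Concat(int int1, int int2)
        {
            byte[] bytes = new byte[8];
            BitConverter.GetBytes(int1).CopyTo(bytes, 0);
            BitConverter.GetBytes(int2).CopyTo(bytes, 4);
            return bytes;
        }

        static void Main()
        {
            byte[] bytes = Concat(10, 20);

            // True on x86/x64.
            Console.WriteLine(BitConverter.IsLittleEndian);

            // On little endian: 0A-00-00-00-14-00-00-00
            Console.WriteLine(BitConverter.ToString(bytes));

            // On little endian: 000000140000000a (int2 ends up in the high half)
            Console.WriteLine("{0:x16}", BitConverter.ToInt64(bytes, 0));
        }
    }
    ```

    Calling GetBytes once per integer also avoids the eight separate allocations in the original snippet.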

    If all you really want is to join two 32-bit values into a 64-bit value, then there are easier ways. This approach is consistent with what you were probably seeing in your (big endian) Fortran program. But I'd say you should probably just use normal bitwise OR logic and the processor's normal endianness unless you need interop with other architectures.

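    using System;
    using System.Runtime.InteropServices;

    // Explicit layout overlays the two ints on the same 8 bytes as the long,
    // much like a Fortran EQUIVALENCE.
    [StructLayout(LayoutKind.Explicit)]
    struct Int32ToInt64
    {
        [FieldOffset(0)]
        public int Low;       // bytes 0-3: the low half on a little-endian machine

        [FieldOffset(4)]
        public int High;      // bytes 4-7: the high half on a little-endian machine

        [FieldOffset(0)]
        public long LowHigh;  // all 8 bytes viewed as one 64-bit value
    }

    var result3 = new Int32ToInt64() { Low = 10, High = 20 };
    Console.WriteLine("{0:x}", result3.LowHigh);  // 140000000a on little endian

    This seems to work in this case, but I wouldn't go so far as to say it would work for other combinations of types.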
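
    For the plain bitwise OR route, a minimal sketch might look like the following. WordJoin, Combine, Low and High are hypothetical helper names, not framework APIs. The one thing to watch is sign extension: a negative low word OR'd in as a plain int overwrites the high word, so the sketch casts it to uint first.

    ```csharp
    using System;

    static class WordJoin
    {
        // 'low' fills bits 0-31 and 'high' fills bits 32-63 of the result.
        public static long Combine(int low, int high)
        {
            // The (uint) cast zero-extends 'low' so a negative value
            // cannot spill one-bits into the upper 32 bits.
            return ((long)high << 32) | (uint)low;
        }

        // Truncation keeps bits 0-31.
        public static int Low(long value) { return unchecked((int)value); }

        // The shift moves bits 32-63 down into the low word.
        public static int High(long value) { return unchecked((int)(value >> 32)); }

        static void Main()
        {
            Console.WriteLine("{0:x}", Combine(10, 20));   // 140000000a
            Console.WriteLine("{0:x}", Combine(-1, 20));   // 14ffffffff

            // Without the (uint) cast, -1 is sign-extended to 64 one-bits
            // and swallows the high word.
            int low = -1;
            Console.WriteLine("{0:x}", (long)low | ((long)20 << 32));   // ffffffffffffffff
        }
    }
    ```

    Unlike the overlaid struct, the shift version gives the same numeric result regardless of the machine's byte order, since shifts operate on values rather than on memory layout.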


    Michael Taylor http://www.michaeltaylorp3.net

    Tuesday, June 26, 2018 5:40 PM
    Moderator
  • Allow me to expand on Michael's response.

    Let's say you have the hex value 0x0102030405060708.  In a little-endian system (and all Windows systems are and always have been little-endian), that's stored in memory this way:

        08 07 06 05 04 03 02 01

    The 01 byte is the most-significant byte, so it's at the highest bit position.  If you had the two int values 0x01020304 and 0x05060708, you need to shift the first value up to get things in the right position.  Suppose your EQUIVALENCE block was:

        INTEGER*4 FOUR(2)
        INTEGER*8 EIGHT
        EQUIVALENCE (FOUR, EIGHT)

    If EIGHT contained the value above, then FOUR(1) will contain 05060708 and FOUR(2) will contain 01020304.

    However, if your FORTRAN code originated on an IBM mainframe, those machines use big-endian ordering.  In that case, that hex value 0x0102030405060708 is stored in memory this way:

        01 02 03 04 05 06 07 08

    In that case, with the EQUIVALENCE block above, FOUR(1) will contain 01020304 and FOUR(2) will contain 05060708.
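
    The two layouts can be simulated in C# without touching real memory. This is only a sketch: EquivalenceDemo, Store and LoadWord are hypothetical names, and the byte order is passed in as a flag instead of being taken from the hardware.

    ```csharp
    using System;

    static class EquivalenceDemo
    {
        // Stores 'value' as 8 bytes, least-significant byte first (little endian)
        // or most-significant byte first (big endian).
        public static byte[] Store(long value, bool littleEndian)
        {
            byte[] bytes = new byte[8];
            for (int i = 0; i < 8; i++)
            {
                int index = littleEndian ? i : 7 - i;
                bytes[index] = unchecked((byte)(value >> (8 * i)));
            }
            return bytes;
        }

        // Reads the 4 bytes starting at 'offset' as a 32-bit word in the same byte order.
        public static uint LoadWord(byte[] bytes, int offset, bool littleEndian)
        {
            uint word = 0;
            for (int i = 0; i < 4; i++)
            {
                // Walk from the most-significant byte of the word to the least.
                int index = littleEndian ? offset + 3 - i : offset + i;
                word = (word << 8) | bytes[index];
            }
            return word;
        }

        static void Main()
        {
            foreach (bool little in new[] { true, false })
            {
                byte[] eight = Store(0x0102030405060708L, little);
                // little: FOUR(1)=05060708, FOUR(2)=01020304
                // big:    FOUR(1)=01020304, FOUR(2)=05060708
                Console.WriteLine("{0} FOUR(1)={1:X8} FOUR(2)={2:X8}",
                    BitConverter.ToString(eight),
                    LoadWord(eight, 0, little),
                    LoadWord(eight, 4, little));
            }
        }
    }
    ```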

    This is why you often see dire warnings about machine-specific issues when using EQUIVALENCE.


    Tim Roberts, Driver MVP Providenza & Boekelheide, Inc.

    Thursday, June 28, 2018 1:31 AM
  • Thanks for the explanation. Your suggestion of displaying the hex values really helped. Then it was obvious that BitConverter.ToInt64 was swapping the bytes on conversion. I didn't know that creating something analogous to EQUIVALENCE in Fortran using a structure and field offsets was so easy in C#. It makes things a lot easier than having to go recreate other structures when one gets modified. I'll have to do some testing to see how it behaves with fixed-size arrays, but it seems promising!
    Thursday, June 28, 2018 2:00 PM
  • Thanks for the explanation. I'm unsure whether the machine the FORTRAN code ran on is big or little endian. I don't know if one can deduce which it is from the fact that I was able to match its output in the manner I described in my original post. The Fortran code I was porting accessed the first INTEGER*4 using an index with "LS" in its name, which I assume means least significant, and the second INTEGER*4 using an index with "MS" in its name, which I assume means most significant. That seems to imply that, of the two equivalenced integers, the first was the LS word and the second was the MS word, which indicates little endian. That, paired with the fact that I could match the Fortran long when I shifted the first integer up, means that it is running in little endian as well, right?
    Thursday, June 28, 2018 2:21 PM
  • Well, your post is somewhat contradictory.  Saying the first word is least significant and the second word is most significant does imply little endian, but the part you need to shift up is the MOST significant part.  Using 16-bit values, if you had LS=0x1234 and MS=0x5678, the resulting long value would be 0x56781234.
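
    As a quick check of that arithmetic, here is a small sketch (HalfWordDemo and Join are made-up names for illustration) that shifts the MS half up and ORs in the LS half:

    ```csharp
    using System;

    static class HalfWordDemo
    {
        // The most-significant half is the one that gets shifted up.
        public static uint Join(ushort ls, ushort ms)
        {
            return ((uint)ms << 16) | ls;
        }

        static void Main()
        {
            Console.WriteLine("{0:X8}", Join(0x1234, 0x5678));   // 56781234
        }
    }
    ```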

    So, there's still something weird here, but if you're getting results, perhaps it doesn't matter.


    Tim Roberts, Driver MVP Providenza & Boekelheide, Inc.

    Friday, June 29, 2018 5:41 AM