Byte Array Size in LayoutKind.Explicit struct With a Pack of 1

  • Question

  • I have the following struct:

        [StructLayout(LayoutKind.Explicit, Pack = 1)]
        public struct RecordStructure {
            [FieldOffset(0)]
            [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]
            public byte[] TimeOfDayHours;
            [FieldOffset(4)]
            [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]
            public byte[] TimeOfDayMinutes;
        }

    In the actual data, TimeOfDayMinutes belongs at FieldOffset(2). However, when I use that offset, I receive the following exception message:

    Could not load type 'ConsoleApplication3.RecordStructure' from assembly 'ConsoleApplication3, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' because it contains an object field at offset 2 that is incorrectly aligned or overlapped by a non-object field.

    Why does the byte[] require a minimum spacing of 4 bytes?  I thought that with a Pack of 1 there would be no alignment.  The one way this does work is to go unsafe, using the fixed keyword to create a fixed-size buffer.  The problem is that I then have to convert a byte* to a byte[], and any gains I realize by laying a byte[] into the struct are lost to the additional code and memory manipulation.  What to do?
    Thursday, August 27, 2009 7:38 PM

All replies

  • You are battling managed vs unmanaged structure layout.  In the managed layout, the array field stores a reference, which takes 4 bytes (8 bytes in 64-bit mode).  The [FieldOffset] attribute applies to the managed layout as well, odd as that might seem.  With the first reference occupying bytes 0 through 3, a second reference at offset 2 would overlap it.  Overlapping object references is illegal because it would allow unrestricted access to memory.

    This shouldn't be much of a hangup in this particular case, you could just declare the fields as ushort instead of byte[].
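
    A minimal sketch of that suggestion, assuming the record arrives as an ASCII byte[] of at least four bytes. The LayoutDemo class and its FromBytes helper are hypothetical names used only for illustration. Because ushort is a value type, the second field can sit at offset 2 without triggering the type-load exception.

    ```csharp
    using System;
    using System.Runtime.InteropServices;
    using System.Text;

    // Same shape as the struct in the question, but with value-type fields,
    // which may be placed at any byte offset.
    [StructLayout(LayoutKind.Explicit, Pack = 1)]
    public struct RecordStructure
    {
        [FieldOffset(0)] public ushort TimeOfDayHours;
        [FieldOffset(2)] public ushort TimeOfDayMinutes;
    }

    public static class LayoutDemo
    {
        // Hypothetical helper: copies the first four bytes of a record into the struct.
        public static RecordStructure FromBytes(byte[] record)
        {
            if (record == null || record.Length < 4)
                throw new ArgumentException("Need at least 4 bytes", "record");

            // Pin the array so the garbage collector cannot move it while we read from it.
            GCHandle handle = GCHandle.Alloc(record, GCHandleType.Pinned);
            try
            {
                return (RecordStructure)Marshal.PtrToStructure(
                    handle.AddrOfPinnedObject(), typeof(RecordStructure));
            }
            finally
            {
                handle.Free();
            }
        }

        public static void Main()
        {
            Console.WriteLine(Marshal.SizeOf(typeof(RecordStructure))); // prints 4

            RecordStructure r = FromBytes(Encoding.ASCII.GetBytes("1031"));
            // Each field holds two raw ASCII bytes; the byte order inside the
            // ushort depends on the machine's endianness.
            Console.WriteLine("{0:X4} {1:X4}", r.TimeOfDayHours, r.TimeOfDayMinutes);
        }
    }
    ```

    Note that each ushort still contains ASCII character codes, not the numeric hour or minute, so a text-to-number conversion is still needed afterwards.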

    Hans Passant.
    Thursday, August 27, 2009 7:54 PM
    Moderator
  • Really what I need is a string and not a byte[].  My data is an ASCII encoded byte[] containing a call detail record from an Avaya phone switch.  My experience with string fields has been the same.  The idea is that the first two array values (for example, 49 and 48) are the hour the call was made.  49 and 48 equate to "10", which in turn needs to be passed through Int32.Parse() to get an int.  I'm going for speed here.  This is in lieu of doing this:

    int.Parse(record.Substring(0, 2))
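
    The byte arithmetic described above (49 and 48 becoming 10) can be sketched without any intermediate string. This is only an illustration: AsciiDigits and ParseDigits are hypothetical names, and the sketch assumes every byte in the requested range is an ASCII digit.

    ```csharp
    using System;
    using System.Text;

    public static class AsciiDigits
    {
        // Hypothetical helper: converts `count` ASCII digits starting at `offset`
        // into an int without allocating a string.
        public static int ParseDigits(byte[] buffer, int offset, int count)
        {
            int value = 0;
            for (int i = offset; i < offset + count; i++)
            {
                int digit = buffer[i] - '0'; // 49 ('1') becomes 1, 48 ('0') becomes 0
                if (digit < 0 || digit > 9)
                    throw new FormatException("Non-digit byte at index " + i);
                value = value * 10 + digit;
            }
            return value;
        }

        public static void Main()
        {
            byte[] record = Encoding.ASCII.GetBytes("1031 0001 9");
            int hours = ParseDigits(record, 0, 2);
            int minutes = ParseDigits(record, 2, 2);
            Console.WriteLine("{0}:{1}", hours, minutes); // prints "10:31"
        }
    }
    ```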
    Thursday, August 27, 2009 7:58 PM
  • Hello

    I agree with Hans’s analysis. If you are sure that there are only two bytes in the struct, you can define the struct as:

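    [StructLayout(LayoutKind.Explicit, Pack = 1)]
        public struct RecordStructure {
            // LayoutKind.Explicit requires a FieldOffset on every field
            [FieldOffset(0)]
            public byte byte1;
            [FieldOffset(1)]
            public byte byte2;

            // or one ushort for two bytes as Hans suggested
            [FieldOffset(2)]
            public ushort TimeOfDayMinutes;
        }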

    Are you inter-operating with native code? If so, please post the definition of the native struct and the prototype of the native API here.

    To interpret TimeOfDayHours/TimeOfDayMinutes as a string, you can append the bytes as chars and parse the result:

                StringBuilder str = new StringBuilder();
                str.Append((char)myStruct.byte1);
                str.Append((char)myStruct.byte2);
                int value = int.Parse(str.ToString());

    A possibly better solution is to operate on pointers (IntPtr):

    string result = Marshal.PtrToStringAnsi(ptr, 2);

    ptr is a pointer to the RecordStructure. It may be returned from your native API. Marshal.PtrToStringAnsi translates the first two bytes to a Unicode string.
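
    A minimal, self-contained sketch of the pointer approach. Since no native API has been posted in this thread, the sketch simulates one by copying the record into unmanaged memory itself. PtrToStringDemo and FirstChars are hypothetical names.

    ```csharp
    using System;
    using System.Runtime.InteropServices;
    using System.Text;

    public static class PtrToStringDemo
    {
        // Hypothetical helper: copies the record into unmanaged memory (standing in
        // for a buffer returned by a native API) and reads the first `count` characters.
        // `count` must not exceed record.Length.
        public static string FirstChars(byte[] record, int count)
        {
            IntPtr ptr = Marshal.AllocHGlobal(record.Length);
            try
            {
                Marshal.Copy(record, 0, ptr, record.Length);
                return Marshal.PtrToStringAnsi(ptr, count);
            }
            finally
            {
                Marshal.FreeHGlobal(ptr); // always release the unmanaged block
            }
        }

        public static void Main()
        {
            byte[] record = Encoding.ASCII.GetBytes("1031 0001 9");
            string hours = FirstChars(record, 2);
            Console.WriteLine(int.Parse(hours)); // prints 10
        }
    }
    ```

    If the data already lives in a managed byte[], the extra copy into unmanaged memory makes this route slower than it looks; it mainly pays off when a native API hands you the pointer directly.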

    Thanks,
    Jialiang Ge
    MSDN Subscriber Support in Forum
    If you have any feedback of our support, please contact msdnmg@microsoft.com.

    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    Friday, August 28, 2009 7:20 AM
    Moderator
  • The data is coming in as an ASCII encoded byte[] via a TCP socket from the Avaya PBX.  An example record is:

    1031 0001 9                      8740 2015551212                            0 003     0      *801  0            0 0 0         1           0           
    The first 2 chars are the TimeOfDayHours and the 3rd and 4th chars are TimeOfDayMinutes.  This string contains 30 different bits of information about a single phone call.  Today, I'm using Substring(int, int) to get the first two chars and then passing that to Int32.Parse(string) to get the integer value 10 (which is passed into the ctor for a DateTime struct).  The whole idea is to find a super fast way to lay that string into a struct so I don't have to use many Substrings and Parse methods.
    Friday, August 28, 2009 5:04 PM
  • Hmya, there's not much you can do about the data being supplied as an ASCII string I assume.  You are going to have to convert a string to an integer, int.Parse() is the only tool at your disposal.  Treating the ASCII string as bytes doesn't buy you anything, it still requires a conversion from text to binary.  Just text in a different encoding.  The conversion effort is the same.

    The large number of small sub-strings you generate should not worry you either.  They never make it out of generation 0.  Hard to avoid creating them.  Although there's a way:

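    using System;
    using System.Text;
    using System.Runtime.InteropServices;

    class Program {
        unsafe static void Main(string[] args) {
            // strtol expects a NUL-terminated string, so append the terminator explicitly
            string txt = "1031 0001 9\0";
            byte[] asc = Encoding.ASCII.GetBytes(txt);
            fixed (byte* ascptr = asc) {
                // Each call parses digits from the given offset up to the first non-digit
                int num1 = strtol(ascptr+0, IntPtr.Zero, 10);
                int num2 = strtol(ascptr+5, IntPtr.Zero, 10);
                int num3 = strtol(ascptr+10, IntPtr.Zero, 10);
                Console.WriteLine("{0}, {1}, {2}", num1, num2, num3);
            }
            Console.ReadLine();
        }
        // The C runtime exports use the cdecl calling convention; compile with /unsafe
        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
        private unsafe static extern int strtol(byte* str, IntPtr end, int numbase);
    }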

    This is as fast as you can go.  You should see the difference when you run your test over and over again.  You won't in real life; the time needed to read the data off the disk should swamp whatever time you burn on converting.


    Hans Passant.
    Friday, August 28, 2009 6:10 PM
    Moderator
  • Hans, thanks for the reply.  For the segments of the string that are actually string values, what function can I use with the pointer to simply copy out the string value?
    Friday, August 28, 2009 6:16 PM
  • Hmm, stay away from unmanaged memory copying.  You can accomplish the same with the Encoding.ASCII.GetString(byte[], int, int) overload.  It's fast.
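
    A minimal sketch of that overload applied to the sample record from earlier in the thread. RecordFields and Field are hypothetical names, and the offsets used here are simply read off the sample text rather than taken from any Avaya record specification.

    ```csharp
    using System;
    using System.Text;

    public static class RecordFields
    {
        // Hypothetical helper: decodes one slice of the raw record straight to a string,
        // without first converting the whole byte[] and then calling Substring.
        public static string Field(byte[] record, int offset, int length)
        {
            return Encoding.ASCII.GetString(record, offset, length);
        }

        public static void Main()
        {
            byte[] record = Encoding.ASCII.GetBytes("1031 0001 9");

            int hours = int.Parse(Field(record, 0, 2));
            int minutes = int.Parse(Field(record, 2, 2));
            string secondField = Field(record, 5, 4); // kept as text

            Console.WriteLine("{0} {1} {2}", hours, minutes, secondField); // prints "10 31 0001"
        }
    }
    ```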

    Hans Passant.
    Friday, August 28, 2009 7:20 PM
    Moderator
  • I like to live dangerously!
    Friday, August 28, 2009 7:20 PM
  • Thank you, Hans, for the idea. I like it too.

    Apart from Hans’s suggestion, another way to avoid many Substring calls is to use a regex. However, this approach still needs one int.Parse call per field. For example,

                string txt = "1031 0001 9";
                MatchCollection mc = Regex.Matches(txt, @"(?<D1>\d\d)(?<D2>\d\d) (?<D3>\d\d)(?<D4>\d\d)");
                foreach (Match m in mc)
                {
                    Console.WriteLine(int.Parse(m.Groups["D1"].Value));
                    Console.WriteLine(int.Parse(m.Groups["D2"].Value));
                    Console.WriteLine(int.Parse(m.Groups["D3"].Value));
                    Console.WriteLine(int.Parse(m.Groups["D4"].Value));
                }

    Thanks,

    Jialiang Ge

    Monday, August 31, 2009 10:26 AM
    Moderator
  • There's just no point in copying the ASCII snippet, then converting it to a string.  You can go straight from byte[] to string with Encoding.GetString().  It will be faster.

    Btw, you are not likely to notice any of these speed-ups in a real production environment.  You'll need to get the data off a disk drive or network, that's several orders of magnitude slower than any kind of code you run on the data.  If it looks faster right now it is because the data is available in the file system cache.  It won't be in production.

    Hans Passant.
    • Marked as answer by davidbitton Tuesday, September 1, 2009 8:20 PM
    Monday, August 31, 2009 12:35 PM
    Moderator
  • Hans,
      I see what you're saying about the performance increases becoming negligible in a production environment. Part of this exercise was to tighten up the code itself and to add some "black belt ninja" routines as well.  Using your suggestion of Encoding.GetString(), I would use ASCIIEncoding.GetString(byte[] bytes, int byteIndex, int byteCount) in lieu of the Substring.  This allows me to skip the GetString that moves the entire byte[] into a string.  As far as I see it, this precludes the buffer copy and saves memory.


    Monday, August 31, 2009 1:31 PM