How do I convert a little endian used in htons() or htonl() to 32-bit chars?

  • Question

  • Hello,

    I am trying to reverse engineer a class within the open source ntop project's sub-project nDPI.

    The class considers a packet payload and uses a macro to compare two bytes (a word) to a little-endian value returned by htons() or htonl().

    I would like to take the input values to htons() and htonl() and convert them to their 32-bit chars value.

    It appears that htons() takes a 16-bit number in host byte order as an argument, and htonl() takes a 32-bit number in host byte order as an argument.

    How do I convert a "16-bit number in host byte order" or a "32-bit number in host byte order" into a 32-bit char value?

    I'm definitely getting some wires crossed in my brain.

    Thanks,

    Matt

    Friday, July 12, 2013 7:43 PM

All replies

  • What is a "32-bit char" value?
    Friday, July 12, 2013 8:13 PM
  • Sorry Scott.  I think that should be a red flag that I am not a developer, but I am trying to hack together a conversion from the ntop source into input for another network flow analysis program.

    The developer of that program refers to the following returned value (char *str) as "32-bit chars."

    static char basis_16[] = "0123456789ABCDEF";
    
    /* Hex-encodes the len bytes at ptr into str, two uppercase hex digits per
       byte, in memory order (no terminating NUL is written).  Returns the
       computed output length, or 0 if the output buffer (slen chars) is too
       small.  The parser argument is unused here. */
    int
    Encode32 (struct ParserStruct *parser, const char *ptr, int len, char *str, int slen)
    {
       int retn = 0, i;
       u_char *buf = (u_char *) str;
       unsigned newlen;
    
       /* Output length: len rounded up to an even byte count, two chars per byte. */
       if (ptr && ((newlen = (((len + 1) & ~0x01) * 2)) < slen)) {
          for (i = 0; i < len; i++) {
             *buf++ = basis_16[((ptr[i] & 0xF0) >> 4)];   /* high nibble */
             *buf++ = basis_16[((ptr[i] & 0x0F))];        /* low nibble */
          }
    
          retn = newlen;
       }
    
       return (retn);
    }

    This is essentially the common data format into which I wish to convert the bytes above.

    I don't think I understand what base-N the given numbers are in:

    For example: the `0x0104` in htons(0x0104) clearly can't be hexadecimal (?).  What base-N is this?

    Any assistance is appreciated.

    Thanks,

    Matt




    • Edited by mbrownnyc Tuesday, July 16, 2013 6:10 PM char *str
    Friday, July 12, 2013 8:20 PM
  • Encode32 does something similar to AtlHexEncode - takes a binary buffer and generates a string consisting of hex digits representing that binary data. The return value is simply the length in characters of the generated string.

    For example: the `0x0104` in htons(0x0104) clearly can't be hexadecimal (?).  What base-N is this?

    This question doesn't make much sense. 0x0104 is a number. It can be written in base 16 as 0x104, in base 10 as 260, in base 8 as 0404 - all these are different and equivalent representations of the same arithmetic value. This is no different than 10.0, 010.000 and 1e+01 all being different ways to write the number 10.

    htons(0x0104) returns another number - 0x0401 aka 1025 aka 02001 (assuming little-endian machine).


    Igor Tandetnik

    • Marked as answer by mbrownnyc Monday, July 15, 2013 2:04 PM
    Friday, July 12, 2013 9:01 PM
  • Igor,

    So, what you are saying is that I do not need to convert any of the given values, unless they are not in hexadecimal/base16?

    For instance, if I input 0x0104 into Encode32(), I would receive back the string "00000104"?

    Or, if I input the decimal 123, I would receive back the string "0000007b".

    If I take a value that serves as input to htonl(), I would simply be handling the 32-bit conversion the same way: input of 0xea070aed would receive back the string "ea070aed".

    Does this sound correct?  I'm basically asking a question that's already been answered.  The 16-bit values (those converted with htons()) and 32-bit values (those converted with htonl()) are already expressed in hexadecimal in this instance, for which the output of Encode32() is also expressed in a string value derived from the hexadecimal value.  Therefore, the only conversion that needs to take place is from the given hex value to a string representation of that hex value, as described previously in this reply?

    Thanks for your time,

    Matt




    • Edited by mbrownnyc Monday, July 15, 2013 4:05 PM 1234
    Monday, July 15, 2013 2:49 PM
  • So, what you are saying is that I do not need to convert any of the given values, unless they are not in hexadecimal/base16?

    This statement, again, doesn't make much sense. It is meaningless to say that a numeric value is in base 16, or any other base. Only its human-readable representation - written out on paper, displayed on the screen, stored in a text file - can be in base 16 or base 10 or some other base.

    For instance, if I input 0x0104 into Encode32(), I would receive back the string "00000104"?

    On a big-endian machine, yes. On a little-endian machine, you'll get "04010000". Encode32 doesn't know that the block of memory pointed to by ptr should be interpreted as an int. In essence, it takes each individual byte, treats it as a numeric value in the 0-255 (aka 0x0-0xFF) range, and generates two hex digits that represent this numeric value in base 16. Because it works with individual bytes, its output depends on the details of any given machine's in-memory representation.

    Or, if I input 123, I would receive back the string "0000007b".

    Well, "0000007B" to be precise (the function uses capital letters for hex digits A through F), on a big-endian machine. But most likely, you are running on little-endian machine, in which case the function will produce "7B000000".

    If I take a value that serves as input to htonl(), I would simply be handling the 32-bit conversion the same way: input of 0xea070aed would receive back the string "ea070aed".

    I don't understand this example. Too many moving parts. Could you explain it step by step, or better still, show a fragment of code implementing this?

    Does this sound correct?  I'm basically asking a question that's already been answered.  The 16-bit values (those converted with htons()) and 32-bit values (those converted with htonl()) are already expressed in hexadecimal in this instance

    Once again, you are thinking about it wrong. Values cannot be hexadecimal - only their human-readable representation can be.

    for which the output of Encode32() is also expressed in a string value derived from the hexadecimal value.

    The output of Encode32 is a string consisting of a hexadecimal representation of each individual byte of a memory buffer passed as a parameter. The function doesn't at all care about the meaning you wish to assign to those bytes - e.g. whether or not you take them to represent in memory a value of an integer, whether in little-endian or big-endian format.

    What is the ultimate goal of this dancing around htons, htonl and Encode32? It might be easier to help you if you take a step back and explain what it is you are trying to achieve here.


    Igor Tandetnik

    Monday, July 15, 2013 4:11 PM
  • Thanks very much Igor.

    I am attempting to reverse engineer classes that are included with an open source project, ntop's nDPI.  These classes contain "protocol classification" of byte patterns.

    An example is ndpi_search_afp() (from afp.c), which defines byte patterns for an Apple File Protocol occurrence.

    Within this class, an example of the conditional is:

    #define get_u_int16_t(X,O)  (*(u_int16_t *)(((u_int8_t *)X) + O))
    
    if (get_u_int16_t(packet->payload, 0) == htons(0x0004)) {
       //do something
       return;
    }

    From reading, I can see that htons() takes in a data value as little endian, not big endian.  So, in this case, 0x0004 is little endian [I assume this because I run this software on a little endian machine where this code works correctly].

    I am using network packet flow software called argus.  This software includes a client called raservices that reads and classifies data by using a configuration file with weighted classification data (see here).  An example line from a configuration file is:

    Service: http            tcp port 80   n = 257580 src = "474554202F                      "  dst = "485454502F312E  20              "

    where each string [denoted as "src" or "dst"] is the output of Encode32() (in this case, each string output should total 16 bytes) (n is the weight).

    So I am attempting to take the byte patterns defined in the nDPI classes and create the configuration file for the argus-client raservices.

    I can confirm with the developer, but I do believe that configuration file data is expressed in little-endian.


    Thanks for making the abstract point that the method of expressing the data is irrelevant. I am thinking too directly about the result.

    As it stands now, I want to create byte patterns that are the result of Encode32(), assuming little endian as the source (because they must be little endian if they are input to `h to n` functions, right?).


    Thanks very much for your help,

    Matt







    • Edited by mbrownnyc Tuesday, July 16, 2013 6:21 PM 1234
    Monday, July 15, 2013 4:33 PM
  • #define get_u_int16_t(X,O)  (*(u_int16_t *)(((u_int8_t *)X) + O))
    
    if (get_u_int16_t(packet->payload, 0) == htons(0x0004)) {
       //do something
       return;
    }

    The macro get_u_int16_t takes a pointer and an offset that together determine a memory address. It then takes two bytes at that address, and assumes them to be a 16-bit unsigned integer value in machine-native endianness.

    The condition assumes that the value at offset 0 in packet->payload buffer is a 16-bit value in network (big-endian) representation. It compares this value to the value that 0x4 would become once it's converted from host to network order (this conversion is a no-op on big-endian machine; on little-endian machine, it swaps bytes around, e.g. turning 0x0004 into 0x0400). In other words, it takes two values, makes sure both are in network order (the value extracted from memory is presumed to be this way; the constant 0x0004 is explicitly converted), then compares them.

    The condition might be easier to comprehend if it were written in this - equivalent - form:

    if (ntohs(get_u_int16_t(packet->payload, 0)) == 0x0004)

    This does the same thing, except that it first normalizes both values to host order, rather than network order, before the comparison. The value in memory is presumed to be in network order, and so has to be converted. The value of the constant is already in host order, by definition.
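
    Here is a tiny self-contained sketch of that equivalence, using hypothetical payload bytes and a macro equivalent to nDPI's get_u_int16_t (assuming a POSIX system for <arpa/inet.h>); both comparisons hold regardless of the host's endianness:

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>   /* htons, ntohs */
    
    /* Same idea as the nDPI macro, spelled with standard types. */
    #define get_u_int16_t(X,O)  (*(uint16_t *)(((uint8_t *)(X)) + (O)))
    
    int main(void)
    {
       /* First bytes of a hypothetical payload as they arrive off the wire:
          0x00 0x04 is the 16-bit value 4 in network (big-endian) order. */
       uint8_t payload[4] = { 0x00, 0x04, 0xAB, 0xCD };
    
       printf("%d\n", get_u_int16_t(payload, 0) == htons(0x0004));    /* 1 */
       printf("%d\n", ntohs(get_u_int16_t(payload, 0)) == 0x0004);    /* 1 */
       return 0;
    }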

    From reading, I can see that htons() takes in a data value as little endian, not big endian.

    To be precise, htons assumes its parameter to be in host order - that is, little-endian on little-endian machine, big-endian on big-endian machine. It returns this value converted to network order, which is always big-endian.
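
    For example, this minimal sketch (assuming a POSIX system) just prints what htons makes of the constant used above:

    #include <stdio.h>
    #include <arpa/inet.h>   /* htons */
    
    int main(void)
    {
       /* Prints 0x0400 on a little-endian machine (bytes swapped),
          0x0004 on a big-endian machine (no-op). */
       printf("%#06x\n", (unsigned)htons(0x0004));
       return 0;
    }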

    So, in this case, 0x0004 is little endian.

    Assuming you are running a little-endian machine.

    I am using network packet flow software called argus.  This software includes a client called raservices that reads and classifies data by using configuration files (see here).  An example line from the configuration file is:

    Service: http            tcp port 80   n = 257580 src = "474554202F                      "  dst = "485454502F312E  20              "

    where each string [denoted as "src" or "dst"] is the output of Encode32() (in this case, each string output should total 16 bytes).

    Again, Encode32 just dumps raw bytes, in whatever order they happen to be laid out in memory.
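
    For instance, the src string above looks like a byte-for-byte hex dump of plain ASCII: reading the pairs 47 45 54 20 2F gives "GET /" (my reading of the hex, not something taken from the argus documentation). A minimal sketch producing that prefix the same way Encode32 would:

    #include <stdio.h>
    
    int main(void)
    {
       static const char basis_16[] = "0123456789ABCDEF";
       const unsigned char src[] = "GET /";   /* 0x47 0x45 0x54 0x20 0x2F */
       size_t i;
    
       /* Prints "474554202F" - the start of the src pattern in the config line. */
       for (i = 0; i < sizeof src - 1; i++)
          printf("%c%c", basis_16[(src[i] & 0xF0) >> 4], basis_16[src[i] & 0x0F]);
       printf("\n");
       return 0;
    }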

    So I am attempting to take the byte patterns defined in the nDPI classes and create the configuration file for the argus-client raservices.

    Then most likely, you need to figure out what the data is supposed to look like in network (big-endian) order. In the example you gave, the first two bytes are expected to be "0004".
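
    A minimal sketch of that step, using my own stand-in for the Encode32 part (assumes a POSIX system for htons): it turns the constant from the nDPI check into the hex string you would put in the configuration file, and prints "0004" on either kind of machine because htons has already put the value into network order:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* htons */
    
    int main(void)
    {
       static const char basis_16[] = "0123456789ABCDEF";
       uint16_t net = htons(0x0004);      /* the constant from the nDPI check, in network order */
       unsigned char bytes[sizeof net];
       size_t i;
    
       memcpy(bytes, &net, sizeof net);   /* the bytes exactly as laid out in memory */
    
       for (i = 0; i < sizeof bytes; i++)
          printf("%c%c", basis_16[(bytes[i] & 0xF0) >> 4], basis_16[bytes[i] & 0x0F]);
       printf("\n");                      /* "0004" on both little- and big-endian hosts */
       return 0;
    }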

    I can confirm with the developer, but I do believe that configuration file data is expressed in little-endian.

    This sounds exceedingly unlikely. It is meaningless to talk about endianness of 16-byte large binary buffers: they are just that, sequences of 16 bytes. It only makes sense to talk about endianness of in-memory representation of integers - basically, whether the first byte (one at the lowest address) contains (x % 0x100) or (x / 0x1000000 % 0x100) (assuming x is a 32-bit integer).

    For example, if you have a sequence of four bytes presumed to be in network order, what these bytes should look like in little-endian order depends entirely on the meaning of those bytes. If those four bytes are supposed to represent four ASCII characters or four independent 8-bit integers, their order should remain unchanged. If they are supposed to represent a single 32-bit integer, then you should swap 0th with 3rd and 1st with 2nd bytes. If they are supposed to represent two 16-bit integers side by side, then you should swap 0th with 1st and 2nd with 3rd.
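
    A small sketch of that difference, using the hypothetical wire bytes AA BB CC DD (assumes a POSIX system; the comments describe a little-endian host):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* ntohl, ntohs */
    
    static void dump(const void *p, size_t n)
    {
       const unsigned char *b = p;
       size_t i;
       for (i = 0; i < n; i++)
          printf("%02X ", b[i]);
       printf("\n");
    }
    
    int main(void)
    {
       /* Four bytes as they would sit on the wire, in network order. */
       unsigned char wire[4] = { 0xAA, 0xBB, 0xCC, 0xDD };
       uint32_t one32;
       uint16_t two16[2];
    
       /* Treated as a single 32-bit integer: converting to host order on a
          little-endian machine leaves DD CC BB AA in memory. */
       memcpy(&one32, wire, 4);
       one32 = ntohl(one32);
       dump(&one32, 4);
    
       /* Treated as two 16-bit integers side by side: the same conversion
          leaves BB AA DD CC in memory. */
       memcpy(two16, wire, 4);
       two16[0] = ntohs(two16[0]);
       two16[1] = ntohs(two16[1]);
       dump(two16, 4);
    
       return 0;
    }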

    A packet sniffing tool is unlikely to have sufficient semantic information (knowledge of the meaning of the packets flying by) to correctly convert them from network order to little-endian host order. It is much more likely to look at raw bytes, in whatever order they come off the wire.


    Igor Tandetnik

    Monday, July 15, 2013 5:09 PM
  • Thanks Igor.  I do believe your responses contain the entire answer to my question(s), so I appreciate the back and forth.

    I have posed a leading question to the developer and expect that he will like the direction I'm attempting to move in, and will assist more directly from his broader perspective, with his skills and project knowledge... specifically, how the source data is encoded and how it is treated by another client program called rauserdata, which parses existing stored binary data created with an argus client.  There are at least two big "hops" between the wire and the raservices client.

    Thanks again!

    Matt




    • Edited by mbrownnyc Monday, July 15, 2013 5:37 PM 5678
    Monday, July 15, 2013 5:24 PM