convert hex to int: Switching Endian-ness?

  • Question

  •  

    Basically, in this project I'm doing, I need to read 4 bytes of a file and translate these bytes into an int, to be used in a loop later on.

    I keep on getting weird results.

     

    The 4 bytes in my test file are 58 08 00 00. This should result in the final int being 2136, but it results in 1244548. I thought this had to do with endianness, so I quickly wrote this snippet:

     

    Code Snippet

    char* SwitchEndian(char* inString)

    {

    char *outString = inString;

    int numChars;

    numChars = sizeof(inString);

    for(int i=0; i<numChars; i++)

    {

    outString[numChars-i] = inString[i];

    }

    return outString;

    }

     

    Didn't work. I actually got corrupted data in the string that I passed to this function when cout'ing it.

     

    Not too sure what to do...

    Tuesday, April 15, 2008 2:17 AM

All replies

  •  

    Looks like you're showing hex values here:

    58 08 00 00

     

    You are looking at an endian issue.

     

    See if this example gives you any ideas.

    Build it as a console app.

     

    Code Snippet

     

    #include "stdafx.h"

    struct bytes {

    char b1;

    char b2;

    char b3;

    char b4;

    };

    union cvrt{

    bytes b;

    int mynewint;

    };

    int _tmain(int argc, _TCHAR* argv[])

    {

    cvrt cv;

    cv.b.b1 = 0x58;

    cv.b.b2 = 0x08;

    cv.b.b3 = 0x00;

    cv.b.b4 = 0x00;

    printf("%d\n", cv.mynewint);

    return 0;

    }

     

     

     

     

    - Wayne

     

    Tuesday, April 15, 2008 4:04 AM
  • Ignoring the actual question regarding endianness, which Wayne has addressed, there are two issues that stand out in your code.

     

    The first is 'numChars = sizeof(inString);'. This will set numChars to the size of a char* (4 in a 32-bit build or 8 in a 64-bit build) rather than the length of the string (which I assume is what you actually want).

     

    The second is 'outString[ numChars - i ] = inString[ i ];' -- numChars-i will evaluate to one past the end of your char* on the first iteration. Say your string length was 10, this gives a range of [0..9]; however on the first iteration 10-0 == 10, so you would index into outString[10], causing a buffer overflow.
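
    Both issues can be illustrated with a small sketch. ReverseBytes and ArrayLength are hypothetical names, not from the original post; the sketch assumes the caller knows the buffer length and supplies a separate output buffer (the original function aliased outString to inString, so it overwrote its own input as it went).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies `len` bytes from `in` into `out` in reverse order.
// The length is passed explicitly because sizeof applied to a pointer
// parameter yields the size of the pointer, not of the buffer behind it.
void ReverseBytes(const char* in, char* out, std::size_t len)
{
  assert(in != 0 && out != 0 && in != out); // in-place use would clobber the input
  for (std::size_t i = 0; i < len; ++i)
    out[len - 1 - i] = in[i]; // len - 1 - i always stays within [0, len - 1]
}

// Hypothetical helper: recovers the element count of a real array.
// This only works while the argument is still an array, not a pointer.
template <std::size_t N>
std::size_t ArrayLength(const char (&)[N])
{
  return N;
}
```

    With char in[4] = { 0x58, 0x08, 0, 0 } and a second 4-byte buffer out, calling ReverseBytes(in, out, ArrayLength(in)) should leave 00 00 08 58 in out.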

     

    Again, this would all be much simpler if you took a C++ approach rather than a C approach. For example:

    Code Snippet

    std::string SwitchEndian(const std::string& inString)
    {
      return std::string(inString.rbegin(), inString.rend());
    }

     

     

    If you want to stick with a C approach, I would do something like the following:

    Code Snippet

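    // Requires <cstring> for strlen.
    char* SwitchEndian(const char* inString)
    {
      if (!inString)
        return 0;

      const size_t len = strlen(inString);
      char* outString = new char[len + 1]; // caller must delete[] the result
      // Index from the back so no pointer is ever moved before the start
      // of inString (which also keeps the empty-string case safe).
      for (size_t i = 0; i < len; ++i)
        outString[i] = inString[len - 1 - i];
      outString[len] = '\0';

      return outString;
    }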

     

     

    Tuesday, April 15, 2008 4:36 AM
  • @Wayne:

    Yup, that's why the subject is converting hex to int ;)

     

    One question I have is how could this be made dynamic? Dynamic meaning, what if I wanted to convert 6 bytes, or even 2 bytes? Would it be possible?

     

    @ildjarn:

    Heh, seems my ways of thinking are more suited for C!

    Anyway, what does your first function do? One problem is that its in and out vars are strings, and get(outvar,size)'s var (which I use to get those bytes) uses char. Here is my full script so far:

     

    Code Snippet

     

    #include "stdafx.h"

    #include <fstream>

    #include <iostream>

    #include <string>

     

    void ProcessFile(std::fstream& inFile);

    char* SwitchEndian(const char* inString);

     

    int main ( int argc, char* argv[] )

    {

    std::string FileName;

    if ( argc != 2 ) // if there are no arguments supplied by command line

    {

    std::cout << "\nPlease enter name of file to process: ";

    std::cin >> FileName;

    }

    else //otherwise use the first argument as the input file.

    {

    FileName += argv[1];

    }

     

    std::fstream File(FileName.c_str(),std::ios::in | std::ios::binary); //open said file

    if ( !File.is_open() ) //if file not found / in use

    {

    std::cout << "\n" << FileName << " could not be opened. Check presence of file, and try again." << std::endl;

    return 1; //return as error to the OS

    }

    else //otherwise begin to process file.

    {

    std::cout << "\nFile \'" << FileName << "\' opened successfully" << std::endl;

    ProcessFile(File);

    }

     

    File.close();

    return 0; //we are done. no error returned.

    }

     

    void ProcessFile(std::fstream& inFile)

    {

    char TableLoc[4] = {};

    char NumNamesChar[4] = {};

    int NumNames = 0;

     

    inFile.seekg(12);

    inFile.get(NumNamesChar,4); //number of names in the NameTable

    inFile.seekg(4);

    inFile.get(TableLoc,4); //where the NameTable starts

     

    std::cout << int(SwitchEndian(NumNamesChar));

    }

     

    One thing... sorry for making thread after thread after thread... should I just have all my questions in one thread? Or create separate ones as new problems arise?
    Tuesday, April 15, 2008 4:37 PM
  • Now that the situation is a bit clearer (i.e., you ultimately want the result to be an int, inString isn't null-terminated as I assumed it was, and the char* passed to inString is stack allocated), I would do this:

    Code Snippet

    int SwitchEndian(const char (& inString)[4])
    {
      const unsigned long val =

        *reinterpret_cast<const unsigned long*>(inString);
      return _byteswap_ulong(val);
    }

     

     

    It's worth noting that the code you had contained a fundamental flaw: you were casting a char* to an int, which results in an integer representation of the memory address pointed to by the pointer, rather than the string value of the char* treated as an int.

     

     EscapingTheSilence wrote:

    One thing... sorry for making thread after thread after thread... should I just have all my questions in one thread? Or create separate ones as new problems arise?

    Separate threads, as you have been doing, is ideal. Different threads for different questions :)
    Tuesday, April 15, 2008 5:22 PM
  • Ooooi, so many new terms and conditions! :)

     

    I'll give that a try. Also I monkeyed around with the C code you gave me before, and this code seems to do the same...

     

    Code Snippet

    char SwitchEndian(char* inString)

    {

    const int len = strlen(inString);

    char* outString = new char[len];


    for (int i=0; i<len; i++)

    {

    outString[i] = inString[(len-i)-1];

    }

    return *outString;

    }

     

    Wouldn't that be better, as the code is clearer (to my eyes) and less battle for the buck? Using cout's to test, the results were exactly the same.

     

    EDIT: Heh, I'm sorry but your code did NOT work... the result was 1476919296, should have been 2136. Remember the bytes I want to convert are 58 08 00 00. Would this be better if we conversed over MSN?

    Basically, I need to switch the bytes around (arriving at 00 00 08 58), which the C code above does, then convert 858 to int. Same as if you opened up Windows calc, set it to hex input, typed in 858, then selected dec mode, thus converting it to 2136.

    I know that it DOES swap the byte order, through doing tests with cout.

    Also, len ends up being 2, not 4 as it should be. Is this due to two bytes being 00? As the code is above, the int returned is 8 :S

    Tuesday, April 15, 2008 5:49 PM
  • As I said in my last post, I (and that code) assumed your string was null terminated, which it isn't.

     

    Just to clarify, you are building using Visual Studio, on an x86 or x64 computer, correct? The following prints 2136 for me; does it for you as well? If it doesn't, then you will need the reverse of all the code I have been posting :P

    Code Snippet

    int main()
    {
      char NumNamesChar[] = { 0x58, 0x08, 0, 0 };
      int NumNames = *reinterpret_cast<int*>(NumNamesChar);
      std::cout << NumNames << std::endl;
      return 0;
    }
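
    The reinterpret_cast above relies on the machine itself being little-endian, as x86 and x64 are. As a hedged alternative, here is a minimal sketch that assembles the value with shifts, so the result does not depend on the host byte order. DecodeLE32 is a hypothetical name, and <cstdint> is assumed to be available (C++11 or later).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decodes 4 bytes stored least-significant byte first.
// Shifting each byte into place avoids reinterpret_cast, so the result does
// not depend on the byte order or alignment rules of the host machine.
std::uint32_t DecodeLE32(const unsigned char* bytes)
{
  assert(bytes != 0);
  return  static_cast<std::uint32_t>(bytes[0])
       | (static_cast<std::uint32_t>(bytes[1]) << 8)
       | (static_cast<std::uint32_t>(bytes[2]) << 16)
       | (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

    For the bytes 58 08 00 00 this yields 0x00000858, which is 2136.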

     

     

    Tuesday, April 15, 2008 6:52 PM
  • Ah... you're right :) My apologies!

     

    Yes, that code does resolve to 2136. And you just solved it, heh. All that battle for nothing.

    I didn't know about reinterpret_cast<>();

    Tuesday, April 15, 2008 7:07 PM
  •  

    Quote>what if I wanted to convert 6 bytes, or even 2 bytes?

     

    You'll have to be more specific. Convert to what?

    An int has a fixed size depending on the platform/implementation.

    In the words of the ANSI/ISO C++ Standard re "Fundamental types":

     

    "Plain ints have the natural size suggested by the architecture of the

    execution environment." (N. 39: "that is, large enough to contain any

    value in the range of INT_MIN and INT_MAX, as defined in the header

    <climits>."

     

    On Win32 the size of an int is 32 bits or 4 chars/bytes.

    On Win16 the size of an int is 16 bits or 2 chars/bytes.

    (Guess what size an int is on Win64.)

     

    You can't cram 6 bytes into a 32-bit int. (6x8 = 48)

    You would have to go to a larger type such as long long,

    __int64, etc.

     

    You can put 2 bytes into an int of the same or greater size,

    such as __int16 or short, __int32 or int, etc.
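
    One way to handle a varying byte count is to decode into the widest unsigned type and let the caller narrow the result where it fits. This is only a sketch: DecodeLE is a hypothetical name, the bytes are assumed to be little-endian as in the file discussed here, and <cstdint> is assumed to be available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decodes `count` little-endian bytes (1 to 8) into a
// 64-bit unsigned value, so 2-, 4- and 6-byte fields can share one function.
std::uint64_t DecodeLE(const unsigned char* bytes, std::size_t count)
{
  assert(bytes != 0 && count >= 1 && count <= 8);
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < count; ++i)
    value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i); // byte i is worth 256^i
  return value;
}
```

    A 2-byte field 58 08 decodes to 2136, and a 6-byte field fits because 48 bits is less than 64.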

     

    - Wayne

     

     

     

     

    Tuesday, April 15, 2008 7:08 PM
  •  

    Sorry Wayne, wrong choice of words. I meant switch endian order. Somewhat moot point now, but I'm sure some would be looking for that. Heck, I may even need it later :)
    Tuesday, April 15, 2008 7:14 PM
  • For some reason I had it stuck in my head that you were converting from big-endian to little-endian. That's what I get for looking more at the code posted and less at the actual question posed. :P

     

    Your data already has the proper endianness, no need to switch it. (So the 1476919296 returned from my code was "correct", as that's 00 00 08 58 treated as little-endian :P)

     

    All that said, littering your code with reinterpret_casts is probably not a good idea, as you're basically removing all hint of error-checking from the compiler ("treat this variable as this other type regardless of how incorrect that may seem"), so it would probably be a good idea to refactor out that sort of thing into self-describing functions, e.g.:

    Code Snippet

    short CharsToPrimitive(const char (& inString)[2])
    {
      const short val = *reinterpret_cast<const short*>(inString);
      return val;
    }

     

    int CharsToPrimitive(const char (& inString)[4])
    {
      const int val = *reinterpret_cast<const int*>(inString);
      return val;
    }

     

    __int64 CharsToPrimitive(const char (& inString)[8])
    {
      const __int64 val = *reinterpret_cast<const __int64*>(inString);
      return val;
    }

     

     

    Tuesday, April 15, 2008 7:18 PM
  • Heh, I think the problem was with me, as if I recall correctly, I mentioned that conversion. What is that called, switching the byte order around?

     

    OK, so with this code you posted, this is called function overloading, correct? I guess I should stop contrasting C++ with UnrealScript, but that would be a big no-no in US :)

     

    Tuesday, April 15, 2008 7:27 PM
  •  EscapingTheSilence wrote:

    What is that called, switching the byte order around???

     

    It's called byte swapping (there's an x86 assembly instruction called BSWAP). Typical functions/intrinsics that do this are thus derived from the names bswap or byteswap.
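
    For reference, a byte swap can also be written portably with masks and shifts. This is a minimal sketch rather than the implementation behind _byteswap_ulong or BSWAP; ByteSwap32 is a hypothetical name and <cstdint> is assumed to be available.

```cpp
#include <cstdint>

// Hypothetical helper: reverses the order of the four bytes in a 32-bit value.
std::uint32_t ByteSwap32(std::uint32_t v)
{
  return ((v & 0x000000FFu) << 24)  // lowest byte becomes the highest
       | ((v & 0x0000FF00u) << 8)
       | ((v & 0x00FF0000u) >> 8)
       | ((v & 0xFF000000u) >> 24); // highest byte becomes the lowest
}
```

    Swapping 0x00000858 (2136) gives 0x58080000 (1476919296), the value that appeared earlier in this thread, and swapping twice returns the original value.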

     

     EscapingTheSilence wrote:

    OK so with this code you posted, this is called function overloading correct?

     

    Yes, overloading is the term (not to be confused with overriding, which is applicable to virtual methods in a class hierarchy). Note that you can overload on parameter types, but not exclusively on the return type.

    Tuesday, April 15, 2008 7:49 PM
  •  

    Yes, overriding I am familiar with. I doubt that it is the same as how I know it, though.

     

    In UScript, classes are defined with a header "class <classname> extends <parentclass>".

    Variables are passed down the line, without need of extern, directly. For example, let's say there is a class called Actor. It contains the variable ItemName. Now we create class MyActor, with parentclass as Actor. Within MyActor, we can access ItemName without having to declare it, because it's in the parent class. If ItemName was given a value in Actor, and MyActor didn't modify that value, the value would remain the same. The same can be done with functions. In regard to functions, you can choose to include the previous version, and choose when the parent class's version's commands are executed (by calling super.<functionname>(variables); within your redefined function).

     

    Is this the same in C++?

    Tuesday, April 15, 2008 8:04 PM
  •  

    Quote>Typical functions/intrinsics that do this are thus derived from the names bswap or byteswap.

     

    See also _swab() in the C run-time library.

     

    - Wayne

     

    Tuesday, April 15, 2008 8:05 PM
  •  EscapingTheSilence wrote:

    In UScript, classes are defined with a header "class <classname> extends <parentclass>".

    Variables are passed down the line, without need of extern, directly. For example, let's say there is a class called Actor. It contains the variable ItemName. Now we create class MyActor, with parentclass as Actor. Within MyActor, we can access ItemName without having to declare it, because it's in the parent class. If ItemName was given a value in Actor, and MyActor didn't modify that value, the value would remain the same.

     

    Yes, as long as the variable was declared with public or protected access level in the parent class.

     

     EscapingTheSilence wrote:

    The same can be done with functions. In regard to functions, you can choose to include the previous version, and choose when the parent class's version's commands are executed (by calling super.<functionname>(variables); within your redefined function).

     

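    Yes; although the syntax is a bit different, the concept is the same. The parent's version is called by qualifying the name with the parent class (Parent::Method(args)) rather than with super. Also, for the redefined version to be picked when the object is used through a pointer or reference to the parent class, the method must be declared as virtual in the parent class.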
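
    A minimal sketch of how this might look in C++, reusing the Actor, MyActor and ItemName names from the question. The Describe method and the initial value are hypothetical, added only for illustration.

```cpp
#include <string>

class Actor
{
public:
  Actor() : ItemName("Actor item") {}
  virtual ~Actor() {}

  // virtual lets a derived class replace this behaviour polymorphically.
  virtual std::string Describe() const { return "Actor(" + ItemName + ")"; }

protected:
  std::string ItemName; // protected: visible to derived classes, hidden from outsiders
};

class MyActor : public Actor
{
public:
  // Redefines Describe and chooses when the parent's version runs,
  // much like calling super.Describe() in UnrealScript.
  virtual std::string Describe() const
  {
    return "MyActor wraps " + Actor::Describe();
  }

  // ItemName is inherited, so it can be used without redeclaring it.
  std::string Name() const { return ItemName; }
};
```

    Calling Describe() on a MyActor through an Actor reference should run the MyActor version, which in turn calls the Actor version.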

     

    EDIT: If you're interested in learning a bit more about C++, my personal favorite introductory book is available for free online from the author's website. Thinking in C++ 2nd Edition by Bruce Eckel, download site is here: http://www.mindview.net/Books/DownloadSites

    Tuesday, April 15, 2008 8:13 PM
  •  

    Thanks! Downloaded :)

     

    Another question... how would I define variables so that they can be used in any function without having to pass them as arguments? Would I use a prototype, the same as for declaring new functions? I.e., after my includes, do:

     

    Code Snippet

    int myInt;

     

    ...thus making this a (global?) variable accessible from anywhere in the current class?
    Tuesday, April 15, 2008 8:31 PM
  •  ildjarn wrote:
    Code Snippet

    int main()
    {
      char NumNamesChar[] = { 0x58, 0x08, 0, 0 };
      int NumNames = *reinterpret_cast<int*>(NumNamesChar);
      std::cout << NumNames << std::endl;
      return 0;
    }

     

     

     

    When I compiled this earlier, there were no complaints about the array NumNamesChar, and if I take this and paste it into my code, there are still no complaints, but adding this:

     

    Code Snippet

    char VerifyCheck[] = { 0xC1, 0x83, 0x2A, 0x9E };

     

    ... which verifies that this file is what it should be, gives 3 truncation of constant value errors.

    I see no differences... except that the last two bytes are not plain 0...

    Tuesday, April 15, 2008 9:28 PM
  • The compiler is giving warnings, not errors, and in this case it is safe to ignore the warnings.

     

    The reason for the warnings is that char, by default in Visual C++, is signed and so has the same range as signed char, [-128..127]. However, three of the values in VerifyCheck are outside of this range (0xC1, 0x83, and 0x9E; 193, 131, and 158 respectively).

     

    You can prevent the warning by making VerifyCheck an unsigned char[] instead of a char[], e.g.

    Code Snippet

    unsigned char VerifyCheck[] = { 0xC1, 0x83, 0x2A, 0x9E };

     

     

    Alternatively, you can leave VerifyCheck as a char[] and postfix each number with a u to indicate to the compiler that the number is intended to be unsigned, and it will silently do the conversion to (signed) char for you, e.g.

    Code Snippet

    char VerifyCheck[] = { 0xC1u, 0x83u, 0x2Au, 0x9Eu };

     

     

    See http://msdn2.microsoft.com/en-us/library/00a1awxf.aspx for more information about the 'u' suffix.
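
    The range issue can be checked with a small sketch. PlainCharIsSigned and ByteValue are hypothetical names; whether plain char is signed is implementation-defined, so the first function may return a different answer on other compilers or with different compiler settings.

```cpp
#include <limits>

// Reports whether plain char is signed for the current compiler and settings.
bool PlainCharIsSigned()
{
  return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: returns a byte's value in the range 0..255,
// regardless of whether plain char is signed.
unsigned int ByteValue(char c)
{
  return static_cast<unsigned char>(c);
}
```

    Going through unsigned char before widening is what keeps a byte such as 0xC1 at 193 instead of turning it into a negative number.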
    Tuesday, April 15, 2008 10:04 PM
  • Well, as it turns out, I cannot use that code as given unless I do a bit of recoding. Here is how I worked out my verification:

     

    Code Snippet

    int VerifyCheck = 2786241; // { 0xC1, 0x83, 0x2A, 0x9E } in int form

    char Verify[4] = {};

     

    inFile.get(Verify, 4); //to verify that we are editing an unreal package

    if(CharsToPrimitive(Verify) != VerifyCheck)

    return 1;

     

     

    Here is an article on what I'm trying to affect: http://wiki.beyondunreal.com/Legacy:Package_File_Format

    Section 1.3.1 (Name-Table) is the target to be modified, and the header (1.2) contains where the NT resides and how long it is.

     

    The project I am working on has to do with repairing an obfuscated nametable, where people try to hide the inner workings of a package. Basically, instead of using normal printable characters, they use non-printable ones (which show up as squares), and thus when the package is viewed in a package analyser, the viewer sees a bunch of squares. My goal is to replace said unprintable strings with incremented printable strings (V00, V01, V02 etc.), making the target file more readable.

     

    :) Another question: apparently seekg() is overloaded, with an absolute version and a relative version. seekg(5) would be the absolute version, and seekg(5, dir) would be the relative version. One thing I don't know is what are valid "dir" entries?

    Wednesday, April 16, 2008 4:51 AM

  • Quote>The project I am working on has to do with repairing a obfuscated
    Quote>nametable, where people try to hide the inner workings of a package.

     

    Sounds to me like you're trying to crack someone's file encryption.
    If so, do you have a legitimate reason for doing this?

     

    (Just curious/suspicious. Of course, you can invoke your Fifth Amendment

    rights and decline to answer.)

     

    - Wayne


     

    Wednesday, April 16, 2008 7:16 AM
  • Well, I had a huge answer written out, clicked send, and realised my wireless connection had died and I had lost my whole post.

    I'll make it short.

     

    The file is not fully encrypted, only that the table containing the variable names is changed, byte-hacked I guess you could say, by the coder of the package. I am not modifying the programmer's code itself, as every Unreal package holds this table. Nor will I use what I've done to modify the package. I will not be taking the coder's source and using it in my own projects, as I just wish to learn the methods employed by said coder. This program in its completed form will not be publicly released. The main goal of this project is not to de-obfuscate a package's table, but actually to begin learning C++, and by extension, learning to code better in UnrealScript. In fact, I was planning on doing a project the reverse of this as well. Although there already is a program out that does this, I want the experience.

     

    All that being said, I plead the 5th :)

    Wednesday, April 16, 2008 3:12 PM
  •  EscapingTheSilence wrote:

    Well as it turns out, I cannot use that code given, unless i do a bit of recoding. Here is how i worked out my verification:

     

    Code Snippet

    int VerifyCheck = 2786241; // { 0xC1, 0x83, 0x2A, 0x9E } in int form

    char Verify[4] = {};

     

    inFile.get(Verify, 4); //to verify that we are editing an unreal package

    if(CharsToPrimitive(Verify) != VerifyCheck)

    return 1;

     

     

     

    2786241 is equivalent to { 0xC1, 0x83, 0x2A }. 2653586369 is equivalent to { 0xC1, 0x83, 0x2A, 0x9E }.
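
    A likely reason the three-byte value matched: istream::get(buffer, 4) extracts at most 3 characters (it reserves room for a terminating null) and also stops early at a newline byte, whereas read(buffer, 4) extracts exactly 4 raw bytes. Note also that 2653586369 does not fit in a signed 32-bit int, so an unsigned type is the safer choice for the comparison. The sketch below shows one way to do the check; ReadUInt32LE, HasPackageSignature and kPackageSignature are hypothetical names, and <cstdint> is assumed to be available.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Signature bytes C1 83 2A 9E from this thread, as an unsigned constant.
const std::uint32_t kPackageSignature = 0x9E2A83C1u; // 2653586369

// Hypothetical helper: reads exactly 4 bytes and decodes them as an
// unsigned little-endian value. Returns false if the stream ran short.
bool ReadUInt32LE(std::istream& in, std::uint32_t& out)
{
  unsigned char bytes[4] = {};
  // read() extracts exactly 4 raw bytes; get(buf, 4) would extract at
  // most 3 and would stop early at a newline byte.
  if (!in.read(reinterpret_cast<char*>(bytes), 4))
    return false;
  out =  static_cast<std::uint32_t>(bytes[0])
      | (static_cast<std::uint32_t>(bytes[1]) << 8)
      | (static_cast<std::uint32_t>(bytes[2]) << 16)
      | (static_cast<std::uint32_t>(bytes[3]) << 24);
  return true;
}

// Convenience wrapper for in-memory data, handy for trying this out
// without a file on disk.
bool HasPackageSignature(const std::string& raw)
{
  std::istringstream in(raw, std::ios::in | std::ios::binary);
  std::uint32_t magic = 0;
  return ReadUInt32LE(in, magic) && magic == kPackageSignature;
}
```

    With a file, the same ReadUInt32LE call can be made directly on the std::fstream opened in binary mode.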

     

     EscapingTheSilence wrote:

    another question, apparently seekg() is overloaded, a absolute version and a relative version. seekg(5) would be the absolute version, and seekg(5, dir) would be the relative version. One thing i don't know is what are valid "dir" entries?

     

    std::ios_base::beg, std::ios_base::cur, and std::ios_base::end. http://msdn2.microsoft.com/en-us/library/8dk8h81e.aspx
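
    A small sketch of the relative form, using an in-memory stream so it can be tried without a file. PeekAfterSkipping is a hypothetical name introduced for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: moves the read position `count` bytes forward from
// where it currently is, then returns the next byte without consuming it
// (or -1 if the seek failed).
int PeekAfterSkipping(std::istream& in, std::streamoff count)
{
  in.seekg(count, std::ios_base::cur); // relative to the current position
  if (!in)
    return -1;
  return in.peek();
}
```

    For a stream holding "0123456789", seekg(2) followed by PeekAfterSkipping(in, 5) should land on '7'; seekg(-1, std::ios_base::end) positions on the last byte, and seekg(0, std::ios_base::beg) rewinds to the start.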

    Wednesday, April 16, 2008 4:30 PM
  • :) Thanks ildjarn. Thanks for putting up with my noobishness :)

    Question directly to you: could I get your MSN? If you wish to keep your privacy, no problemo.

    I actually worked out a hack that I've been using,

    Code Snippet
    inFile.seekg(inFile.tellg()+5);

     

    Hackish, but now that I know how to really do it, it's all good.

     

    On a side note: the snippet code seems to be buggy, always changing my font and color :S Also doesn't keep tabbed spacing in code.

    Wednesday, April 16, 2008 4:51 PM
  •  EscapingTheSilence wrote:

    Question Directly to you, could i get your MSN? If you wish to keep your privacy, no problemo.

     

    I don't mind giving it to you, but I don't know how to get it to you without posting it here and giving it to everyone else as well. ;)

     

     EscapingTheSilence wrote:

    On a side note: the snippet code seems to be buggy, always changing my font and color :S Also doesn't keep tabbed spacing in code.

     

    Yes, it doesn't like tabs much; I convert all tabs to spaces before pasting code here to retain indentation. This can be done in VS by selecting the relevant code and going to Edit -> Advanced -> Untabify Selected Lines. Or, Find & Replace always works too :P

    Wednesday, April 16, 2008 5:05 PM