Binary reader: is it possible to select certain parts of the hex and stop at certain parts?

  • Question

  • OK, what I am trying to do is get this text out of a binary file. I know it is shown as plain text, but I know it is not,

    as you can see in the picture:

    Reference image

    Each file I am going to read is different. The only thing that seems to stay the same throughout all files is the color map, specular map and normal map markers, which are the following hex values:

    coulorMap hex code = 63 6F 6C 6F 72 4D 61 70
    normalmap hex code = 6E 6F 72 6D 61 6C 4D 61 70

    spec map hex code = 73 70 65 63 75 6C 61 72 4D 61 70

    Is there a way to get the text bytes just between these values? As you can see from the highlighted code in the reference picture, I want to get those bytes, and I have noticed that all of the values are separated by a "." character.

    The only difference is that for the color map you have to go left and up, while the rest are right across. I have made this small program that currently starts at a certain byte, but each file is different, so I was wondering: is there a way to scan the file and stop when it reaches, say, the color map hex, so that if those bytes match it stops looking?

            public void findmaterialname()
            {
                // Dispose the reader (and the underlying file handle) when done.
                using (BinaryReader br = new BinaryReader(File.OpenRead(waw_path + @"\raw\materials\" + mattofind)))
                {
                    br.BaseStream.Position = 0xB2;            // fixed start offset
                    foreach (char mychar in br.ReadChars(43)) // fixed length
                    {
                        storedbin += mychar;
                    }
                }

                MessageBox.Show(storedbin);
                storedbin = "";
            }

    I ask because some values are shorter than 43 chars and some are longer. I hope I am making sense; sorry, I have dyslexia, so it is hard to explain sometimes.

    Thanks in advance, elfenliedtopfan5


    Wednesday, June 5, 2019 2:42 PM

All replies

  • Maybe you can transform this binary file to a string that does not contain the non-text bytes, which are difficult to parse. You can transform them to spaces. For example:

     

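    // Requires: using System.IO; using System.Linq;
    string text = string.Concat(
        File
            .ReadAllBytes( waw_path + ..... )
            .Select( b => (char)b )
            // Keep printable ASCII (0x20..0x7E); 0x7F is the DEL control character.
            .Select( c => c >= 0x20 && c < 0x7F ? c : ' ' ) );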

     

    Now this text can be parsed with regular expressions or other string functions.
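
    A minimal sketch of that parsing step, assuming the strings of interest contain no spaces and are at least four characters long. The class name `PrintableRuns`, its two helpers and the sample bytes are made up for illustration and are not taken from a real material file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class PrintableRuns
{
    // Map every byte outside printable ASCII (0x20..0x7E) to a space.
    public static string ToPrintable(byte[] data) =>
        string.Concat(data.Select(b => b >= 0x20 && b <= 0x7E ? (char)b : ' '));

    // Return every run of at least minLength non-whitespace characters.
    public static List<string> Tokens(string text, int minLength = 4) =>
        Regex.Matches(text, @"\S{" + minLength + @",}")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();

    static void Main()
    {
        // Hypothetical sample: two strings surrounded by non-text bytes.
        byte[] sample =
        {
            0x01, 0x00,
            (byte)'m', (byte)'t', (byte)'l', (byte)'_', (byte)'a', 0x00,
            (byte)'c', (byte)'o', (byte)'l', (byte)'o', (byte)'r',
            (byte)'M', (byte)'a', (byte)'p', 0x00, 0x80
        };

        foreach (string token in Tokens(ToPrintable(sample)))
            Console.WriteLine(token);
    }
}
```

    Note that this approach cannot tell a genuine space or short string from binary data that happens to fall in the printable range, so the results still need checking against the file.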

    Wednesday, June 5, 2019 4:39 PM
  • You're trying to apply heuristics to a file format, and those heuristics may not hold. Instead, why not get the actual specification of the format the file is stored in and use that to build the reading code? Most binary formats start with a header, which tends to indicate either where the real data resides or at least how big the data is. If the header is a fixed size you can skip it; otherwise you have to read the header data. Then you can work on the next set of data. In the example files you looked at they may be the same size, but there are no guarantees. Look at the file format itself.

    Note that there is absolutely no difference between text and bytes to the computer. 0x40 is a byte value. Whether you translate it as '@' or the numeric value 64 (or part of a larger value) is completely up to the code reading it. There is no magic approach that says 0x40 is always a character. Hence your code cannot rely on that either. It will be wrong for some files.
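
    To illustrate the point about interpretation, here is a small sketch that reads the same bytes three ways. The class name `SameBytes` and the four sample bytes are invented for this example.

```csharp
using System;
using System.Text;

static class SameBytes
{
    // Interpret the bytes as ASCII text.
    public static string AsText(byte[] raw, int count) =>
        Encoding.ASCII.GetString(raw, 0, count);

    // Interpret the first four bytes as a little-endian 32-bit integer.
    public static int AsInt32(byte[] raw) =>
        raw[0] | (raw[1] << 8) | (raw[2] << 16) | (raw[3] << 24);

    static void Main()
    {
        byte[] raw = { 0x40, 0x41, 0x00, 0x00 };

        Console.WriteLine(AsText(raw, 2)); // @A
        Console.WriteLine(raw[0]);         // 64
        Console.WriteLine(AsInt32(raw));   // 16704 (0x00004140)
    }
}
```

    Nothing in the bytes themselves says which of these readings is intended; only the file format does.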

    Michael Taylor http://www.michaeltaylorp3.net

    Wednesday, June 5, 2019 5:28 PM
    Moderator
  • The 160-byte header of this file includes pointers to the things you need, followed by a list of zero-terminated strings.  This would be easy in C++, but it's a bit more tedious in C#.  Notice that the dword at 0x00 points to the very first string, the dword at 0x04 points to the second string, the dword at 0x40 points to the "colorMap" string, the dword at 0x4C points to the "normalMap" string, and the dword at 0x58 points to the "specularMap" string.  All of the strings are just normal, 0-terminated strings.  (They aren't separated by ".", they are separated by zeros.)

    So, this should get you a big head start:

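    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.IO;

    class Program
    {
        static void Main(string[] args)
        {
            // Dispose the reader (and the file handle) when done.
            using (BinaryReader br = new BinaryReader(File.OpenRead(args[0])))
            {
                int firstString = br.ReadInt32();   // dword at 0x00: offset of the first string
                int secondString = br.ReadInt32();  // dword at 0x04: offset of the second string (unused here)
                br.BaseStream.Position = firstString;

                // Read bytes rather than chars: ReadChar() decodes UTF-8 by default
                // and may misread or throw on arbitrary binary data.
                List<string> strings = new List<string>();
                StringBuilder s = new StringBuilder();
                while( br.BaseStream.Position < br.BaseStream.Length )
                {
                    byte b = br.ReadByte();
                    if( b == 0 )
                    {
                        strings.Add( s.ToString() );
                        s.Clear();
                    }
                    else
                    {
                        s.Append( (char)b );
                    }
                }

                foreach( var item in strings )
                    Console.WriteLine( item );
            }
        }
    }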
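
    The header pointers described above could also be followed one at a time. This is only a sketch: it assumes each dword is an absolute file offset (as the code above treats the first one) and that the strings are plain ASCII. The class name `HeaderPointers` and the helper `ReadStringAt` are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

static class HeaderPointers
{
    // Read the 32-bit offset stored at pointerOffset, then the
    // zero-terminated ASCII string that the offset points to.
    public static string ReadStringAt(BinaryReader br, long pointerOffset)
    {
        br.BaseStream.Position = pointerOffset;
        int target = br.ReadInt32(); // BinaryReader reads little-endian
        if (target < 0 || target >= br.BaseStream.Length)
            throw new InvalidDataException("Pointer is outside the file.");

        br.BaseStream.Position = target;
        var sb = new StringBuilder();
        while (br.BaseStream.Position < br.BaseStream.Length)
        {
            byte b = br.ReadByte();
            if (b == 0) break;       // terminator reached
            sb.Append((char)b);
        }
        return sb.ToString();
    }

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: HeaderPointers <material file>");
            return;
        }

        using (var br = new BinaryReader(File.OpenRead(args[0])))
        {
            // Header offsets named in the reply above.
            foreach (long offset in new long[] { 0x00, 0x04, 0x40, 0x4C, 0x58 })
                Console.WriteLine("0x{0:X2} -> {1}", offset, ReadStringAt(br, offset));
        }
    }
}
```

    Because each string is reached through its pointer, no fixed start position or fixed length is needed.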

    By the way, when you see "00 00 80 3f", that's a floating point 1.0.
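
    A quick way to check this is to decode the four bytes yourself. The sketch below assumes the file stores floats in little-endian order; the class name `FloatBytes` is made up for the example.

```csharp
using System;

static class FloatBytes
{
    // Decode four little-endian bytes as a 32-bit IEEE 754 float.
    public static float Decode(byte[] fourBytes)
    {
        byte[] copy = (byte[])fourBytes.Clone();
        // BitConverter uses the machine's byte order, so flip on big-endian hosts.
        if (!BitConverter.IsLittleEndian) Array.Reverse(copy);
        return BitConverter.ToSingle(copy, 0);
    }

    static void Main()
    {
        Console.WriteLine(Decode(new byte[] { 0x00, 0x00, 0x80, 0x3F })); // 1
    }
}
```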


    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Wednesday, June 5, 2019 11:07 PM

  •  i want to get those bytes and have noticed all of the values are seprated via . 


    No. A period is used by the editor/viewer in the display to show where there is
    a non-printable character. Look at the hex values corresponding to where the
    periods appear. Most often it is a binary zero, but it may also be some other
    value such as 0x12, 0x0C, 0x01 etc.
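
    As a rough sketch of what a hex viewer does for each row (the class name `HexRow` and the sample bytes are invented; real viewers differ in detail):

```csharp
using System;
using System.Linq;

static class HexRow
{
    // Render bytes as a hex column followed by a text column in which
    // every non-printable byte is shown as a period.
    public static string Format(byte[] data)
    {
        string hex = string.Join(" ", data.Select(b => b.ToString("X2")));
        string text = string.Concat(
            data.Select(b => b >= 0x20 && b <= 0x7E ? (char)b : '.'));
        return hex + "  " + text;
    }

    static void Main()
    {
        // 0x00 and 0x12 are non-printable; 0x2E is a real period.
        Console.WriteLine(Format(new byte[] { 0x63, 0x6F, 0x00, 0x12, 0x2E }));
        // prints: 63 6F 00 12 2E  co...
    }
}
```

    All three trailing bytes appear as "." in the text column, which is why only the hex column tells you what is really in the file.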

    - Wayne

    Thursday, June 6, 2019 12:30 AM

  • now each file im going to read in is diffent the only thing that seems to stay the same thoughout all files is colour map spec normal witch are the following hex values 

    coulorMap [sic] hex code = 63 6F 6C 6F 72 4D 61 70
    normalmap hex code = 6E 6F 72 6D 61 6C 4D 61 70

    spec map hex code = 73 70 65 63 75 6C 61 72 4D 61 70


    I'm not sure what you are saying there, or what you are thinking the values
    represent. In the file viewer the hex values on the left and the "Decoded text"
    values on the right are displaying the *same* data. They are two different views
    of the same bytes.

    So of course

    colorMap hex code = 63 6F 6C 6F 72 4D 61 70

    because 0x63 is the letter 'c', 0x6F is the letter 'o', 0x6C is the letter 'l',
    etc. The same for the other strings for which you showed hex values. Look at
    the Character Map utility to see the hex values for characters.
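
    Since the text and the hex are the same bytes, a marker such as "colorMap" can be turned into a byte pattern and searched for directly. This is a minimal sketch; the class name `MarkerSearch`, the helper `IndexOf` and the sample buffer are hypothetical.

```csharp
using System;
using System.Text;

static class MarkerSearch
{
    // Return the index of the first occurrence of needle in haystack
    // at or after start (start must be >= 0), or -1 if it does not occur.
    public static int IndexOf(byte[] haystack, byte[] needle, int start = 0)
    {
        for (int i = start; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    static void Main()
    {
        byte[] marker = Encoding.ASCII.GetBytes("colorMap");
        Console.WriteLine(BitConverter.ToString(marker).Replace("-", " "));
        // prints: 63 6F 6C 6F 72 4D 61 70

        // Hypothetical buffer: three other bytes, then the marker, then a zero.
        byte[] file = { 0x01, 0x00, 0x41, 0x63, 0x6F, 0x6C, 0x6F, 0x72, 0x4D, 0x61, 0x70, 0x00 };
        Console.WriteLine(IndexOf(file, marker)); // 3
    }
}
```

    With real files you would load the contents with `File.ReadAllBytes` first. A match only says where those bytes occur, not that the format guarantees them, so the header pointers remain the more reliable route.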

    - Wayne

    Thursday, June 6, 2019 12:55 AM
  • Hi

    Is your problem solved? If so, please click "Mark as answer" on the appropriate answer, so that other members can find the solution quickly if they face a similar issue.

    Best Regards,

    Jack



    Monday, June 10, 2019 8:27 AM
    Moderator