Byte array to string

  • Question

  • Hello,
    I receive a byte array and have to convert it into a string.
    Each control character should take up only one character in my string.
    How can I achieve that?
    resToClient = (char)2 + "PING" + (char)4 + "WORK" + (char)4 + "ACK" + (char)3;
    
    //Current output:
    Encoding.UTF8.GetString(buffer, 0, sizeReceive)    "\u0002PING\u0004WORK\u0003"
    Encoding.ASCII.GetString(new byte[]{ 2 });         "\u0002"    string
    
    
    //Should be:
    For ASCII 2 (STX), 3 (ETX) and 4 (EOT) I need exactly one character each inside my string, not more.

    Thanks in advance for your help.

    Greetings Markus.


    My attempts so far:
    var buffer = new byte[10000];
    
    sizeReceive = tcpClient.Client.Receive(buffer, 0, buffer.Length, SocketFlags.None);
    
    receiveBuffer += Encoding.UTF8.GetString(buffer, 0, sizeReceive).Replace('\u0002', '2').Replace('\u0003', '3').Replace('\u0004', '4');
    
    
    byte[] chars = new byte[sizeReceive];
    System.Buffer.BlockCopy(buffer, 0, chars, 0, sizeReceive);
    //receiveBuffer +=  new string(chars);
    
    foreach (var item in chars)
    {
    	if (item < 30)
    	{
    		int t = (char)item;
    		receiveBuffer += (char)t;
    	}
    	else
    		receiveBuffer += Convert.ToChar(item);
    }


    Monday, April 20, 2020 4:05 PM


All replies

  • The string "\u0002" contains exactly one character, with the Unicode value U+0002.

    I suspect you are being fooled by the debugger, which is trying to show you special characters that cannot themselves be printed.  It's exactly like the string "\r\n", which contains exactly two characters: return and linefeed.

    If you are dealing with ASCII characters and special characters, then you are likely to confuse yourself by converting back and forth to Unicode.  Why not just leave it as a byte array?
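The point about the debugger can be checked directly by looking at the string length rather than at its displayed form. This is only a small sketch; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original thread.
public static class ControlCharLengthDemo
{
    // Decodes raw bytes as ASCII and reports how many chars the resulting string holds.
    public static int DecodedLength(byte[] raw)
    {
        string text = Encoding.ASCII.GetString(raw);
        return text.Length;
    }
}
```

Calling `ControlCharLengthDemo.DecodedLength(new byte[] { 2 })` yields 1, even though the debugger displays that one character as the six-symbol escape `\u0002`.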


    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Tuesday, April 21, 2020 5:35 AM
  • Hi Markus,

    Thank you for posting here.

    Characters such as STX, ETX and EOT are called control characters; they are also known as non-printable characters.

    They have no visible glyph, so the string will never show them as "STX". If you want to see that text, I am afraid you have to write those letters out yourself.
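If readable names are wanted only for logging or display, one possible approach is to build a separate display string and leave the real data untouched. The sketch below assumes a `<STX>`-style tag format; the names `ControlCharDisplay` and `Visualize` are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical display helper; the tag format is an assumption.
public static class ControlCharDisplay
{
    // Returns a copy of text in which STX, ETX and EOT are replaced by readable tags.
    // The input string itself is not modified.
    public static string Visualize(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            switch (c)
            {
                case '\u0002': sb.Append("<STX>"); break;
                case '\u0003': sb.Append("<ETX>"); break;
                case '\u0004': sb.Append("<EOT>"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```

The result is meant for human eyes only; parsing should still be done on the original string or byte array.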

    Best Regards,

    Timon


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, April 21, 2020 7:51 AM
  • >Why not just leave it as a byte array?



    I receive a string and have to analyze its elements.

    If I have a string, I can use Split; that's why I converted.

    resToClient = (char)2 + "PING" + (char)4 + "WORK" + (char)4 + "ACK" + (char)3;

    If I create it like this, I have exactly 1 character in the string.

    With "\u0002" it looks like I have 5 here, so I can't split with (char)4.

          
     2 P I N G 4
     0 1 2 3 4 5
       re  string byte

    @Tim, can you make a sample showing how you would split it as a byte array?

    •    Start is STX
    •    End is ETX
    •    EOT is the separator

    Thanks in advance.

    Greetings Markus


    Tuesday, April 21, 2020 10:33 AM
  • Hi Markus,

    For the example string, we can split it like this:

    var result = str.Split(new char[] { (char)4 });

    Or

    var result2 = str.Split(new string[] { "\u0004" }, StringSplitOptions.None);

    Even if it is a byte array, we can find where each EOT (4) character is.

    We can split it into 3 new byte arrays based on those indexes, and then process each part.
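For the string route, a rough sketch might first drop the surrounding STX and ETX and then split on EOT. The names `FrameText` and `SplitFields` are invented for this example, and it assumes the frame carries at most one leading STX and one trailing ETX.

```csharp
using System;

// Hypothetical helper for the string-based approach.
public static class FrameText
{
    private const char Stx = '\u0002';
    private const char Etx = '\u0003';
    private const char Eot = '\u0004';

    // Strips one leading STX and one trailing ETX (if present), then splits on EOT.
    public static string[] SplitFields(string frame)
    {
        int start = frame.Length > 0 && frame[0] == Stx ? 1 : 0;
        int end = frame.Length > start && frame[frame.Length - 1] == Etx
            ? frame.Length - 1
            : frame.Length;
        return frame.Substring(start, end - start).Split(Eot);
    }
}
```

With the frame from the question, `(char)2 + "PING" + (char)4 + "WORK" + (char)4 + "ACK" + (char)3`, this gives the three fields PING, WORK and ACK.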

    I hope I did not misunderstand you again.

    Best Regards,

    Timon



    Wednesday, April 22, 2020 5:47 AM
  • Hi Markus,

    In the past, I've done something like this for processing data coming in from a TCP Listener (it sounds like that is what you're doing, or something similar):
    private string ValidStart = new string(new char[] { (char)2 });
    private string ValidEnd   = new string(new char[] { (char)3 });
    
    public List<string> ProcessDataReceived(ref string buffer, string dataReceived)
    {
        List<string> DataList = new List<string>();
        try
        {
            //add received data to the buffer
            buffer += dataReceived;
    
            int whileLoopWatchdog = 0;
            while (buffer.Length > 0)
            {
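                //helper not shown in this post; judging by its name, it discards
                //anything in the buffer that precedes ValidStart
                this.RemoveDataPrecedingStart(ref buffer);
    
                //verify that a complete message is present, if so process just that
                string Complete = this.ProcessCompleteMessage(ref buffer);
                //IsNotNullOrEmpty is a custom extension method (not shown)
                if (Complete.IsNotNullOrEmpty())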
                {
                    //string Xml;
                    //if (this.IncludeDelimiters)
                    //    Xml = Complete;
                    //else
                    //    Xml = this.ParseOutDelimiters(Complete); 
                    //
                    //DataList.Add(Xml);
    
                    DataList.Add(Complete);
    
                    //remove the data that was pulled out for processing
                    //this will replace multiple identical occurrences but that
                    //is fine
                    buffer = buffer.Replace(Complete, "");
                }
    
                //HACK: prevent endless while loop
                //(although 20+ complete sets of data at once probably wouldn't
                //happen, and if it did the remainder would be caught on
                //next data receive)
                whileLoopWatchdog++;
                if (whileLoopWatchdog > 20)
                {
                    whileLoopWatchdog = 0;
                    break;
                }
            }
        }
        catch (Exception ex)
        {
            LogOutput.WriteLine(ex, "DataParser Exception");
        }
        return DataList;
    }
    protected virtual string ProcessCompleteMessage(ref string buffer)
    {
        string Complete = "";
    
        if ((buffer.StartsWith(ValidStart)) && (buffer.Contains(ValidEnd)) && (buffer.IndexOf(ValidStart) < buffer.IndexOf(ValidEnd)))
        {
            Complete = buffer.Substring(buffer.IndexOf(ValidStart),
                                        buffer.IndexOf(ValidEnd) + ValidEnd.Length - buffer.IndexOf(ValidStart));
        }
    
        return Complete;
    }
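
The same buffering idea can also be sketched on raw bytes instead of strings. This is not the code above but a variation on it: it keeps pending data in a `List<byte>`, and removes each frame by position rather than with `Replace`. All names here (`FrameExtractor`, `Extract`, `pending`) are hypothetical, and it assumes STX and ETX never occur inside the payload.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical byte-based variant of the buffering approach.
public static class FrameExtractor
{
    private const byte Stx = 0x02;
    private const byte Etx = 0x03;

    // Appends the first count bytes of received to pending, then removes and returns
    // every complete STX...ETX frame. Incomplete trailing data stays in pending.
    public static List<byte[]> Extract(List<byte> pending, byte[] received, int count)
    {
        for (int i = 0; i < count; i++) pending.Add(received[i]);

        var frames = new List<byte[]>();
        while (true)
        {
            int start = pending.IndexOf(Stx);
            if (start < 0)
            {
                pending.Clear();                            // no STX: nothing worth keeping
                break;
            }
            if (start > 0) pending.RemoveRange(0, start);   // drop data preceding STX

            int end = pending.IndexOf(Etx);
            if (end < 0) break;                             // frame incomplete, wait for more data

            frames.Add(pending.GetRange(0, end + 1).ToArray());
            pending.RemoveRange(0, end + 1);                // remove exactly the bytes just extracted
        }
        return frames;
    }
}
```

Because every pass through the loop either exits or removes at least one whole frame, no watchdog counter is needed in this variant.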
    


    Hope that helps!  =0)

    ~~Bonnie DeWitt [C# MVP]

    http://geek-goddess-bonnie.blogspot.com

    • Marked as answer by Markus Freitag Saturday, April 25, 2020 10:51 AM
    Wednesday, April 22, 2020 3:04 PM
  • Hello Timon,
    OK, those two are the same:
    var result =  str.Split(new char[] { (char)4 });
    var result2 = str.Split(new string[] { "\u0004" },StringSplitOptions.None);
    That is clear.
    >We can split it into 3 new byte arrays based on the index, and then do something.
    >I hope I did not misunderstand you again.
    I think you understand what I need.
    But how would you split the byte array?
    It's an illusion, like Tim said. OK now.

    Greetings Markus

    Wednesday, April 22, 2020 3:50 PM

  • >But how would you split the byte array?
    >It's an illusion, like Tim said.

    I'm confused by your confusion. 8=O

    What's an illusion?

    The character shown as \u0004 (or \u0002 or \u0003) is a single byte, as Tim
    said. Since it has no visual symbol to represent it in a display of characters
    such as a string, it is shown as its escaped Unicode value. You can view it as
    a hex value as well: 0x4 or 0x0004 etc. Or as a decimal value, as Timon said
    and illustrated.

    As to splitting the byte array, you haven't given enough details. What exactly,
    if anything, is supposed to be done with the STX and ETX characters? Are they
    to be stripped off? Or left attached to the substrings?

    Or is the parsing to be done only on the text bounded by the STX and ETX
    characters? Given the string of bytes in your example, what *exactly* should
    the substrings (of bytes) contain?

    Are the substrings to be in byte arrays? An array of byte arrays? An array of
    strings?

    Finding the EOT in a byte array can be done like so:

    byte[] bar = { (byte)'\u0002', (byte)'P', (byte)'I', (byte)'N', (byte)'G',
     (byte)'\u0004', (byte)'W', (byte)'O', (byte)'R', (byte)'K', (byte)'\u0003'};
    
    int idx4 = Array.IndexOf(bar, (byte)0x4); // here idx4 == 5
    
    

    With some length calculations for array sizing, you can selectively copy
    parts of the source byte array into other byte arrays. e.g. -

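    // destination arrays, sized from the position of the EOT
    byte[] bar1 = new byte[idx4];
    byte[] bar2 = new byte[bar.Length - idx4 - 1];

    // copy from start of source to EOT - 1
    Array.Copy(bar, 0, bar1, 0, idx4);
    
    // copy from EOT + 1 to end of source
    Array.Copy(bar, idx4 + 1, bar2, 0, bar.Length - idx4 - 1);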
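
When the source may contain more than one EOT, the same `Array.IndexOf` and `Array.Copy` calls can be put in a loop. The following is a sketch only; `ByteSplitter` and its `Split` method are hypothetical names, and the choice to keep empty segments is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generalization of the single-EOT copy.
public static class ByteSplitter
{
    // Splits source at every occurrence of delimiter. The delimiter bytes are not
    // included in the returned segments; empty segments are kept.
    public static List<byte[]> Split(byte[] source, byte delimiter)
    {
        var parts = new List<byte[]>();
        int start = 0;
        while (true)
        {
            // search only from the current position onward
            int idx = start < source.Length ? Array.IndexOf(source, delimiter, start) : -1;
            int end = idx < 0 ? source.Length : idx;

            var part = new byte[end - start];
            Array.Copy(source, start, part, 0, part.Length);
            parts.Add(part);

            if (idx < 0) break;      // no further delimiter: this was the last segment
            start = idx + 1;         // continue just past the delimiter
        }
        return parts;
    }
}
```

Applied to `bar` with delimiter `0x4`, this returns two segments of five bytes each; note that the STX and ETX bytes are still attached to the first and last segment.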
    
    

    E&OE

    - Wayne

    • Marked as answer by Markus Freitag Saturday, April 25, 2020 10:51 AM
    Wednesday, April 22, 2020 11:44 PM
  • Dear All,
    First, thank you all for the responses.

    >I'm confused by your confusion. 8=O
    What do you mean by that? Oh, my God ;-)

    ok now   -- array


    >Are the substrings to be in byte arrays?
    >An array of byte arrays?
    >An array of strings?
    It's clear to me now.
    The array can be small or large.
    <STX>Group<EOT>Element1<EOT>Element2<EOT>Element2   .... <EOT>ElementN<ETX>
    Yes, I can parse the array with IndexOf.
                  --> Without converting to a string.
    Do you have another good tip or example?
    If so, that would be very nice. I think I should be able to solve this; I just want to have a good structure.
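
One possible structure for the frame layout above is a small parser that validates the STX/ETX wrapper, walks the payload with `Array.IndexOf`, and decodes only the individual fields. This is a sketch under assumptions: the names `GroupFrameParser` and `TryParse` are invented, the fields are taken to be ASCII text, and the first field is treated as the group.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical parser for <STX>Group<EOT>Element1<EOT>...<EOT>ElementN<ETX>.
public static class GroupFrameParser
{
    private const byte Stx = 0x02, Etx = 0x03, Eot = 0x04;

    // Returns false when the frame is not wrapped in STX ... ETX.
    public static bool TryParse(byte[] frame, out string group, out List<string> elements)
    {
        group = null;
        elements = new List<string>();
        if (frame == null || frame.Length < 2 || frame[0] != Stx || frame[frame.Length - 1] != Etx)
            return false;

        int start = 1;                      // first payload byte, just after STX
        int payloadEnd = frame.Length - 1;  // index of the closing ETX
        bool first = true;
        while (true)
        {
            // look for the next EOT inside the payload only
            int idx = Array.IndexOf(frame, Eot, start, payloadEnd - start);
            int end = idx < 0 ? payloadEnd : idx;
            string field = Encoding.ASCII.GetString(frame, start, end - start);

            if (first) { group = field; first = false; }
            else elements.Add(field);

            if (idx < 0) break;
            start = idx + 1;
        }
        return true;
    }
}
```

The byte array is searched as-is; a string is created only for each field once its boundaries are known.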

    Greetings Markus
    Thursday, April 23, 2020 5:22 PM

  • >ok now   -- array



    You can represent *any* character by using the same format as the above examples
    are using for STX, ETX, EOT. For example:

    byte[] ba_ex = { (byte)'\u0002', (byte)'\u0050', (byte)'\u0049', (byte)'\u004e', (byte)'\u0047',
     (byte)'\u0004', (byte)'\u0057', (byte)'\u004f', (byte)'\u0052', (byte)'\u004b', (byte)'\u0003'};
    
    string strx = Encoding.UTF8.GetString(ba_ex);
    

    Guess what strx contains?

    See:

    char (C# reference)
    https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/char

    Excerpt:

    "Literals

    You can specify a char value with:

      -  a character literal.
      -  a Unicode escape sequence, which is \u followed by the four-symbol
         hexadecimal representation of a character code.
      -  a hexadecimal escape sequence, which is \x followed by the hexadecimal
         representation of a character code."

    If you open the Character Map utility and hover over any symbol in the font
    selected you will see the Unicode value for that symbol/character.

    - Wayne

    • Marked as answer by Markus Freitag Saturday, April 25, 2020 10:51 AM
    Thursday, April 23, 2020 6:11 PM