Convert a collection of strings to a single string and back

    Question

  • Hello,

     

    I need to convert a collection of strings to one string, and later convert it back to a collection. The strings may contain any character, so if, for example, I use a comma as the delimiter, I have to support values that contain commas. Parsing the combined string must be as fast as possible.

     

    Is there a way to do it?

     

    Thanks.

     

    Thursday, May 24, 2007 4:24 PM

Answers

  • You can use the same method as in communication systems where you want to separate binary telegrams, which may contain all byte values from 0 to 255.

     

    1) Select a value, which is not used so often, as delimiter.

    2) Replace all instances of this value in the strings with two delimiters, that is, double the delimiter.

    3) Terminate the string with one delimiter or use one delimiter as start flag for each string.

     

    It is now easy to split the strings again.

    A single instance of the delimiter is either a termination flag or a start flag whatever you choose.

    A double instance of the delimiter is a part of the string and it should just be replaced with a single instance.

     

    If a string may start or end with the delimiter, doubling alone is ambiguous, and another known character has to be inserted next to the flag so that the decoder can tell which string each delimiter belongs to. For example, if the delimiter character is D, the strings AAAD & DAAA cannot be converted back if they are simply encoded as AAADD D DDAAA (written here with spaces for readability; the actual output is AAADDDDDAAA). The run of five Ds can be decoded as AAA & DDAAA, AAAD & DAAA or AAADD & AAA.

     

    You can also select a delimiter character and then always insert a marker character after it, which tells whether the delimiter is part of the string or a real delimiter, like this: AAAD & DAAA => AAAD0 D1 D0AAA, where D1 indicates a delimiter and D0 is a single literal D.
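
    The marker scheme in the last paragraph could be sketched in C# roughly as follows. This is only an illustration under some assumptions: ',' is picked as the delimiter, '0' and '1' are the markers, every string is terminated (rather than started) with the delimiter flag, and the class name MarkerCodec is made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: the delimiter is always followed by a marker character.
public static class MarkerCodec
{
    private const char Delimiter = ',';  // assumed delimiter (the "D" in the post)
    private const char Literal = '0';    // "D0": the delimiter is part of the data
    private const char EndOfItem = '1';  // "D1": the delimiter terminates an item

    public static string Encode(IEnumerable<string> items)
    {
        var sb = new StringBuilder();
        foreach (string item in items)
        {
            foreach (char c in item)
            {
                sb.Append(c);
                if (c == Delimiter) sb.Append(Literal);
            }
            // Terminate every item, so an empty item is still visible.
            sb.Append(Delimiter).Append(EndOfItem);
        }
        return sb.ToString();
    }

    public static List<string> Decode(string encoded)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        for (int i = 0; i < encoded.Length; i++)
        {
            char c = encoded[i];
            if (c != Delimiter)
            {
                current.Append(c);
                continue;
            }
            if (i + 1 >= encoded.Length)
                throw new FormatException("Delimiter without a marker at end of input.");
            char marker = encoded[++i];
            if (marker == Literal)
            {
                current.Append(Delimiter);
            }
            else if (marker == EndOfItem)
            {
                result.Add(current.ToString());
                current.Length = 0;
            }
            else
            {
                throw new FormatException("Unexpected marker '" + marker + "'.");
            }
        }
        if (current.Length > 0)
            throw new FormatException("Last item is not terminated.");
        return result;
    }
}
```

    With these choices, "AAA," and ",AAA" encode to "AAA,0,1,0AAA,1". Decoding is a single pass over the string, which should suit the speed requirement, although that has not been measured here.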

     

     

     

     

    Friday, May 25, 2007 10:34 AM

All replies

  • By "collection", do you mean an array?

     

    Thursday, May 24, 2007 4:56 PM
  • You can use String.Split to break the string apart at each comma and String.Join to combine the values again with a comma.
    Thursday, May 24, 2007 5:04 PM
    Moderator
  • By 'collection' I mean an ICollection<string> or a string[].

    Split and Join are no good because they require a delimiter, and as I said, the delimiter can exist inside a string value.

     

     

    Thursday, May 24, 2007 5:14 PM
  •  Ori' wrote:

    By 'collection' I mean an ICollection<string> or a string[].

    Split and Join are no good because they require a delimiter, and as I said, the delimiter can exist inside a string value.



    Of course the delimiter is within the string value, that is what split works off of...what are we missing?
    Thursday, May 24, 2007 5:30 PM
    Moderator
  • Hi, if there really is not a single delimiter you can use, then another approach would be to create a string in the format below. You would need to define a maximum width for the substring count and for the individual length values so that you know how to parse them:

     

    <NumberOfSubStrings><LengthOfSubStr1><LengthOfSubStr2>...<LengthOfSubStrN>substr1substr2...

     

    e.g.

    0 -> hello

    1 -> how

    2 -> are

    3 -> you

     

    Would become (at most 999 substrings of at most 999 characters each, since every number is written with three digits):

    "004005003003003hellohowareyou"

    004 | 005 | 003 | 003 | 003 | hellohowareyou

     

    Now you can parse the first 3 characters to get the number of substrings, the next 3 characters to get the length of the first substring, and so on.
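
    A rough C# sketch of this layout, assuming three-digit fields as in the example. The class name LengthPrefixCodec and the Width and Max constants are invented for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: <count><len1>...<lenN><str1>...<strN>, three digits per number.
public static class LengthPrefixCodec
{
    private const int Width = 3;   // digits per number (assumption from the example)
    private const int Max = 999;   // largest value that fits in Width digits

    private static string Field(int value)
    {
        if (value > Max)
            throw new ArgumentException("Value does not fit in " + Width + " digits.");
        return value.ToString("D" + Width, CultureInfo.InvariantCulture);
    }

    public static string Encode(ICollection<string> items)
    {
        var header = new StringBuilder(Field(items.Count));
        var body = new StringBuilder();
        foreach (string item in items)
        {
            header.Append(Field(item.Length));
            body.Append(item);
        }
        return header.Append(body.ToString()).ToString();
    }

    public static List<string> Decode(string encoded)
    {
        int count = int.Parse(encoded.Substring(0, Width), CultureInfo.InvariantCulture);
        int offset = Width * (count + 1);  // the data starts right after the header
        var result = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            string lengthField = encoded.Substring(Width * (i + 1), Width);
            int length = int.Parse(lengthField, CultureInfo.InvariantCulture);
            result.Add(encoded.Substring(offset, length));
            offset += length;
        }
        return result;
    }
}
```

    Because the decoder never inspects the data characters, the strings may contain anything, including digits and commas. The price is that the combined string is not very readable.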

     

    Mark.

    Thursday, May 24, 2007 7:14 PM
  • if I understand it correctly...

    You can start with the list of strings
    one
    two
    three,four
    five

    combine them into one string, if using comma as the delimiter
    one,two,three,four,five

    now when you split on the comma again (to return to the list of string)
    one
    two
    three
    four
    five
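
    The same round trip can be reproduced in C# with String.Join and String.Split. This is only a small sketch; NaiveRoundTrip is a name made up for the example.

```csharp
using System;

// Hypothetical helper showing why a plain Join/Split round trip loses information.
public static class NaiveRoundTrip
{
    public static string[] RoundTrip(string[] items)
    {
        // "three,four" is written out unescaped...
        string joined = string.Join(",", items);
        // ...so Split cannot tell its comma from a real delimiter.
        return joined.Split(',');
    }
}
```

    Passing the four strings above returns five strings, because "three,four" comes back as two separate entries.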

    I don't think you can reasonably avoid using a delimiter; the trick will be finding a delimiter that won't/can't exist in your strings. Good candidates would be '\0' (null) or a character beyond simple English in ASCII/Unicode, such as '¸' (cedilla) or '¬' (not sign). If you have an ASCII source, find a code that isn't displayable/typeable (see any ASCII table). Something has to delimit the entries when they are combined, otherwise you lose the information about which entry is which.

    The only other way to maintain what can be separated is to store the lengths of each element.

    You might need to define "any character" better or explain why a string you have would contain any possible character (including control characters like null and such). Is there any encryption being performed on the strings before you need to combine them?
    Thursday, May 24, 2007 7:17 PM
  • I know I can use \0 or some other char, but I don't know my input in advance, and I prefer not to define a delimiter.

    In addition I want the collection string to be readable - I really prefer using ',' as a delimiter.

    I need something like an escape character, or maybe to add quotes around values that contain ','.

     

    I just want to know if there is something ready to use.

    Thursday, May 24, 2007 7:38 PM
  • The problem with escape characters is that they too could appear inside an unknown input string. \0 isn't the only control character that can't be typed (if this is user input), and others are more or less readable in a string. You can use a control character like \n, \r, \f, the paragraph character or the ESC character.

    At some point you'll have to make some assumption about your input, or convert it into a standardized format. Protocols that carry TCP/IP traffic have to do the same so that the payload cannot break the framing. Byte-stuffing schemes such as those in SLIP and PPP reserve an escape byte, and because the data may itself contain that byte, it has to be escaped as well (the same way you need to double up '\' to put it into a string literal in code). Data that is still problematic after that is split across multiple packets in a specific format.
    Thursday, May 24, 2007 8:20 PM
  •  Ori' wrote:

    I know I can use \0 or some other char, but I don't know my input in advance, and I prefer not to define a delimiter.

    In addition I want the collection string to be readable - I really prefer using ',' as a delimiter.

    I need something like an escape character, or maybe to add quotes around values that contain ','.

    I just want to know if there is something ready to use.



    If you are the one to define the delimiter, I have used characters which are viewable but not likely to be used by 90% of the public. For example pull up the program charmap.exe (found in the System32 directory), I always slide down to the plus minus sign

    ±

    Which, as you can see, is readable (Unicode U+00B1) but only very rarely used. You can then join on that character to serialize and split on it to deserialize. Or choose one of your own.



    Another option is to use regular expressions and split on the \r\n characters.
    Thursday, May 24, 2007 9:38 PM
    Moderator
  • Thanks for all the character proposals, but I cannot use a special character.

    Friday, May 25, 2007 5:23 AM
  • Chr(160) looks just like a space, but I don't think a user can input it without trying real hard.

    2nd idea: a 'List(Of String)' is easy to break apart and rebuild.

    Friday, May 25, 2007 5:46 AM
  • If you can't find or use any character as a delimiter, which I doubt, then you can use XML. Add a DataSet with one DataTable that has a single string column. Fill the rows of the DataTable with the items; GetXml then gives you a string representation, and ReadXml fills a DataSet from the supplied XML string.
    There is also the possibility of using a byte array (byte[]). For this you first store the item count as a 4-byte int, and then for every string item you store its length as a 4-byte int followed by the string value. This method is used for transferring data through a socket connection.
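
    A minimal sketch of the byte-array layout, assuming UTF-8 for the string bytes and BinaryWriter/BinaryReader for the 4-byte ints. The class name BinaryCodec is hypothetical, and the stored length is the number of encoded bytes, not the number of characters.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: <count:int32> then, per item, <byteLength:int32><UTF-8 bytes>.
public static class BinaryCodec
{
    public static byte[] Encode(ICollection<string> items)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(items.Count);          // item count as a 4-byte int
            foreach (string item in items)
            {
                byte[] bytes = Encoding.UTF8.GetBytes(item);
                writer.Write(bytes.Length);     // length in bytes as a 4-byte int
                writer.Write(bytes);            // the raw string bytes
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static List<string> Decode(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            int count = reader.ReadInt32();
            var result = new List<string>(count);
            for (int i = 0; i < count; i++)
            {
                int length = reader.ReadInt32();
                result.Add(Encoding.UTF8.GetString(reader.ReadBytes(length)));
            }
            return result;
        }
    }
}
```

    If the result has to be a string rather than a byte[], the bytes could be passed through Convert.ToBase64String and Convert.FromBase64String, at the cost of readability.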
    Friday, May 25, 2007 7:46 AM
  • You can use another array of integers that keeps the lengths of the substrings or the indexes of the delimiters, or something like that.

    It's just an idea.

    Friday, May 25, 2007 9:49 AM
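  • Here is my very simple C# code. As you can see, it doesn't work when a value contains a ','.

    Code Snippet

    using System;
    using System.Collections.Generic;
    using System.Text;

    // Joins the values with ','. A value that itself contains ',' is not
    // escaped, so StringToCollection cannot recover it.
    static string CollectionToString(ICollection<string> col)
    {
      StringBuilder result = new StringBuilder();
      foreach (string str in col)
      {
        result.Append(str);
        result.Append(',');
      }
      // Drop the trailing delimiter.
      if (result.Length > 0)
        result.Remove(result.Length - 1, 1);
      return result.ToString();
    }

    // Splits on ','. Note that RemoveEmptyEntries also discards empty values.
    static ICollection<string> StringToCollection(string str)
    {
      ICollection<string> result = new List<string>();
      foreach (string value in str.Split(new char[] { ',' },
               StringSplitOptions.RemoveEmptyEntries))
        result.Add(value);
      return result;
    }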

    I already have a solution: I check whether the value contains a ',' and, if so, I add quotes around it, and I remove the quotes again in the second function. Carsten's solution (last post) also looks good. Any more ideas?
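
    The quoting idea could look roughly like the sketch below. It is not the code I actually use; it follows the usual CSV convention, which is an assumption here: values containing ',' or '"' (and empty values) are wrapped in quotes, and a quote inside a value is doubled. The class name QuotedListCodec is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: comma-separated list with CSV-style quoting.
public static class QuotedListCodec
{
    public static string Encode(IEnumerable<string> items)
    {
        var parts = new List<string>();
        foreach (string item in items)
        {
            // Empty values are quoted too, so [""] differs from an empty list.
            bool needsQuotes = item.Length == 0
                || item.IndexOf(',') >= 0
                || item.IndexOf('"') >= 0;
            parts.Add(needsQuotes
                ? "\"" + item.Replace("\"", "\"\"") + "\""
                : item);
        }
        return string.Join(",", parts.ToArray());
    }

    public static List<string> Decode(string encoded)
    {
        var result = new List<string>();
        if (encoded.Length == 0) return result;   // empty collection
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < encoded.Length; i++)
        {
            char c = encoded[i];
            if (inQuotes)
            {
                if (c != '"')
                    current.Append(c);
                else if (i + 1 < encoded.Length && encoded[i + 1] == '"')
                {
                    current.Append('"');          // doubled quote = literal quote
                    i++;
                }
                else
                    inQuotes = false;             // closing quote
            }
            else if (c == '"')
                inQuotes = true;                  // opening quote
            else if (c == ',')
            {
                result.Add(current.ToString());   // delimiter outside quotes
                current.Length = 0;
            }
            else
                current.Append(c);
        }
        result.Add(current.ToString());
        return result;
    }
}
```

    The output stays readable: one, two, "three,four" and five become one,two,"three,four",five.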

    Friday, May 25, 2007 11:11 AM