Most Efficient Method to Get Index of a Character Only if it Follows a Particular string

  • Question

  • User-1231523685 posted

    Suppose I have the following string:

    "[name=Test1, location=L1, duration=3W]; [name=Test2, location=L4, duration=1W]; [name=Test5, location=L1, duration=2W]"

    Data in each square bracket represent a record and each record is separated from the next by a semicolon.

    Using C#, what is the most efficient way to get the index of the letter "d" in the word duration for the Test2 record?

    Sunday, May 13, 2018 11:05 AM

Answers

  • User303363814 posted

    I don't quite understand what you are trying to do but would suggest that wanting to know the index of 'd' is the wrong way to go about it.  It will be better to have a 'parser' of some form (as you note, these are available) which converts the input string to 'data' and then other routines which can convert that 'data' to the output format that you want.

    Each 'record' seems to be a set of name/value pairs.  A good way to represent name/value pairs in .NET is a Dictionary, where the name becomes the key.

    A series of records would commonly be represented as some sort of enumerable data type.

    A basic 'parser' could be written in a single statement if you want to:

    // Requires: using System.Linq;
    var data = input.Split(';') // Divide the records
                   .Select(i => i.Replace('[', ' ')
                                 .Replace(']', ' ')
                                 .Trim()) // Remove the noise characters
                   .Select(d => d.Split(',') // Divide each record into the fields
                                 .Select(e => e.Trim() // Remove the noise
                                               .Split('=')) // Split the name from the value
                                 .ToDictionary(e => e[0], e => e[1])); // Make the dictionary

    Now you have an Enumerable where each element represents one record.  Each record is a Dictionary whose keys are the property names and whose values are the data.

    Now that you have the data in a logical format, you can convert it to whatever result you want to create.
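
    As a minimal sketch of that last step (not part of the original answer), the program below wraps a similar one-statement parser in a hypothetical RecordParser.Parse helper and looks up the duration of the Test2 record by name. It assumes every field contains an '=' and that values never contain ',' or ';'.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordParser
{
    // Hypothetical helper: turns "[k=v, k=v]; [k=v, ...]" into one dictionary per record.
    // Assumes every field contains '=' and no value contains ',' or ';'.
    public static List<Dictionary<string, string>> Parse(string input)
    {
        return input
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries) // divide the records
            .Select(r => r.Trim().Trim('[', ']'))                        // drop whitespace and brackets
            .Where(r => r.Length > 0)                                    // ignore blank pieces
            .Select(r => r.Split(',')                                    // divide into fields
                          .Select(f => f.Split(new[] { '=' }, 2))        // name / value
                          .ToDictionary(p => p[0].Trim(), p => p[1].Trim()))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        string input = "[name=Test1, location=L1, duration=3W]; " +
                       "[name=Test2, location=L4, duration=1W]; " +
                       "[name=Test5, location=L1, duration=2W]";

        var records = RecordParser.Parse(input);

        // Find the record by its "name" value instead of by character position.
        var test2 = records.FirstOrDefault(
            r => r.TryGetValue("name", out var n) && n == "Test2");

        if (test2 != null)
            Console.WriteLine(test2["duration"]); // prints "1W"
    }
}
```

    Because the lookup goes through the dictionary keys, the order of the properties inside a record no longer matters.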

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Monday, May 14, 2018 12:21 AM

All replies

  • User475983607 posted

    Using C#, what is the most efficient way to get the index of the letter "d" in the word duration for the Test2 record?

    The index as compared to the start of the string?

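    using System;

    class Program
    {
        static void Main(string[] args)
        {
            string buff = @"[name=Test1, location=L1, duration=3W]; [name=Test2, location=L4, duration=1W]; [name=Test5, location=L1, duration=2W]";
            int i = GetIndexOfd(buff, "Test2");
            Console.WriteLine("Index: {0}", i);
            Console.WriteLine("Character: {0}", buff[i]);
        }

        // Assumes the fixed layout "name=<pattern>, location=<x>, duration=<y>"
        // with exactly one space after each comma.
        private static int GetIndexOfd(string source, string pattern)
        {
            // Skip past the pattern plus the ", " that follows it.
            int head = source.IndexOf(pattern) + pattern.Length + 2;
            // The next space precedes "duration", so 'd' is one character later.
            return source.IndexOf(" ", head) + 1;
        }
    }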

    Sunday, May 13, 2018 12:24 PM
  • User-1231523685 posted

    Hi, thanks for your reply. I see that your solution depends on a repeating pattern in each record: exactly one whitespace character between properties in a record, exactly one between records, and each property in the same position in every record.

    What if the amount of whitespace between properties and between records varies, for example when the string is too long and wraps to the next line, or when the user types extra spaces by accident? Also, what if the user put the properties in a different order within each record instead of always in the same position? How can I compensate for that?

    Sunday, May 13, 2018 5:53 PM
  • User475983607 posted

    gapi555

    Hi, thanks for your reply. I see that your solution depends on a repeating pattern in each record: exactly one whitespace character between properties in a record, exactly one between records, and each property in the same position in every record.

    What if the amount of whitespace between properties and between records varies, for example when the string is too long and wraps to the next line, or when the user types extra spaces by accident? Also, what if the user put the properties in a different order within each record instead of always in the same position? How can I compensate for that?

    This was not part of the original requirement.  You should be able to take the code above and modify it according to the new whitespace requirements.

    Frankly, I would have approached the problem much differently and tokenized the stream/buffer rather than searching strings, or used a standard serialization format.
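
    One possible way to adapt the index lookup to the whitespace and ordering concerns, offered only as a sketch and not as the poster's own code, is to match each bracketed record with a regular expression and then search inside the matching record. PropertyLocator and IndexOfProperty are hypothetical names, and the sketch assumes values never contain ']' or ','.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PropertyLocator
{
    // Returns the index in `source` of the first character of `property`
    // inside the record whose name equals `recordName`, or -1 if not found.
    // Assumes values never contain ']' or ','.
    public static int IndexOfProperty(string source, string recordName, string property)
    {
        // A key must follow '[' or ',' (plus optional whitespace); .NET allows
        // variable-length lookbehind, so the whitespace can be of any length.
        string namePattern = @"(?<=[\[,]\s*)name\s*=\s*"
                             + Regex.Escape(recordName) + @"\s*(?=[,\]])";
        string keyPattern = @"(?<=[\[,]\s*)" + Regex.Escape(property) + @"\s*=";

        // One match per bracketed record; [^\]]* also spans line breaks.
        foreach (Match record in Regex.Matches(source, @"\[[^\]]*\]"))
        {
            // The name may sit anywhere in the record.
            if (!Regex.IsMatch(record.Value, namePattern))
                continue;

            Match key = Regex.Match(record.Value, keyPattern);
            // key.Index is relative to the record, so add the record's offset.
            return key.Success ? record.Index + key.Index : -1;
        }
        return -1;
    }
}

public static class Program
{
    public static void Main()
    {
        // Extra spaces, a line break, and shuffled property order.
        string messy = "[name=Test1,location=L1,   duration=3W];\n" +
                       "[ duration = 1W ,\n  location=L4, name = Test2 ]";

        int i = PropertyLocator.IndexOfProperty(messy, "Test2", "duration");
        Console.WriteLine("Index: {0}", i);
        Console.WriteLine("Character: {0}", i >= 0 ? messy[i].ToString() : "(none)");
    }
}
```

    On the original, regularly spaced string this returns the same position as GetIndexOfd (index 66), but it no longer relies on a single space after each comma or on a fixed property order.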

    Sunday, May 13, 2018 6:22 PM
  • User753101303 posted

    Hi,

    Some more context could help. It looks like serialized data. You could perhaps consider using JSON and working on the deserialized data rather than directly on the string format in which the data is persisted. Won't you need the other records at all? Are you saying that a user would enter that string directly?
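
    To illustrate the JSON suggestion, here is a minimal sketch that assumes the same three records were stored as a JSON array and that System.Text.Json is available (.NET Core 3.0 or later). The Record class and its property names are assumptions made for the example, not something given in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical shape of one record; property names mirror the keys in the question.
public class Record
{
    public string Name { get; set; }
    public string Location { get; set; }
    public string Duration { get; set; }
}

public static class JsonExample
{
    public static List<Record> Load(string json)
    {
        // The JSON keys are lower case, so match property names case-insensitively.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<List<Record>>(json, options);
    }

    public static void Main()
    {
        string json = @"[
            { ""name"": ""Test1"", ""location"": ""L1"", ""duration"": ""3W"" },
            { ""name"": ""Test2"", ""location"": ""L4"", ""duration"": ""1W"" },
            { ""name"": ""Test5"", ""location"": ""L1"", ""duration"": ""2W"" }
        ]";

        List<Record> records = Load(json);
        Record test2 = records.First(r => r.Name == "Test2");
        Console.WriteLine(test2.Duration); // prints "1W"
    }
}
```

    Whitespace, line breaks, and property order inside each object are handled by the deserializer, so no character positions are needed.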

    Sunday, May 13, 2018 6:29 PM
  • User-1231523685 posted

    Hi, I am trying to see if I could convert XML, HTML, etc. to JSON or something like it, and back to the original format. I know there are tools out there for this, but I just wanted to see how such tools could work behind the scenes. So when you said users will be entering data, you are right insofar as HTML and XML documents are written by their authors. I would imagine the solution given above can give me a head start.

    Sunday, May 13, 2018 6:52 PM
  • User753101303 posted

    As already pointed out, this kind of tool doesn't try to locate something in the string. It is based on a parser, i.e. the string is analyzed character by character while keeping track of what is currently being read (an identifier, a literal value, etc.).

    You do have support for handling XML and JSON. HTML is not about data, so I'm not sure how it relates.

    If you are working with third parties, it seems easier for everyone to stick to known formats such as CSV, XML or JSON rather than inventing yet another format.
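
    The character-by-character analysis described above can be sketched as a small state machine. This is an illustration only: TinyParser, Field, and the State values are hypothetical names, and malformed input (for example a record with no closing bracket) is silently dropped rather than reported.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public sealed class Field
{
    public string Name;
    public string Value;
    public int NameIndex; // position of the name's first character in the input
}

public static class TinyParser
{
    // What the parser is currently reading.
    private enum State { BetweenRecords, BeforeName, InName, InValue }

    public static List<List<Field>> Parse(string input)
    {
        var records = new List<List<Field>>();
        List<Field> record = null;
        var text = new StringBuilder();
        string name = null;
        int nameIndex = -1;
        var state = State.BetweenRecords;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            switch (state)
            {
                case State.BetweenRecords: // skip ';' and whitespace until '['
                    if (c == '[') { record = new List<Field>(); state = State.BeforeName; }
                    break;

                case State.BeforeName: // skip whitespace and ',' before a name
                    if (c == ']') { records.Add(record); state = State.BetweenRecords; }
                    else if (!char.IsWhiteSpace(c) && c != ',')
                    {
                        nameIndex = i;
                        text.Clear().Append(c);
                        state = State.InName;
                    }
                    break;

                case State.InName: // collect the name up to '='
                    if (c == '=') { name = text.ToString().Trim(); text.Clear(); state = State.InValue; }
                    else text.Append(c);
                    break;

                case State.InValue: // collect the value up to ',' or ']'
                    if (c == ',' || c == ']')
                    {
                        record.Add(new Field { Name = name, Value = text.ToString().Trim(), NameIndex = nameIndex });
                        text.Clear();
                        if (c == ']') { records.Add(record); state = State.BetweenRecords; }
                        else state = State.BeforeName;
                    }
                    else text.Append(c);
                    break;
            }
        }
        return records;
    }
}

public static class Program
{
    public static void Main()
    {
        string input = "[name=Test1, location=L1, duration=3W]; " +
                       "[name=Test2, location=L4, duration=1W]; " +
                       "[name=Test5, location=L1, duration=2W]";

        var test2 = TinyParser.Parse(input)
            .First(r => r.Any(f => f.Name == "name" && f.Value == "Test2"));
        var duration = test2.First(f => f.Name == "duration");
        Console.WriteLine("{0} at index {1}", duration.Value, duration.NameIndex);
    }
}
```

    Because the parser records where each name starts, the index asked about in the original question falls out as a by-product of parsing instead of being the goal of a string search.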

    Monday, May 14, 2018 7:32 AM
  • User-1231523685 posted

    Thank you all for your input. This is really helpful for understanding how parsers work behind the scenes.

    Tuesday, May 15, 2018 11:28 AM