Split the string to a structure

  • Question

  • Hello,
    I need to parse the following strings into a structure.
    The delimiters appear to be =, | and ,
    REQUESTCOMMANDO|id=x|messageID=324|process=CONNECTION|station=4565332|question=serials
    
    BACKCOMMANDO|id=x|messageID=324|process=CONNECTION|station=4565332|software=V1.242|attributedata=bottomSideId,70704315500000,7070431550000001,7070431550000002,7070431550000003,7070431550000004,7070431550000005,7070431550000006,7070431550000007,7070431550000008,7070431550000009,7070431550000010|attributedata=topSideId,70704315500100,7070431550010001,7070431550010002,7070431550010003,7070431550010004,7070431550010005,7070431550010006,7070431550010007,7070431550010008,7070431550010009,7070431550010010|status=PASS
    
    What is the best way to parse and split this string? With RegEx?
    Please note that messageID=324 is optional: it may or may not be present,
    and it increases with each message.

    Could you please show me some different approaches? Thank you in advance for your help.

    Many greetings Markus


    Thursday, December 12, 2019 6:02 PM

Answers

  • Here's a quick sample of how you might solve this problem. The bulk of the code is just your app logic around how to work with the parsed data and it can be as simple or as complex as you need. 

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            var data = "BACKCOMMANDO|id=x|process=CONNECTION|station=4565332|software=V1.242|attributedata=bottomSideId,1,2,3|attributedata=topSideId,4,5,6|status=PASS";
    
            var msg = ParseMessage(data);
        }
    
        //TODO: Consider putting this into a message parser class or even on your Message class directly if needed
        static Message ParseMessage ( string input )
        {
            if (String.IsNullOrEmpty(input))
                return null;
    
            //Break up into fields
            var fields = input.Split('|');
    
            //A proper message has at least a command
            var msg = new Message()
            {
                Command = fields[0].Trim()
            };
    
            //TODO: At this point you should probably validate you have a command you understand, if not abort...            
    
            //Split each subsequent field into key-value pairs
            var pairDelimiter = new[] { '=' };
            foreach (var field in fields.Skip(1))
            {
                //What field is this?
                var tokens = field.Split(pairDelimiter, 2);
    
                //Skip invalid fields, if this is even possible
                if (tokens.Length != 2)
                    continue;
    
                //Match the fields you care about (invariant lowercasing so the match does not depend on the current culture)
                switch (tokens[0].ToLowerInvariant())
                {
                    case "id": msg.Id = tokens[1]; break;
                    case "process": msg.Process = tokens[1]; break;
                    case "station": msg.Station = tokens[1]; break;
                    case "software": msg.Software = tokens[1]; break;
                    case "status": msg.Status = tokens[1]; break;
    
                    case "attributedata": ParseAttributeData(msg.Data, tokens[1]); break;
                }
            }
    
            return msg;
        }        
    
        static void ParseAttributeData ( IDictionary<string, List<string>> attributes, string input )
        {
            if (String.IsNullOrEmpty(input))
                return;
    
            //Break up the attribute data
            var data = input.Split(',');
    
            //If there aren't at least 2 data points (a name plus one value) then there is nothing to do
            if (data.Length < 2)
                return;
    
            //The first data point is the attribute name, see if we already have read that data
            var name = data[0];
    
            if (!attributes.TryGetValue(name, out var values))
            {
                //Create a new data point for this set of values
                attributes[name] = values = new List<string>();
            }
    
            //TODO: If the attribute already exists what do you want to do - reset, append, etc...
    
            //The remaining tokens are the data for this attribute
            values.AddRange(data.Skip(1));
        }
    }
    
    //Making the assumption that all messages are similar, just the data changes
    //If this isn't true then just use an appropriate structure to represent the data you want
    class Message
    {
        public string Command { get; set; }
    
        public string Id { get; set; }
    
        public string Process { get; set; }
    
        public string Station { get; set; }
    
        public string Software { get; set; }
    
        public string Status { get; set; }
    
        //Just making up a structure for the attribute data, build this out as you see fit based upon your requirements
        public IDictionary<string, List<string>> Data { get; } = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
    }
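
    The question notes that messageID may or may not be present, and the sample above does not read it. One possible way to handle an optional field is sketched below, assuming messageID is always a whole number when it appears. The MessageIdReader class and the TryGetMessageId method are hypothetical names made up for this illustration.

    ```csharp
    using System;

    // Hypothetical helper, not part of the sample above: reads the optional messageID field.
    static class MessageIdReader
    {
        // Returns the messageID value, or null when the field is absent or not a whole number.
        public static int? TryGetMessageId(string input)
        {
            if (String.IsNullOrEmpty(input))
                return null;

            foreach (var field in input.Split('|'))
            {
                // Split into key and value at the first '=' only
                var tokens = field.Split(new[] { '=' }, 2);
                if (tokens.Length != 2)
                    continue;

                if (String.Equals(tokens[0].Trim(), "messageID", StringComparison.OrdinalIgnoreCase)
                    && int.TryParse(tokens[1], out var id))
                    return id;
            }

            // Field not found (or not numeric): treat it as "no message id"
            return null;
        }
    }
    ```

    A nullable int keeps "not present" distinct from a real value such as 0. The same idea could be folded into the switch in ParseMessage as an extra case if you prefer to keep everything on Message.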


    Michael Taylor http://www.michaeltaylorp3.net

    • Marked as answer by Markus Freitag Sunday, December 15, 2019 11:19 AM
    Saturday, December 14, 2019 7:55 PM
    Moderator

All replies

  • Just taking a guess here, but it appears your high-level data is separated by vertical bars, so split the string on those first using String.Split to produce the "fields" of the record. Note the assumption that a vertical bar cannot appear inside the data. Then split each field on the = sign. Except for the first entry (which you can call the header field), they are all key=value pairs. Use the header field to determine what type of record you're dealing with (if it matters), then break the subsequent fields into their key-value pairs. If each record is fixed (i.e. it always has an `id` field, `process` field, `station` field, etc.), you can use the array returned by Split to process the "fixed" fields directly. `attributedata` is the only one that looks custom, so treat it as special data: split that field into an array of string data points using Split with a comma separator. The end result would be a structure similar to this.

    public class Record
    {
       public string Command { get; set; }
    
       //Assume these are the fixed fields, if not then ignore
       public string Id { get; set; }
       public string Process { get; set; }
       public string Station { get; set; }
       public string Software { get; set; }
    
       //Or IEnumerable<string> or string[], etc.
       public List<string> Data { get; set; }
       public string Status { get; set; }
    }


    Michael Taylor http://www.michaeltaylorp3.net

    Thursday, December 12, 2019 8:21 PM
    Moderator
  • Hello CoolDadTx,

    So you would split everything on the | character first and then check whether each part contains a = character? Or could the class or structure be filled using a RegEx expression instead?

    Thanks in advance for any tips.

    Greetings Markus

    Friday, December 13, 2019 4:11 PM
  • I tend to keep REs in reserve for more complex parsing. For simple string splitting, Split is sufficient. If a particular field is more complex than that, then using an RE for that subset might make sense. For example, parsing a string for the parts of a URL makes sense as an RE, but breaking apart a CSV file line (ignoring quotes) is fine with Split. Split will perform better than REs in terms of raw speed and space. REs become more attractive when you precompile them and reuse them for things beyond splitting.
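
    For comparison only, since an RE example was asked for, a regex-based version of the key=value step might look roughly like the sketch below. The RegexFieldReader class, the ReadPairs method and the pattern itself are assumptions made up for this illustration, and it assumes neither | nor = can appear inside a key.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    // Hypothetical helper for comparison only; Split remains the simpler choice here.
    static class RegexFieldReader
    {
        // Created once and reused: a key without '=' or '|', then '=', then a value without '|'.
        private static readonly Regex PairPattern =
            new Regex(@"(?<key>[^|=]+)=(?<value>[^|]*)", RegexOptions.Compiled);

        // Returns every key=value pair in order; repeated keys such as attributedata are kept.
        // The leading command has no '=' and is therefore not returned.
        public static List<KeyValuePair<string, string>> ReadPairs(string input)
        {
            var pairs = new List<KeyValuePair<string, string>>();
            foreach (Match match in PairPattern.Matches(input ?? String.Empty))
            {
                pairs.Add(new KeyValuePair<string, string>(
                    match.Groups["key"].Value,
                    match.Groups["value"].Value));
            }
            return pairs;
        }
    }
    ```

    The attributedata values would still need a comma split afterwards, so the RE replaces only two of the three splits and is harder to read than the Split calls it stands in for.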

    Michael Taylor http://www.michaeltaylorp3.net

    Friday, December 13, 2019 5:24 PM
    Moderator
  • You can use String.Split for everything as Michael says but here are some minor variations.

    You could begin by using String.IndexOf to find the first "|" and then validate that "REQUESTCOMMANDO" precedes it. Then use String.Substring to get the rest of the data.

    Then use String.Split for all the other data separated by "|". For the name-value pairs (the data separated by "="), you could use either String.Split or String.IndexOf to find each "=". If you use String.Split for the name-value pairs, you should specify the maximum number of items you want in the results (I think 2).
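
    A rough sketch of that variation follows. The IndexOfFieldReader class and ReadPairs method are hypothetical names, and returning null for an unexpected command is just one possible choice.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper illustrating the IndexOf/Substring variation.
    static class IndexOfFieldReader
    {
        // Returns the key=value pairs that follow the expected command,
        // or null if the input does not start with "<expectedCommand>|".
        public static List<KeyValuePair<string, string>> ReadPairs(string input, string expectedCommand)
        {
            if (String.IsNullOrEmpty(input))
                return null;

            // Find the first '|' and validate what precedes it
            var bar = input.IndexOf('|');
            if (bar < 0 || !String.Equals(input.Substring(0, bar), expectedCommand, StringComparison.Ordinal))
                return null;

            var pairs = new List<KeyValuePair<string, string>>();

            // Split the rest on '|', then locate the first '=' in each field
            foreach (var field in input.Substring(bar + 1).Split('|'))
            {
                var eq = field.IndexOf('=');
                if (eq <= 0)
                    continue; // no '=' or an empty name: skip the field

                pairs.Add(new KeyValuePair<string, string>(
                    field.Substring(0, eq),
                    field.Substring(eq + 1)));
            }

            return pairs;
        }
    }
    ```

    Because only the first "=" is used as the separator, a value that itself contains "=" stays intact, which is the same effect as passing 2 as the maximum count to String.Split.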



    Sam Hobbs
    SimpleSamples.Info

    Friday, December 13, 2019 7:24 PM
  • Hello,

    Do you mean RegEx? Could you give an example, just so I can see which is better?

    Thanks.

    Greetings Markus

    Saturday, December 14, 2019 4:53 PM
  • I wouldn't recommend the RE route. There is no reason to. Try using Split first and see if it meets your needs. You're talking about a couple of lines of code at most (3 splits and whatever if logic you need for determining the command).

    Michael Taylor http://www.michaeltaylorp3.net

    Saturday, December 14, 2019 5:35 PM
    Moderator