Checking whether a string contains a table tag and applying a regex only inside it

  • Question

  • I have a string variable that contains both plain text and HTML tags. How can I apply a regex only within the HTML table tag? Is this possible?

    string input = "Hello,\nTRAVEL DETAILS\n<table border=\"1\">\n<tr>\n<th align=\"center\">Initial Travel Date</th>\n<th align=\"center\">Reference Number</th>\n<th align=\"center\">First Name</th>\n<th align=\"center\">Surname</th>\n<th align=\"center\">Main Reason</th>\n<th align=\"center\">Client ID</th>\n</tr>\n<tr>\n<td align=\"center\">{TRV TRL INIT.trn}</td>\n<td align=\"center\">{TRV REF NO.trn}</td>\n<td align=\"center\">{TRV FIRST NM.trn}</td>\n<td align=\"center\">{TRV SURNAME.trn}</td>\n<td align=\"center\">Internal Meeting</td>\n<td align=\"center\">{TRV CLIEN ID.trn}</td>\n</tr>\n</table>";
    
    string output = Regex.Replace(input, @"\t|\n|\r", "");
    return output;

    Monday, November 4, 2019 12:27 AM

All replies

  • To identify a non-nested table, you can try a simplified expression like this:

       (?s)<table(\W.*?)?>.*?</table>

    Then replace the second “.*?” with more specific constructs. Please give more details about your needs.
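
    As a minimal sketch of how the expression above might be used to check whether a table is present at all and to pull out each table block: the class name `TableFinder` and its two methods are hypothetical, and `RegexOptions.IgnoreCase` is an added assumption so that `<TABLE>` is matched as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableFinder
{
    // The simplified pattern from above; (?s) lets '.' match line breaks too.
    // IgnoreCase is an assumption added for this sketch.
    private static readonly Regex TableRegex =
        new Regex(@"(?s)<table(\W.*?)?>.*?</table>", RegexOptions.IgnoreCase);

    // True if the text contains at least one complete <table>...</table> block.
    public static bool HasTable(string text)
    {
        return TableRegex.IsMatch(text);
    }

    // Returns every non-nested table block found in the text, in document order.
    public static List<string> FindTables(string text)
    {
        var tables = new List<string>();
        foreach (Match m in TableRegex.Matches(text))
        {
            tables.Add(m.Value);
        }
        return tables;
    }
}
```

    Each string returned by `FindTables` can then be processed with whatever further regex is needed, without touching the text around the tables.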

    However, it is also possible to use a dedicated library that parses HTML.
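
    One reason a dedicated HTML library may be preferable is that the simplified expression only copes with non-nested tables. The sketch below illustrates this under stated assumptions: the class `NestedTableDemo` and the method `FirstTableMatch` are hypothetical names made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedTableDemo
{
    // Returns the first match of the simplified table pattern,
    // or an empty string if the text contains no match.
    public static string FirstTableMatch(string html)
    {
        Match m = Regex.Match(html, @"(?s)<table(\W.*?)?>.*?</table>");
        return m.Success ? m.Value : string.Empty;
    }
}
```

    For the input `<table><tr><td><table><tr><td>inner</td></tr></table></td></tr></table>`, the lazy “.*?” stops at the first closing tag, which belongs to the inner table, so `FirstTableMatch` returns `<table><tr><td><table><tr><td>inner</td></tr></table>` and the rest of the outer table is left out of the match.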

    Monday, November 4, 2019 6:04 AM
  • Hi nhoyti_, 

    Thank you for posting here.

    As Viorel_ suggested, you can use the expression to identify a non-nested table and then apply your regex within that table.

    Here is the code from my test.

                string input = "Hello,\nTRAVEL DETAILS\n<table border=\"1\">\n<tr>\n<th align=\"center\">Initial Travel Date</th>\n<th align=\"center\">Reference Number</th>\n<th align=\"center\">First Name</th>\n<th align=\"center\">Surname</th>\n<th align=\"center\">Main Reason</th>\n<th align=\"center\">Client ID</th>\n</tr>\n<tr>\n<td align=\"center\">{TRV TRL INIT.trn}</td>\n<td align=\"center\">{TRV REF NO.trn}</td>\n<td align=\"center\">{TRV FIRST NM.trn}</td>\n<td align=\"center\">{TRV SURNAME.trn}</td>\n<td align=\"center\">Internal Meeting</td>\n<td align=\"center\">{TRV CLIEN ID.trn}</td>\n</tr>\n</table>";
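                string pattern = @"(?s)<table(\W.*?)?>.*?<\/table>";
                Regex regex = new Regex(pattern);
                // Start from the full input so text outside the tables is preserved
                // and the replacement for every table accumulates in the result.
                string result = input;
                foreach (Match m in regex.Matches(input))
                {
                    // Strip tabs and line breaks inside this table only.
                    string output = Regex.Replace(m.Value, @"\t|\n|\r", "");
                    result = result.Replace(m.Value, output);
                }
                Console.WriteLine(result);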
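
    The loop above can also be written as a single pass with a `MatchEvaluator`, which rewrites each table match in place and so handles any number of tables. This is only a sketch: the class `TableWhitespace` and the method `StripInsideTables` are hypothetical names, not part of the original test.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableWhitespace
{
    // Same simplified pattern as above; it matches non-nested tables only.
    private static readonly Regex TableRegex =
        new Regex(@"(?s)<table(\W.*?)?>.*?</table>");

    // Removes tabs, carriage returns and line feeds inside each table block.
    // Text outside the matched tables is returned unchanged.
    public static string StripInsideTables(string input)
    {
        return TableRegex.Replace(input,
            m => Regex.Replace(m.Value, @"[\t\r\n]", ""));
    }
}
```

    For example, `TableWhitespace.StripInsideTables("Hello,\n<table>\n<tr>\n<td>1</td>\n</tr>\n</table>\nBye\n")` returns `"Hello,\n<table><tr><td>1</td></tr></table>\nBye\n"`: the line breaks around the table survive, and only those inside it are removed.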

    Best Regards,

    Xingyu Zhao


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, November 4, 2019 9:29 AM
    Moderator
  • Experts advise against using regular expressions on HTML. See the following relevant discussions. Note that one of the answers looks as though someone typed at random all over it and splashed ink on it. A note says it is supposed to look like that, presumably to emphasize the foolishness of using regexes for HTML.



    Sam Hobbs
    SimpleSamples.Info

    Monday, November 4, 2019 8:30 PM
  • Hi nhoyti_,

    Is your problem solved? If so, please click "Mark as answer" to the appropriate answer, so that it will help other members to find the solution quickly if they face a similar issue.

    Best Regards,

    Xingyu Zhao

    Thursday, November 28, 2019 9:40 AM
    Moderator