none
Using Wildcards to Replace HTML Code

    Question

  • I've got a table with a description field.  This field is being stored with html code interspersed throughout it.

    Example:

    <! 2012-11-10 00:02:22 wscalercio >  <p>&nbsp;</p>  <p>Program not responding &ndash; Killed program process</p>

    I'd like to get rid of all the html code.  I know in Excel, I can do a search for <*> and replace it with a blank space and then anything between < and > is gone.  I can't figure out how to accomplish this in SSRS. Is there a way?

    Tuesday, November 13, 2012 5:39 PM

Answers

  • In the report, you can add VB.Net code that strips the HTML:

    Function StripHTML(HTML As String) as String
          Dim rgx As New Regex("<(.|\n)*?>")
          return rgx.Replace(HTML,  "");
    End Function
    Then in the report controld where the stripped string should appear, add an expression:

    =Code.StripHTML(Fields!HTML.Value)

    Alternately you can display the field in all of its HTML glory. Instead of stripping the HTML, select the text inside the control (so the text is highlighted instead of the control being selected). Right-click > Placeholder properties > General tab > Markup type > HTML - Interpret HTML tags as styles.


    Please Mark posts as answers or helpful so that others can more easily find the answers they seek.

    • Proposed as answer by Naomi N Tuesday, November 13, 2012 9:33 PM
    • Marked as answer by JenniferBlue70 Tuesday, November 13, 2012 10:22 PM
    Tuesday, November 13, 2012 9:23 PM

All replies

  • jennifer, what was your expected output? can you do this in sql or you have to do it in reporting services?
    Tuesday, November 13, 2012 7:51 PM
  • Using the example I originally supplied, I'd like the end result to go from:

    <! 2012-11-10 00:02:22 wscalercio >  <p>&nbsp;</p>  <p>Program not responding &ndash; Killed program process</p>

    TO

    Program not responding &ndash; Killed program process

    I want to get rid of anything and including the < and > symbols.

    I haven't figured out how to do this in SQL either. But if it can be, that would be perfectly acceptable as well.  AI tried this:

    Replace(a.mralldescriptions,'%<%>%','')

    But that didn't work.  Any suggestions would be helpful.  Thanks!

    Tuesday, November 13, 2012 8:12 PM
  • In the report, you can add VB.Net code that strips the HTML:

    Function StripHTML(HTML As String) as String
          Dim rgx As New Regex("<(.|\n)*?>")
          return rgx.Replace(HTML,  "");
    End Function
    Then in the report controld where the stripped string should appear, add an expression:

    =Code.StripHTML(Fields!HTML.Value)

    Alternately you can display the field in all of its HTML glory. Instead of stripping the HTML, select the text inside the control (so the text is highlighted instead of the control being selected). Right-click > Placeholder properties > General tab > Markup type > HTML - Interpret HTML tags as styles.


    Please Mark posts as answers or helpful so that others can more easily find the answers they seek.

    • Proposed as answer by Naomi N Tuesday, November 13, 2012 9:33 PM
    • Marked as answer by JenniferBlue70 Tuesday, November 13, 2012 10:22 PM
    Tuesday, November 13, 2012 9:23 PM
  • Thanks for the reply. Unfortunately, casting to xml doesn't work, because the field is not formatted properly.

    Example:

    declare @str varchar(299)
    select @str = '<! 2012-11-10 00:02:22 wscalercio >  <p>&nbsp;</p>  <p>Program not responding &ndash; Killed program process</p>'
    select cast (@str as xml).query('.').value('.', 'varchar(200)')

    This returns an error: XML parsing: line 1, character 3, incorrect document syntax

    The first part of the field will always contain a variation of this: <! 2012-11-10 00:02:22 wscalercio >

    Tuesday, November 13, 2012 9:29 PM
  • Thank you Tim!

    Question: Is Regex usable in SSRS 2005? I'm getting an error message that says Type 'RegEx' is not defined.

    Supposedly we're going to 2012 in a month or so, but if this can work in 2005 that would be awesome. :)

    Thanks,

    Jennifer

    Tuesday, November 13, 2012 9:57 PM
  • Regex is part of System.Text.RegularExpressions. It is in the System assembly which should be available on your report server. Try modifying the function to:

    Function StripHTML(HTML As String) as String
          Dim rgx As New System.Text.RegularExpressions.Regex("<(.|\n)*?>")
          return rgx.Replace(HTML, "");
    End Function

    Please Mark posts as answers or helpful so that others can more easily find the answers they seek.

    Wednesday, November 14, 2012 6:42 PM