Extract the values between the double quotes

  • Question

  • I want to extract only the text within the double quotes in this line:
    <script type="text/javascript">window.onload = function() {href = "/index?ID=3071087873023144";}

    The answer must be: /index?ID=3071087873023144

    I'm new to C#; can you help me, please?

    Wednesday, June 6, 2018 1:41 PM

All replies

  • Since this is HTML, I strongly recommend that you use a toolkit to parse it rather than trying simple string parsing. HTML is too complex to parse reliably that way.

    However, given a simple string like attr="value", you can use a simple regular expression to find the information. Narrowing that down from your entire script tag, though, is problematic at best. Again, using an HTML parser like HtmlAgilityPack would be the best route.

    (?<attribute>\w+)\s*=\s*"(?<value>.*)"

    MSDN has an example of how to use that in a regular expression.

    Note that this won't work against your full script tag: the first match starts at type=", so the value you want ends up swallowed inside that attribute's match and is never seen on its own. That's why an HTML parser is the better choice.
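    A minimal C# sketch of applying the pattern above with System.Text.RegularExpressions, assuming .NET 6 or later with top-level statements. It makes one assumed change to the pattern: [^"]* instead of .* so the value stops at the next quote rather than at the last quote on the line. ParseAttribute is a hypothetical helper name, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the attribute name and value from a simple
// attr="value" string, or null if the pattern does not match.
static (string Attribute, string Value)? ParseAttribute(string input)
{
    // Same named groups as the pattern in the post; assumed change: [^"]*
    // instead of .* so the value ends at the first closing quote.
    Match m = Regex.Match(input, @"(?<attribute>\w+)\s*=\s*""(?<value>[^""]*)""");
    if (!m.Success) return null;
    return (m.Groups["attribute"].Value, m.Groups["value"].Value);
}

var result = ParseAttribute(@"href = ""/index?ID=3071087873023144""");
Console.WriteLine(result?.Attribute); // href
Console.WriteLine(result?.Value);     // /index?ID=3071087873023144
```

    Run against the whole script line instead, the first match is type with the value text/javascript, which illustrates why narrowing it down from the full tag is the hard part.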


    Michael Taylor http://www.michaeltaylorp3.net

    Wednesday, June 6, 2018 1:55 PM
    Moderator
  • You should take time to study Michael's cautions. It is easy to come up with a hacked-up solution to this, but that solution is going to be very, very delicate. People change their web pages all the time. What you're looking for here is a string inside a JavaScript statement inside a JavaScript function inside a <script> block. If they change ANY of that organization, a simple regex solution is going to stop working.

    Doing this robustly will not be easy. It's not hard to use an HTML parser to find the <script> tag. You could then fetch the text contents of that tag, but those contents are JavaScript code, and you aren't going to want to try to interpret JavaScript.

    In the end, whatever you come up with is only going to work until the page authors do things in a different way. Just be prepared for that.
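    One way to narrow the search is to key the pattern on the variable being assigned rather than on any quoted string. The sketch below is hypothetical (ExtractHrefAssignment is not an existing API) and assumes the page keeps exactly the href = "..." form. For brevity it runs on the raw line, whereas in practice you would first pull the <script> text out with an HTML parser as described above. It is exactly the kind of delicate solution warned about here: switching to single quotes, renaming the variable, or building the URL by concatenation will all break it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the string literal assigned to href in a chunk
// of script text, or null if no  href = "..."  assignment is found.
// Assumption: double quotes and the exact variable name href.
static string? ExtractHrefAssignment(string scriptText)
{
    Match m = Regex.Match(scriptText, @"\bhref\s*=\s*""(?<url>[^""]*)""");
    return m.Success ? m.Groups["url"].Value : null;
}

string line = @"<script type=""text/javascript"">window.onload = function() {href = ""/index?ID=3071087873023144"";}";
Console.WriteLine(ExtractHrefAssignment(line) ?? "(not found)"); // /index?ID=3071087873023144
```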


    Tim Roberts, Driver MVP Providenza & Boekelheide, Inc.

    Wednesday, June 6, 2018 6:52 PM
  • I agree with what the others say: avoid writing your own lexical processor for this. Besides, based on your question, why is this not part of the output too?

    text/javascript

    That, too, is enclosed in quotes, is it not?
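    To illustrate: a literal "everything between double quotes" extraction, sketched below in C# (AllQuotedStrings is a hypothetical helper), returns both values from the sample line, so some extra rule is needed to pick the one you actually want.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every "..." literal in the input, in order of
// appearance. This is the literal reading of "the words within double quotes".
static List<string> AllQuotedStrings(string input)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(input, "\"([^\"]*)\""))
        results.Add(m.Groups[1].Value);
    return results;
}

string line = @"<script type=""text/javascript"">window.onload = function() {href = ""/index?ID=3071087873023144"";}";
foreach (string s in AllQuotedStrings(line))
    Console.WriteLine(s);
// text/javascript
// /index?ID=3071087873023144
```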

    Saturday, June 9, 2018 6:46 PM