How to allow a scraped site to use its own forms?

  • Question

  • User50862095 posted

    I have a poll on an ASP site that I'm converting to ASP.NET 2.0. The poll is written as an include file, intended to be embedded in an ASP page, and it's not compatible with ASP.NET 2.0. It's not my code, I don't have time to rewrite it, and I want to replace it anyway. The conversion needs to go live ASAP, so I don't have time to find or work on a replacement. So I put the include code into its own ASP page and used HttpWebRequest and HttpWebResponse to scrape it.

    The problem is that the aspx page refreshes when I click the vote button instead of going to the results page, even though the vote button comes from the ASP page. Since the code for the vote button is in ASP and not aspx, I don't know how to get around this.

    The results page is also written in ASP. I was intending to scrape that too, but I somehow need to be able to get to it when the vote button is clicked.

    Diane
     

    Thursday, March 22, 2007 9:32 AM

Answers

  • User-1639143169 posted

    Correct me if I am wrong...

    As I understand your situation, you have one aspx page that calls and scrapes an ASP page to get the poll form and include it in your aspx page.

    The "natural" way to do this would possibly be an iframe.

    But going with your screen-scrape methodology, I am saying: scrape the ASP page and use a regex to extract the poll form:

    <form.....> poll html </form>

    You now have a string, pollFormHtml for example.

    You would put this HTML string into your current aspx page. But you want the submit button from the form to submit to the original ASP form target.

    So in pollFormHtml you need to make sure the action attribute points to the place you want.
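
    The extraction step described above could look roughly like the sketch below. It assumes the scraped page contains a single, non-nested <form> element; the class name PollFormExtractor and the method ExtractFirstForm are made up for illustration and are not from the original code.

```csharp
using System.Text.RegularExpressions;

public static class PollFormExtractor
{
    // Singleline lets '.' span line breaks; the lazy quantifier stops at
    // the first closing </form> tag. Assumes the form is not nested.
    private static readonly Regex FormBlock = new Regex(
        "<form\\b.*?</form\\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the first <form>...</form> block in the scraped page,
    // or an empty string when the page contains no form.
    public static string ExtractFirstForm(string pageHtml)
    {
        Match m = FormBlock.Match(pageHtml);
        return m.Success ? m.Value : string.Empty;
    }
}
```

    Returning an empty string when nothing matches keeps the aspx page from writing out unrelated markup if the ASP page changes.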

     

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, March 22, 2007 12:59 PM

All replies

  • User-1639143169 posted

    My first thought is that when you scrape the initial form, you will want to wrap it inside a new HTML <form> whose action points where you want, rather than at the default submit target of the aspx page.

    Then the submit button from the scrape would live inside its own HTML form once rendered to the page. (You are including it as a string, not as aspx controls, so you should have no problem.)

    What happens after the post is another story, for example whether you go back to the original ASP posting page or somewhere else.

    Thursday, March 22, 2007 10:59 AM
  • User50862095 posted

    I'm not sure I understand what you're saying. I'm scraping a complete page, with html, body, etc. Unless you're suggesting I wrap it again in the aspx? How would I do that?

    Diane 

    Thursday, March 22, 2007 11:13 AM
  • User-1639143169 posted

    When you scrape the page, are you including the entire HTML in your aspx page?

    Why not use a regex or string split, take only the form, and write that out to your aspx page?

    Are you clearing the response on your aspx page and replacing it with what you scrape? If that's the case, the form would submit to its original intended target, unless the action is a relative path, in which case you still have to adjust the scraped HTML and correct the action to post where you want.

    Something along the lines of getting just the HTML you want from the screen scrape:

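    using System.Net;
    using System.Text.RegularExpressions;

    string pollHTMLForm = string.Empty;

    // Lazily match everything inside the poll's wrapper div, across line breaks.
    // The match ends at the first </div>, so the poll markup must not contain nested divs.
    Regex regExToMatch = new Regex("<div id='polldiv'>((.|\n)*?)</div>", RegexOptions.IgnoreCase);

    using (WebClient httpRequest = new WebClient())
    {
        httpRequest.UseDefaultCredentials = true;

        // userAgent is a string you define elsewhere.
        httpRequest.Headers.Add("user-agent", userAgent);

        // URL is the address of the ASP page that renders the poll.
        string strContent = httpRequest.DownloadString(URL);

        Match oM = regExToMatch.Match(strContent);

        // oM.Value is an empty string when nothing matched.
        pollHTMLForm = oM.Value;
    }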
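
    For the relative-action case mentioned above, one possible way to correct the action is sketched below. It assumes the form's action attribute is quoted and that you know the URL of the scraped ASP page; PollFormFixer, MakeActionAbsolute and the example.com address are made-up names for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PollFormFixer
{
    // Captures: (1) the tag text up to "action=", (2) the quote character,
    // (3) the action value. Assumes the attribute value is quoted.
    private static readonly Regex ActionAttribute = new Regex(
        "(<form\\b[^>]*?\\baction\\s*=\\s*)([\"'])(.*?)\\2",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Resolves the first form's action against the URL of the scraped page,
    // so a relative action such as "vote.asp" points at the ASP site.
    public static string MakeActionAbsolute(string formHtml, Uri scrapedPageUri)
    {
        return ActionAttribute.Replace(formHtml, delegate(Match m)
        {
            Uri absolute = new Uri(scrapedPageUri, m.Groups[3].Value);
            string quote = m.Groups[2].Value;
            return m.Groups[1].Value + quote + absolute.AbsoluteUri + quote;
        }, 1);
    }
}
```

    Usage would be something like pollHTMLForm = PollFormFixer.MakeActionAbsolute(pollHTMLForm, new Uri("http://example.com/polls/poll.asp")); before writing the string out to the aspx page. An action that is already absolute is left pointing at the same address.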
    
     
    Thursday, March 22, 2007 11:25 AM
  • User50862095 posted

    You lost me completely

    Why do I only want part of the page?  Then what do I do with it?

    Diane 

    Thursday, March 22, 2007 12:41 PM
  • User50862095 posted

    The action attribute is correct, but the aspx page insists on posting back to itself anyway, which is what it's supposed to do.  I'm trying to find a way around that. 

    Diane 

    Thursday, March 22, 2007 1:50 PM