WebRequest & WebResponse: redirection problem

  • Question

  • User-296860372 posted
    Hi,

    I'm trying to build a simple web scraping application in ASP.NET.

    I'm using WebRequest and WebResponse to get the HTML of the page I want to pull data from.

    This works fine for simple scraping, that is, for pages where the data is posted directly to the result page. Say http://server/mypage.asp?temp=myvalue is the URL of the result page itself: the data is posted to it and the result is shown on it. In such cases I can get the HTML of the result page.

    But suppose this URL is an intermediate page: the data is posted to it, and it then redirects to the result page (where the data I need is). In that case I can't get the HTML of the final result page. The WebResponse contains the intermediate page instead.

    I hope I've made my problem clear. Any suggestions would be really helpful. Please email me at Packiyanath.sk@kmgin.com

    Thanks
    packiyanath sk
    Friday, September 30, 2005 11:11 PM

All replies

  • User1873438307 posted
    Hello,

    Do you want to fill in and submit a form, then scrape the results?

    regards
    Sunday, October 2, 2005 2:53 AM
  • User-296860372 posted
    The data that needs to be posted is sent through the query string. I have a URL something like:

    http://mywebsite/flights/InitialSearch.asp?entryPoint=FD&flightType=roundtrip&leavingFrom=JFK&goingTo=MIA&dateTypeSelect=exactDates&leavingDate=12/01/2005&returningDate=12/12/2005

    The problem is this: if InitialSearch.asp is itself the result page where the website displays the data, I get the right HTML in the WebResponse.

    But in some cases it is not. Instead, InitialSearch.asp redirects to some other page where the data is displayed. Something like:

    WebRequest to the above URL -> InitialSearch.asp -> result.asp. In this case I get the HTML of InitialSearch.asp, but I need the HTML of result.asp.

    I hope that's clear.

    Thanks

    Tuesday, October 4, 2005 4:43 AM
  • User-831224310 posted

    Hi,

    As far as I understand, you are making a WebRequest to a page P1 that redirects internally to another page P2, and you are getting the response from P1 instead of P2. I wrote some sample C# ASP.NET code that does exactly the same thing and was able to get the content of P2. It gives me the content of P1 + P2 if I do a Server.Transfer from P1 to P2, but only the content of P2 if I do a Response.Redirect.

    Here is the sample code.
    My test page does the following in Page_Load:

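    // Requires "using System.IO;" and "using System.Net;" at the top of the code-behind file.
    StreamReader streamReader = null;
    HttpWebRequest request = null;
    HttpWebResponse response = null;

    try
    {
        request = (HttpWebRequest)WebRequest.Create("http://localhost/WebApp1/WebForm1.aspx");
        request.Timeout = 50000;     // milliseconds
        request.KeepAlive = false;

        // GetResponse follows HTTP redirects by default (AllowAutoRedirect is true).
        response = (HttpWebResponse)request.GetResponse();
        streamReader = new StreamReader(response.GetResponseStream());

        // Echo the fetched HTML into this page's output.
        Response.Write(streamReader.ReadToEnd());
    }
    catch (Exception ex)
    {
        if (ex.Message != null)
            Response.Write(ex.Message);
        Response.Write("Problems");
    }
    finally
    {
        // Release the reader and the connection whether or not the fetch succeeded.
        if (streamReader != null)
            streamReader.Close();
        if (response != null)
            response.Close();
    }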
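
    A minimal sketch, not part of the tested sample, of one way to check whether a request like the one above really ended up on a different page. It assumes the intermediate page issues an HTTP redirect (a 3xx status with a Location header), which HttpWebRequest follows when AllowAutoRedirect is true. The names RedirectProbe, CreateRequest and WasRedirected are hypothetical, and the CookieContainer is there on the assumption that the target site may set session cookies during the redirect.

```csharp
using System;
using System.Net;

public static class RedirectProbe
{
    // Builds a request that follows HTTP redirects and keeps cookies between hops.
    // Creating the request does not contact the server.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = true;            // the default, set explicitly here
        request.MaximumAutomaticRedirections = 10;   // arbitrary limit chosen for this sketch
        request.CookieContainer = new CookieContainer();
        return request;
    }

    // True when the URI that finally answered differs from the URI requested.
    public static bool WasRedirected(Uri requested, Uri responded)
    {
        return requested != responded;
    }
}
```

    After calling GetResponse(), pass request.RequestUri and response.ResponseUri to WasRedirected. If it returns false and the HTML is still the intermediate page, the redirect is probably done inside the page itself (a meta refresh tag or script), which HttpWebRequest does not follow.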

    WebApp1 has a page, WebForm1, which does the following in Page_Load:

    Response.Write("Page 1");

    Response.Redirect("WebForm2.aspx");
    //Server.Transfer("WebForm2.aspx");

    And WebForm2 simply writes a line:

    Response.Write("Page 2");
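
    If the target site redirects inside the HTML rather than with an HTTP status (for example with a meta refresh tag), a plain fetch returns the intermediate page, because HttpWebRequest only follows HTTP-level redirects. The following is a hedged sketch of pulling the next URL out of such a tag so that a second request can be made. Whether the site in question does this is an assumption, and MetaRefresh and FindTarget are hypothetical names. The regular expression is deliberately simple and assumes the http-equiv attribute comes before content.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetaRefresh
{
    // Matches e.g. <meta http-equiv="refresh" content="0; url=result.asp">
    private static readonly Regex Pattern = new Regex(
        @"<meta[^>]*http-equiv\s*=\s*[""']?refresh[""']?[^>]*content\s*=\s*[""']\s*\d+\s*;\s*url\s*=\s*[""']?([^""'>]+)",
        RegexOptions.IgnoreCase);

    // Returns the absolute redirect target, or null if the page has no meta refresh.
    public static Uri FindTarget(Uri pageUri, string html)
    {
        Match match = Pattern.Match(html);
        if (!match.Success)
            return null;

        // Resolve a relative target such as "result.asp" against the page's own URI.
        return new Uri(pageUri, match.Groups[1].Value.Trim());
    }
}
```

    Pass response.ResponseUri and the downloaded HTML to FindTarget. A non-null result can be fetched with a second request, and a relative target is resolved against the address of the page that contained it.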


    I hope this helps. If it does not meet your requirements, please let me know.

    Wednesday, October 5, 2005 11:30 AM