Web scraping

    Question

  • I am stumped on this. 

    I am using HttpWebRequest to scrape a website:

    HttpWebRequest inRequest = (HttpWebRequest)System.Net.WebRequest.Create(inUrl);

    using (HttpWebResponse Response = (HttpWebResponse)inRequest.GetResponse())
    {
        // ... read the response ...
    }

    When I browse to that site in a browser, I can see that a cookie is sent with the request once the connection is established:

    Fiddler:

    GET https://xxxx/xxx/default.aspx HTTP/1.1
    Accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/x-shockwave-flash, application/msword, application/vnd.ms-excel, application/x-silverlight, application/vnd.ms-powerpoint, */*
    Accept-Language: en-us
    User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; Trident/4.0; GTB6.5; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 1.1.4322; .NET CLR 3.0.30729; .NET4.0C)
    Accept-Encoding: gzip, deflate
    Host: xxx.xxx.xx
    Connection: Keep-Alive
    Cookie: s_nr=1287501262989; s_vnum=1290093262996%26vn%3D3; __utma=99492171.1873908887.1288046487.1288046487.1288050535.2; __utmz=99492171.1288046487.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); ASP.NET_SessionId=awcpqo55xgbvwn55mlfsw045

    I cannot figure out where that cookie information comes from. I make an initial request to grab the ViewState and EventValidation values, and I would have expected a cookie in that response, but there is none.

    What am I missing here?
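The initial request described above has to pull the __VIEWSTATE and __EVENTVALIDATION hidden fields out of the returned HTML so they can be posted back. A minimal sketch of that step, assuming the fields are rendered as ordinary hidden inputs with the name attribute before the value attribute (the class and method names are hypothetical, not from the original post):

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post): reads ASP.NET hidden form fields.
public static class AspNetFormFields
{
    // Returns the value of the <input> whose name attribute is exactly fieldName,
    // or null when the page has no such field. Assumes name="..." is rendered
    // before value="..." inside the same tag, as ASP.NET WebForms usually does.
    public static string GetHiddenField(string html, string fieldName)
    {
        string pattern = "name=\"" + Regex.Escape(fieldName) + "\"[^>]*?value=\"([^\"]*)\"";
        Match match = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return match.Success ? WebUtility.HtmlDecode(match.Groups[1].Value) : null;
    }
}
```

For example, `AspNetFormFields.GetHiddenField(html, "__VIEWSTATE")` returns the view state string to include in the postback. A regex like this is fragile against unusual markup; an HTML parser would be more robust if the page layout varies.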


    mhaddy
    Tuesday, October 26, 2010 5:43 PM

All replies

  • > When I browse to that site through a browser

    By default, cookies are disabled for HttpWebRequest: its CookieContainer property is null, so cookies are neither stored nor sent (http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx).

    You said you captured that request when you went to the page in a browser, so that cookie must be saved in your browser. Clear your Internet Explorer cookies, close all open instances of Internet Explorer, then try again.

     

    Thursday, October 28, 2010 12:39 AM
  • It may be saved, but that won't help.  I have to get it through code. 

    I need to get the cookie from the response to my HttpWebRequest, but the response's Cookies property returns null. The only difference I can see between what I send and what Fiddler reports is the User-Agent: the browser sends that header with its request, but HttpWebRequest does not. Could that have anything to do with it?


    mhaddy
    Thursday, October 28, 2010 12:58 AM
    That's probably because cookies are disabled by default in HttpWebRequest. Set the CookieContainer property to enable them, as documented here:

    http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx
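A minimal sketch of what the linked documentation describes: give each request a CookieContainer and reuse the same container for every request in the session. Without it, the CookieContainer property stays null and cookies from the response are not kept. The helper names below are hypothetical; only HttpWebRequest, WebRequest and CookieContainer are framework types:

```csharp
using System.IO;
using System.Net;

// Hypothetical helper names; only HttpWebRequest/CookieContainer are real framework APIs.
public static class CookieAwareRequests
{
    // Creates a request wired to the given container. Without this assignment
    // CookieContainer stays null and HttpWebRequest neither stores nor sends cookies.
    public static HttpWebRequest Create(string url, CookieContainer jar)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = jar;
        return request;
    }

    // GETs the page and returns its HTML. Set-Cookie headers in the response
    // (for example ASP.NET_SessionId) are stored in jar and are available via
    // response.Cookies, so the next request created with the same jar sends them back.
    public static string GetPage(string url, CookieContainer jar)
    {
        HttpWebRequest request = Create(url, jar);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Passing the same `jar` to the first GET (the one that reads the ViewState) and to the postback is what lets a session cookie such as ASP.NET_SessionId travel between them.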

     

    Thursday, October 28, 2010 1:05 AM
    I wish it were that easy. Something the browser sends to the website must be what triggers it to send the cookie; I just cannot figure out what. The only difference I see is the User-Agent information that the browser sends with its request.

    mhaddy
    Thursday, October 28, 2010 5:15 PM
  • Hi Mark,

    I am having the same issue as you.

    When I look at how the browser sends each request, I can see that one of them carries some cookies that were never received in any earlier response.

    I would like to know whether you were able to solve it.

     

    Thanks

     

    Thursday, December 23, 2010 7:17 PM
  • I don't recall how this particular one was resolved; however, I did run into a similar problem recently. I filled in the User-Agent information, and I had to navigate the site sequentially through the parent pages, because the site set different cookies on each of the earlier pages.
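The approach described in this reply, a browser-like User-Agent plus visiting the parent pages in order with one shared cookie container, can be sketched as follows. This is an illustration under assumptions: the URL sequence is whatever path a browser would take to reach the target page, the User-Agent is a shortened copy of the one in the Fiddler capture, and the class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical helper: walks pages in order with a browser-like User-Agent and one shared jar.
public static class SequentialNavigator
{
    // Shortened from the IE8 User-Agent in the Fiddler capture; any realistic browser string would do.
    public const string BrowserUserAgent =
        "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; Trident/4.0)";

    public static HttpWebRequest CreateBrowserLikeRequest(string url, CookieContainer jar)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = BrowserUserAgent; // HttpWebRequest sends no User-Agent unless set
        request.CookieContainer = jar;        // shared, so cookies from earlier pages are replayed
        return request;
    }

    // Requests each URL in order (for example: home page, then parent page),
    // so every cookie the earlier pages set is already in jar for the later ones.
    public static void VisitInOrder(IEnumerable<string> urls, CookieContainer jar)
    {
        foreach (string url in urls)
        {
            HttpWebRequest request = CreateBrowserLikeRequest(url, jar);
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                // The body is not needed here; disposing the response releases the connection.
            }
        }
    }
}
```

After `VisitInOrder`, a request for the target page built with `CreateBrowserLikeRequest` and the same `jar` carries the accumulated cookies, much as the browser's request does.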

    mhaddy
    Thursday, December 23, 2010 8:31 PM
  • I found that some cookies were set by JavaScript code such as:

    document.cookie = "..";
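Because HttpWebRequest does not execute JavaScript, a cookie set this way never reaches the CookieContainer automatically; one workaround is to reproduce it by hand. A minimal sketch, assuming you have already worked out the cookie's name and value from the page's script (the helper name and the "/" path are assumptions, not something the site is known to use):

```csharp
using System;
using System.Net;

// Hypothetical helper: HttpWebRequest never runs JavaScript, so cookies a page
// sets through document.cookie must be added to the container by hand.
public static class ScriptCookies
{
    // name/value must be worked out from the page's script (e.g. by reading its source).
    // Path "/" is an assumption: document.cookie defaults to the page's own directory
    // unless the script specifies path=..., so adjust it to match the script.
    public static void Add(CookieContainer jar, Uri page, string name, string value)
    {
        jar.Add(new Cookie(name, value, "/", page.Host));
    }
}
```

Any later request that uses the same container then sends the cookie alongside the ones the server set itself.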
    

    Thanks, Mark.

    Thursday, December 23, 2010 11:16 PM