Screen scraping with AutoLogin

  • Question

  • User-1597471785 posted

    Hi,

    I want to automate the login to a site and then screen-scrape a page. I was able to log in by extending WebClient. I'm using ASP.NET 4.5 without MVC. The problem is that the page contains a Flash component that does an HTTP POST to get its data. When the Flash object does the HTTP POST, I can see in Fiddler that it gets an authentication error. I suspect the ASP.NET_SessionId is not set for the Flash component's HTTP request. I tried to set the cookie in the Response, but the Flash object still does not render because of the authentication problem. The ASP.NET_SessionId is empty.

    Setting the ASP.NET_SessionId:

    Response.Cookies.Add(new HttpCookie("ASP.NET_SessionId", System.Web.HttpContext.Current.Session.SessionID));

    using System;
    using System.Collections.Specialized;
    using System.Net;

    // WebClient subclass that keeps cookies across requests.
    public class WebClientEx : WebClient
    {
        private readonly CookieContainer container;

        public WebClientEx()
        {
            this.container = new CookieContainer();
        }

        // Reuse an existing container, e.g. to share a session between clients.
        public WebClientEx(CookieContainer container)
        {
            this.container = container;
        }

        public CookieContainer CookieContainer
        {
            get { return container; }
        }

        public string Get(string URL)
        {
            return this.DownloadString(URL);
        }

        public string Post(string URL, NameValueCollection data)
        {
            return this.Encoding.GetString(this.UploadValues(URL, data));
        }

        // Attach the cookie container and browser-like headers to every request.
        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest r = base.GetWebRequest(address);
            var request = r as HttpWebRequest;
            if (request != null)
            {
                request.CookieContainer = container;
                request.ProtocolVersion = HttpVersion.Version10;
                request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0";
                request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
                request.KeepAlive = true;
            }
            return r;
        }

        protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
        {
            WebResponse response = base.GetWebResponse(request, result);
            ReadCookies(response);
            return response;
        }

        protected override WebResponse GetWebResponse(WebRequest request)
        {
            WebResponse response = base.GetWebResponse(request);
            ReadCookies(response);
            return response;
        }

        // Copy the response cookies into the container. HttpWebRequest already
        // does this when CookieContainer is set, so this step is redundant but harmless.
        private void ReadCookies(WebResponse r)
        {
            var response = r as HttpWebResponse;
            if (response != null)
            {
                CookieCollection cookies = response.Cookies;
                container.Add(cookies);
            }
        }
    }

    Authentication error:

    {"Message":"Authentication failed with error code <NotAuthenticated>","StackTrace":"","ExceptionType":""}
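
    One point worth checking here: cookies are scoped to the host that issued them, so a cookie added to this page's Response is stored by the browser for this site, not for the site the Flash component posts to. A minimal sketch of that scoping with CookieContainer follows. The host names target.example and scraper.example and the cookie value are assumptions for illustration, not taken from the original setup.

```csharp
using System;
using System.Net;

public static class CookieScopeDemo
{
    // Builds a container holding a session cookie issued by the (hypothetical) target host.
    public static CookieContainer BuildContainer()
    {
        var container = new CookieContainer();
        container.Add(new Cookie("ASP.NET_SessionId", "abc123", "/", "target.example"));
        return container;
    }

    // Returns the Cookie header the container would send to the given URL.
    public static string HeaderFor(CookieContainer container, string url)
    {
        return container.GetCookieHeader(new Uri(url));
    }

    public static void Main()
    {
        CookieContainer container = BuildContainer();

        // Same host that issued the cookie: the session cookie is sent.
        Console.WriteLine(HeaderFor(container, "http://target.example/data"));

        // Different host: nothing is sent, so an empty line is printed.
        Console.WriteLine(HeaderFor(container, "http://scraper.example/data"));
    }
}
```

    The same rule applies in the browser: a session cookie held by the server-side WebClient is not visible to a component that runs in the visitor's browser and calls the target site directly.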

    Thursday, March 12, 2015 5:33 AM

Answers

All replies

  • User-1388839218 posted

    From the error message, it looks like a "not authenticated" issue.

    As far as I understand, ASP.NET_SessionId is not always stored in a cookie; see the discussion at http://forums.asp.net/t/1490732.aspx for more details. So ASP.NET_SessionId may not be the real issue here.

    Are you connecting to a secure page on an HTTPS site? If so, take a look at the following threads:

    http://forums.asp.net/t/1738956.aspx

    http://forums.asp.net/t/1537589.aspx

    Hope the above info is helpful to you.

    Friday, March 13, 2015 4:57 AM
  • User-1597471785 posted

    Thanks. I suspect ASP.NET_SessionId is the issue, but I could be wrong.

    I placed the code that sets the response cookie in Session_Start. The problem is that the page sets ASP.NET_SessionId twice: once as the session cookie from the login page and once from the authenticated page. What I want is the session cookie from the authenticated page. I tried checking the response URL and setting the session cookie accordingly, but I still see two ASP.NET_SessionId cookies, even after I cleared the cookies manually and accessed the site again.

    protected void Session_Start(Object sender, EventArgs e)
    {
         Response.Cookies["ASP.NET_SessionId"].Value = System.Web.HttpContext.Current.Session.SessionID;
         Response.Cookies["ASP.NET_SessionId"].Path = "Some Path";
    }

    Monday, March 16, 2015 3:19 AM
  • User-1388839218 posted

    Why do you think it's caused by ASP.NET_SessionId? As far as we know, it is set automatically at the appropriate time, so I don't think we have to set it manually.

    Monday, March 16, 2015 6:02 AM
  • User-1597471785 posted

    On the first HTTP request, the server creates a unique session ID for each client. Subsequent requests from that client then use the same session ID when calling any web service/Web API on the server. If the session ID is different, the web service/Web API redirects to the login page. I know that in the normal login flow the server creates the session for you, and the widgets or components that make subsequent requests show their data without problems because they have the current session.

    What I'm doing right now is automating that login process from an ASP.NET page:

    1. Create an ASP.NET page.
    2. Use WebClient to post the user credentials to the server. This part is already done, and I can see the welcome page.
    3. Once logged in, some components or widgets make HTTP requests to a web service/Web API to retrieve information. Here is the problem: when those components or widgets make an HTTP request, the session ID is somehow lost, so the HTTP response shows as not authorized in the Fiddler log. That is why I want to try adding the session ID back to see whether that resolves the issue. If you have a better idea for resolving it, please share. I know one option is to use a WebBrowser component, but that is a bit tricky.
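
    The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual site's API: the URLs and form fields in the usage note are hypothetical. The only point is that the login POST and the later data request share one CookieContainer, so whatever session cookie the login response sets is sent with the follow-up call. It covers requests made from the server side only; a widget running in the browser sends the browser's own cookies instead.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class SharedSessionSketch
{
    // Builds a request bound to the shared cookie container.
    public static HttpWebRequest Create(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }

    // Posts a URL-encoded form body and returns the response text.
    // Cookies set by the response are stored in the shared container.
    public static string Post(string url, string formBody, CookieContainer cookies)
    {
        HttpWebRequest request = Create(url, cookies);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";

        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        using (Stream body = request.GetRequestStream())
        {
            body.Write(payload, 0, payload.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Logs in, then calls the data endpoint with the same session cookies.
    public static string LoginThenFetch(string loginUrl, string loginForm, string dataUrl)
    {
        var cookies = new CookieContainer();
        Post(loginUrl, loginForm, cookies);
        return Post(dataUrl, "", cookies);
    }
}
```

    A call might look like SharedSessionSketch.LoginThenFetch("http://target.example/login.aspx", "user=demo&password=secret", "http://target.example/service/GetData"), where every argument is a placeholder to be replaced with the real endpoints and field names.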
    Monday, March 16, 2015 10:41 PM
  • User1104055534 posted

    Hi mutantbc,

    Based on your description, I understand that you are able to log in and reach the welcome page, but some components on the page send sub-requests to the server that fail, so the corresponding content cannot be displayed. If I have misunderstood anything, please feel free to let me know.

    In that scenario, could you please check the status and substatus codes of these requests in the IIS log? They should be 401.x if this is an authorization issue. https://support.microsoft.com/en-us/kb/943891

    Meanwhile, you can try failed request tracing to troubleshoot the status code you find in the IIS log. http://www.iis.net/learn/troubleshoot/using-failed-request-tracing/troubleshooting-failed-requests-using-tracing-in-iis

    Hope this is useful to you!

    Best Regards

    Wednesday, March 18, 2015 2:29 AM
  • User-1597471785 posted

    The subsequent request (to the Web API/web service) after login returns 401 (unauthorized, invalid credentials). I'm sending the same session cookie (from after login) on every subsequent request. I don't know why the subsequent request is failing.

    Wednesday, March 18, 2015 3:01 AM
  • User1104055534 posted

    Hi mutantbc,

    May I know if your server is IIS? If so, please set up failed request tracing for status code 401. You may find more information in the trace log. A summary of the failed request is logged at the top, with the Errors & Warnings table identifying any events that are WARNING, ERROR, or CRITICAL ERROR in severity.

    Wednesday, March 18, 2015 5:03 AM
  • User-1597471785 posted

    Yes, it is IIS. Unfortunately I don't have full control over the server, as it is managed by a third party. Any other suggestions would be great. I suspect the session is not retained across subsequent HTTP requests, which is why I am getting the 401 Unauthorized.

    Wednesday, March 18, 2015 11:46 PM
  • User1104055534 posted

    Hi,

    I am not sure whether this article is useful to you: https://msdn.microsoft.com/en-us/library/jj713753.aspx. If you can share the Fiddler log from when the issue happened, I can check whether it contains any useful information.

    Best Regards

    Tuesday, March 24, 2015 10:36 AM
  • User-1597471785 posted

    Thanks for the info, but the server doesn't return a canary token. I don't see any token being returned from the server. I only append the cookie to the header of every subsequent request, but it doesn't seem to work.

    Cookie: ASP.NET_SessionId=
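
    The header above carries an empty value, so it may help to check what the cookie container actually holds for the target URL before sending. A minimal sketch, assuming a hypothetical target host (target.example) and a made-up cookie value:

```csharp
using System;
using System.Net;

public static class SessionCookieCheck
{
    // Returns true when the container holds a non-empty ASP.NET_SessionId for the URL.
    public static bool HasSessionId(CookieContainer container, Uri target)
    {
        Cookie session = container.GetCookies(target)["ASP.NET_SessionId"];
        return session != null && !string.IsNullOrEmpty(session.Value);
    }

    public static void Main()
    {
        // Hypothetical endpoint; replace with the real service URL.
        var target = new Uri("http://target.example/service/GetData");
        var container = new CookieContainer();

        // Nothing stored yet, so there is no session cookie to send.
        Console.WriteLine(HasSessionId(container, target)); // False

        // After a login response stores the cookie, the check passes.
        container.Add(new Cookie("ASP.NET_SessionId", "abc123", "/", "target.example"));
        Console.WriteLine(HasSessionId(container, target)); // True
    }
}
```

    If the check fails right after login, the session cookie was never captured for that host or path, which would explain the empty header.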
    Monday, March 30, 2015 10:09 PM
  • User1104055534 posted

    Hi mutantbc,

    From your description, I notice that the failing components make HTTP requests to a web service. Here are some threads that discuss session state in web services. Hope they are useful to you.

    http://forums.asp.net/t/1557679.aspx?Validating+a+Session+ID+using+a+Web+service

    http://forums.asp.net/t/1504324.aspx?Passing+session+ID+to+thrid+party+web+service+

    http://www.codeproject.com/Articles/35119/Using-Session-State-in-a-Web-Service

    From a support perspective this is really beyond what we can do here in the forums. If you cannot determine your answer here or on your own, consider opening a support case with us. Visit this link to see the various support options that are available to better meet your needs: http://support.microsoft.com/default.aspx?id=fh;en-us;offerprophone.

    Best Regards

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Monday, March 30, 2015 11:00 PM
  • User-1597471785 posted

    Thanks Archer for the help. My problem is that the server won't allow my subsequent requests. I can log in to the source server using an HTTP POST from ASP.NET MVC or Web API. Once I am logged in, the source server's pages have other JavaScript that fetches data via AJAX; when the JS makes an AJAX request, the log says it is not allowed.

    Tuesday, April 7, 2015 8:34 PM