Screen Scraping a password protected page

  • Question

  • User-2116449068 posted
    Hi friends, I want to screen scrape some values from a web page on a different website, and the page I am trying to access is password protected. That is, I can only access the desired page once I am logged in to the website. I have the user information required to log in. I have tried several things, but nothing seems to work. The closest to what I want is http://www.aspalliance.com/stevesmith/articles/netscrape2.asp, but this doesn't seem to work either. I am looking forward to your suggestions in this regard. Thanks and regards, LearningTheTricksOfASP.NET
    Friday, April 9, 2004 3:22 PM

All replies

  • User-260194411 posted
    Hi, I am working on a similar problem and don't have it working yet either, but so far I think the approach depends on the authentication mechanism in use on the server. There are lots of these, ranging from NT authentication to HTTP Basic, HTTPS, etc. The first thing you need to know is which method is in use on the server you are trying to log in to. One thing I have found helpful is an HTTP sniffer (I use the one from www.effetech.com). You can use it to see the requests and responses for methods that go over HTTP. I have had lots of problems using WebClient, WebRequest and streams to generate the login strings that are supposed to be appended to the HTTP requests. The examples indicate that they should be appended, but the sniffer says they are not. Please post any solutions you find. Regards
    Thursday, April 15, 2004 7:04 AM
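A rough way to tell which mechanism a server uses is to look at what it sends back for one request made without credentials. The sketch below is in Python rather than .NET, purely to show the idea. The function name `guess_auth_mechanism` and its labels are made up for illustration, and the password-field check is only a heuristic, not a reliable detector.

```python
import re


def guess_auth_mechanism(status, headers, body=""):
    """Guess how a page is protected from one unauthenticated response.

    status  -- HTTP status code (int)
    headers -- dict of response header names to values
    body    -- response body as text (only scanned for a password field)
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    if status == 401:
        # A 401 normally carries a challenge such as 'Basic realm="..."'
        # or 'NTLM'; the first token is the scheme name.
        challenge = lowered.get("www-authenticate", "")
        scheme = challenge.split(" ", 1)[0].lower()
        return scheme or "unknown"

    # Heuristic: a page containing a password input is probably a login form.
    if re.search(r"<input[^>]*type=[\"']?password", body, re.IGNORECASE):
        return "form"

    return "none"
```

If the answer is "basic" or "ntlm", the credentials travel in HTTP headers; if it is "form", the login is an ordinary POST followed by a session cookie, which needs a different approach.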
  • User-260194411 posted
    Hi again, I thought of something further after looking at the example you said you were using (http://authors.aspalliance.com/stevesmith/articles/netscrape2.asp). I tried this method a while ago and gave up on it. If you put the sniffer on it and run it in debug, the request to the server is actually sent on the line: myWriter = new StreamWriter(objRequest.GetRequestStream()); and the web server responds to it. When the next line: myWriter.Write(strPost); is executed, the sniffer does not detect anything further, probably because the request/response loop is already complete. I thought about testing whether the server really does receive the string that the example tries to post, but did not come up with a simple method of doing it. Ideas are very welcome. Regards
    Thursday, April 15, 2004 7:14 AM
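For a forms-based login, two things have to happen: the POST body must actually reach the server, and the session cookie from the login response must be sent along with the next request. Below is a minimal sketch of that flow, in Python rather than .NET, assuming the site uses a plain HTML login form and a cookie-based session. The helper names `make_session` and `login_and_fetch`, the URLs and the form field names are all hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar): an opener that remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_fetch(opener, login_url, form_fields, protected_url):
    """POST the login form, then GET the protected page with the same opener."""
    # Encode the fields the way a browser would for a normal HTML form.
    body = urllib.parse.urlencode(form_fields).encode("ascii")
    request = urllib.request.Request(
        login_url,
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    # The login response sets the session cookie; the jar stores it.
    with opener.open(request) as response:
        response.read()
    # The same opener sends the stored cookie with this second request.
    with opener.open(protected_url) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (hypothetical URLs and field names):
# opener, jar = make_session()
# html = login_and_fetch(
#     opener,
#     "http://www.example.com/login.asp",
#     {"username": "me", "password": "secret"},
#     "http://www.example.com/members/page.asp",
# )
```

The field names must match the name attributes of the inputs in the real login form, including any hidden fields, which is where a sniffer trace of a browser login helps.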
  • User-2116449068 posted
    Hi Keith, thanks for your post. No ideas yet. I am scratching my head over what is wrong with my code. I shall definitely let you know as soon as I come up with a solution or idea. Regards,
    Thursday, April 15, 2004 2:07 PM
  • User1593236550 posted
    Has anybody succeeded with this yet?
    Monday, August 2, 2004 9:05 AM
  • User-773277106 posted
    This might help you guys:

        Set objIE = WScript.CreateObject("InternetExplorer.Application", "objIE_")
        objIE.Visible = False
        boolBrowserRunning = True
        ' This is a local page I made to fake a login: it sets the username and
        ' password as default values, then forces a JavaScript submit.
        objIE.Navigate "http://localhost/screenscraping/LoginScripts/dologin.html"

        ' Keep the script alive while the browser events fire.
        Do While boolBrowserRunning
            WScript.Sleep 500
        Loop
        Set objIE = Nothing

        ' mySaveFile is my own helper for writing the HTML to disk (not shown).
        Sub objIE_DocumentComplete(ByVal pDisp, URL)
            If InStr(URL, "http://www.TheLandingPageofTheSecuredPage.com/Index.asp") > 0 Then
                mySaveFile objIE.document.body.innerHTML ' This is the whole page
                objIE.Navigate "http://www.TheNextUrlIWantToNavigateTo.com/index.asp"
            ElseIf InStr(URL, "http://www.TheNextUrlIWantToNavigateTo.com/index.asp") > 0 Then
                mySaveFile objIE.document.body.innerHTML ' This is the whole page
                boolBrowserRunning = False ' Last page saved, so let the wait loop exit
            End If
        End Sub
    Friday, September 3, 2004 9:54 AM