
Question: WebClient / WebBrowser / WebRequest

  • Friday, May 09, 2008 9:24 AM, Ani K
     

    Hi,

     

    I am working on a server-side app that goes through all the emails, finds links to PDFs or other files, downloads them, and stores them in a particular folder. All works fine until it finds a link where the website requires custom authentication via its own login page before the download link will work.

     

    Because we have a definite list of the websites that need this custom login-page authentication, I have managed to hack their login page HTML: I fill the login and password textboxes with the real login and password, and call the JavaScript function that initiates the login procedure from the body onload of the HTML.

     

    WebClient webClient = new WebClient();
    string strUrl = strLoginURL;
    byte[] reqHTML;
    reqHTML = webClient.DownloadData(strUrl);
    UTF8Encoding objUTF8 = new UTF8Encoding();
    string urlHtml = objUTF8.GetString(reqHTML);

     

    Now I modify urlHtml. For example, I replace the first line below with the second:

     

    <INPUT type="text" name="USER" maxlength=50 class="frmlogintxtbox" size="25" value="">

    <INPUT type="text" name="USER" maxlength=50 class="frmlogintxtbox" size="25" value="realLoginID">

     

    I make similar changes to the password field and to the body onload attribute of the HTML. Then:

     

    WebBrowser wb = new WebBrowser();
    wb.Navigated += new WebBrowserNavigatedEventHandler(wb_Navigated);
    wb.DocumentText = urlHtml;

     

    This takes me to the authenticated part of the website, so now in the Navigated event handler:

     

    void wb_Navigated(object sender, WebBrowserNavigatedEventArgs e)
    {
        if (e.Url.ToString() == "PAGE NAME Reached AFTER AUTHENTICATION")
        {
            ((WebBrowser)sender).Navigate(strDownloadURL);
        }

        if (e.Url.ToString() == strDownloadURL)
        {
            // SUCCESS - reaching this stage means my PDF is downloaded in the WebBrowser control silently.
            // I know I have hacked and done all sorts of things to reach here, yet I can't finish it off.
            // I AM UNABLE TO SAVE THIS PDF THAT I HAVE MANAGED TO GET INTO THIS WEBBROWSER CONTROL SUCCESSFULLY
        }
    }

     

    Now here is my real question, for those who have not got bored and left this thread already:

     

    How do I SAVE the PDF/Excel document that is loaded in the WebBrowser control?

     

    This is supposed to run as a background server job, so manual intervention is out of the question.

     

    I can't use WebClient all the way because it does not let me hack the HTML and then browse it. More importantly, it does not act like a browser, which is what I need: after accepting the credentials, the login page forwards to another page, and so on. WebClient is unable to follow along and tell me, as the WebBrowser control does, that it has reached the last page of the login process or the authenticated part of the site.

     

     

    WebBrowser has all the functionality in the world, but no silent SAVE function, nor anything that gives me a reference to the document it has currently loaded.

     

     

    I even tried picking up the document from Temporary Internet Files, but Microsoft for some reason does not let me see the files in that folder, especially the downloaded document that is currently loaded in the WebBrowser control.

     

    string tempDir = Environment.GetFolderPath(Environment.SpecialFolder.InternetCache).ToString();
    DirectoryInfo di = new System.IO.DirectoryInfo(tempDir);

    // File name in temp internet files folder ending with my PDF file name
    string searchPattern = strDownloadURL.Substring(strDownloadURL.Length-11, 10);

    FileInfo[] files = di.GetFiles(searchPattern);

     

    Here files.Length is ZERO.

     

     

    Any ideas, guys?

     

    Thanks

    Ani

All Replies

  • Sunday, May 11, 2008 9:35 PM, Osty
     
    I am not an expert; in fact, I have asked a question about WebRequest and the WebBrowser control myself. However, I have some code working successfully that you may be able to use. In my situation, I need to download a CSV file from a website that requires authentication. If you go to the site, log in, and click the link to the file, a Save As dialog comes up for you to save the file. I do it in code like this:
         
    Code Snippet

       // Login to the site first:

       // The cookieContainer holds our login status for the second step

        HttpWebRequest^ web = dynamic_cast<HttpWebRequest^>(WebRequest::Create("<Login Page that you are taken to when you click the login button on the form>"));
        web->Timeout = 10000;    // 10 Seconds
        web->CookieContainer = cookieContainer;
        web->Method = "POST";
        web->ContentType = "application/x-www-form-urlencoded";
        ASCIIEncoding^ encoding = gcnew ASCIIEncoding;
        array<byte>^ login_bytes = encoding->GetBytes("email="+username+"&password="+password+"&Submit=Login");
        web->ContentLength = login_bytes->Length;
        Stream^ webStream;
        try {
            webStream = web->GetRequestStream();
        }
        catch (Exception^) {
            process_que->set_error ("The url could not be resolved");
            return false;
        }
        webStream->Write (login_bytes, 0, login_bytes->Length);
        webStream->Close ();
            // Get the response:
        HttpWebResponse^ res;
        try {
            res = (HttpWebResponse^)web->GetResponse ();
        }
        catch (Exception^) {
            web->Abort ();
            process_que->set_error ("Web Error during login");
            return false;
        }


    // cookieContainer holds our login cookies. Now we can go to the page to download the file.


          // Grab the csv file:

          // The url is the link that you go to when you click on the download link. In my situation, the MyListingsCSV.aspx page generates the file, and returns it in the stream.

        url = "http://www.abc.co.nz/Export/MyListingsCSV.aspx?ListingType=Current&filter=all";

        HttpWebRequest^ web = dynamic_cast<HttpWebRequest^>(WebRequest::Create(url));
        web->CookieContainer = cookieContainer;
        web->Method = "GET";
        WebResponse^ res;
        try {
            res = web->GetResponse();
        }
        catch (Exception^) {
            if (res)    res->Close ();
            web->Abort ();
            return;
        }
        StreamReader^ sr = gcnew StreamReader(res->GetResponseStream());


          // Now you can save the stream to a file:
        File::WriteAllText ("filename.csv", sr->ReadToEnd());



    OK, so that's how I do it in my situation. I actually process the CSV file directly rather than saving it, but you should be able to do something similar to get your file. This way there is no filling in of forms; you just need to look at the login form to see which page it posts to when the login button is clicked, and use that URL in your code.

    I hope this helps you.

    Cheers,
      Carl
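
For readers working in C# like the original question, the same cookie-based approach might look roughly like the sketch below. It is a minimal sketch under assumptions, not the code either poster used: `AuthenticatedDownload`, `LoginAndSave`, and `CopyStream` are hypothetical names, and `loginUrl`, `formData`, and `fileUrl` stand in for whatever the target site actually expects. It copies raw bytes to disk instead of going through `StreamReader`, because decoding a PDF or Excel file as text would corrupt it. `HttpWebRequest` follows redirects by default, which may be enough to cope with a login page that forwards to another page.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class AuthenticatedDownload
{
    // Copies all bytes from input to output in fixed-size chunks.
    public static void CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
    }

    // loginUrl, formData and fileUrl are placeholders (assumptions):
    // formData is the URL-encoded body the site's login form posts,
    // e.g. "email=...&password=...&Submit=Login" in the reply above.
    public static void LoginAndSave(string loginUrl, string formData,
                                    string fileUrl, string savePath)
    {
        // The shared container carries the login cookies to the second request.
        CookieContainer cookies = new CookieContainer();

        HttpWebRequest login = (HttpWebRequest)WebRequest.Create(loginUrl);
        login.CookieContainer = cookies;
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.ASCII.GetBytes(formData);
        login.ContentLength = body.Length;
        using (Stream requestStream = login.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }
        using (WebResponse loginResponse = login.GetResponse())
        {
            // The response body is not needed; the cookies are now in the container.
        }

        HttpWebRequest download = (HttpWebRequest)WebRequest.Create(fileUrl);
        download.CookieContainer = cookies;
        download.Method = "GET";
        using (WebResponse response = download.GetResponse())
        using (Stream input = response.GetResponseStream())
        using (FileStream output = File.Create(savePath))
        {
            CopyStream(input, output); // binary-safe: no text decoding
        }
    }
}
```

This only works when the site authenticates with a plain form post and cookies; a login that depends on JavaScript running in the page would need a different approach.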

  • Wednesday, September 02, 2009 4:18 PM, Alok Saxena
     
    Hi Carl & Ani,

    Could you please share the code project? It would help me understand the rest of it.

    Thanks in advance.
    -Alok
  • Wednesday, November 04, 2009 7:44 AM, Sjums
    Hello :) After logging in and navigating to the right page, I would download the page via HttpWebRequest into a string, then create a Regex (regular expression) to search for matches and sort them out :) After that, download the PDFs via the direct URLs found in the source of the web page :P

    something similar to:

    string sourceOfWebsite = theDownloadedPage;
    // "someURL" is a placeholder for the real host; the dot before "pdf" is escaped
    Regex findPdfs = new Regex(@"http://someURL/[\w\-]+\.pdf");
    MatchCollection mc = findPdfs.Matches(sourceOfWebsite);
    StreamWriter sw = new StreamWriter("file.txt");
    foreach (Match m in mc)
    {
        sw.WriteLine(m.Value);
    }
    sw.Close();
    
    That should give you a file.txt with all the URLs of the PDFs ^^ and they should be downloadable with WebClient or WebRequest :)

    (Sorry for any errors; the code is not tested.)

    //Sjums
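
A slightly more general version of the link-extraction idea above can be sketched as follows. This is only an illustration under assumptions: `PdfLinkFinder` and `FindPdfUrls` are hypothetical names, and the pattern matches only absolute `http`/`https` URLs ending in `.pdf`, so relative links or download URLs without a `.pdf` extension would be missed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PdfLinkFinder
{
    // Returns every absolute URL in the HTML that ends in ".pdf" (case-insensitive).
    public static List<string> FindPdfUrls(string html)
    {
        List<string> urls = new List<string>();
        // Stop at whitespace, quotes and angle brackets so a match stays inside one attribute value.
        Regex pdfLink = new Regex(@"https?://[^\s""'<>]+\.pdf", RegexOptions.IgnoreCase);
        foreach (Match m in pdfLink.Matches(html))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

Each returned URL could then be fetched with a request that shares the login cookies, since the files sit behind the same authentication.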