Web Browser control save as dialog

  • Question

  • I have a tool that reads HTML files from disk and modifies them with some simple string replacements. I ended up using the WebBrowser control to load each page so I could get to the elements and do the replacements. After that I tried to get the control to show the modified page, but failed (Update rereads the file from disk). However, I could spin through the elements again and see that the changes were in the DOM. So I opened the print preview dialog, and it shows the changes I made.

    Then I opened the Save As dialog and saved the file to a new location, but none of the changes showed up. So print preview (and, I assume, print) shows the changes, but Save As does not. I was hoping to use Save As to output my changes.

    Why are print and save using different data? I want to save the changes I made, not just "copy" the file on disk to a new location.

    R.D. Holland

    Wednesday, September 25, 2019 12:53 PM

All replies

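  • I don't know if the following will help, but it is what I do for something similar:

    // Needs a COM reference to the Microsoft HTML Object Library (mshtml) and "using System.IO;"
    mshtml.HTMLDocument DomDocument = webBrowser1.Document.DomDocument as mshtml.HTMLDocument;
    if (DomDocument == null)
        return; // no document loaded
    using (StreamWriter sw = new StreamWriter(BackupFilename))
        sw.Write(DomDocument.body.outerHTML); // current state of the <body> element in the DOM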

    It writes just the body, but I believe the file has default HTML tags and such. If that is not good enough, then it will be more code, a bit more than we would prefer.

    You might be able to put the HTML into an InternetExplorer object, but it is not easy to figure out how to do that. I think there are answers around here if you search for it. I can probably find a sample in my code. The difference between WebBrowser and InternetExplorer is that InternetExplorer does not show a UI, or at least you can keep it invisible; something like that.

    Sam Hobbs

    • Marked as answer by RD Holland Wednesday, September 25, 2019 7:28 PM
    • Unmarked as answer by RD Holland Wednesday, September 25, 2019 7:31 PM
    Wednesday, September 25, 2019 6:11 PM
  • Something else I have tried is working with the HTML without WebBrowser or InternetExplorer; I forget whether I was successful, but I might have been.

    Sam Hobbs

    Wednesday, September 25, 2019 6:18 PM
  • Thanks Sam,

    Getting the DOM and then (as a quick test) reading DOM.body.innerText showed that the data had my string replacements.

    I find it awkward to keep going back and forth between the two models: upper case versus lower case on things like innerText/InnerText, etc.

    I'm guessing that print preview uses the DOM and Save As uses something other than the DOM, since what I see in the text visualizer in the debugger is what I see in print preview.

    R.D. Holland

    Wednesday, September 25, 2019 7:27 PM
  • Hi again Sam,

    I forgot that I had already spun through the elements after changing them and seen the changes in the element properties (inner text). Unfortunately, the DOM doesn't have a way to get the full text, so I am really in the same spot. It doesn't matter too much, since I am processing files without user intervention, and showing the Save As dialog appears to be as far as I can get. The WBC has a ShowSaveAsDialog method but no direct SaveAs API :(

    I did try another way: reading the lines directly and using them to create a new document. But I always got a type mismatch exception when trying to call write. At first I passed in the array of lines, then tried each line. I even tried this:

                    string[] Lines = File.ReadAllLines(theFile);

                    int Count = Lines.Length;

                    System.Array myArray = System.Array.CreateInstance(typeof(object), Count);
                    // GetUpperBound is inclusive (<=); the lower bound is 0, so index is also the line index
                    for (int index = myArray.GetLowerBound(0); index <= myArray.GetUpperBound(0); ++index)
                        myArray.SetValue(Lines[index], index);

                    HTMLDocument theDoc = new HTMLDocument();

    But that failed too. I even tried creating an object array of size one, putting myArray in it, and calling write. No go. In the end I just did the string replacements on each line as I read it and then wrote all the lines out using a FileStream. That's not really what I want, since I am only interested in changing the inner and outer text in the document.

    R.D. Holland

    Wednesday, September 25, 2019 7:43 PM
  • The difference between the two models is that one is managed and the other is unmanaged. I assume they tried to make the names more logical for the managed model.

    You say you are using inner text, so I am not sure what you are trying to do.

    Sam Hobbs

    Wednesday, September 25, 2019 8:12 PM
  • I do not understand what you mean by full text.

    I do not understand why you need a SaveAs function, unless you need to provide a way for your users to specify where to save the file. If that is what you need, then you can use the common save dialog; you don't need anything specific to HTML or whatever (especially since you are not saving HTML).

    Sam Hobbs

    Wednesday, September 25, 2019 8:16 PM
  • Sam,

    By full text I meant the data I got from webBrowser.DocumentText. Unfortunately, after spinning through the elements and making my modifications, I found that DocumentText was unaffected. That's when I added code to show the two dialogs and noticed the difference between print and save as results.

    I am saving HTML. But I can't have a user sit there running the Save As dialog: in my current case I have well over 10,000 files to process. I don't author the files. Another system (and team) generates them, and then I need to do some post-processing on them, being careful not to break anything such as hrefs, scripts, or anything else that is not meant to be readable text when viewed in a browser.

    Here's the background on this. We have web-based help on a product that is OEM'd (is that a word?) by third parties. They want the help to show their users their company name and the product name they choose and not ours. So my tool attempts to find all instances of either in the readable text and replace them with the company and product names that the third party specifies in my tool's UI. It would be like going through the MSDN collection and replacing every "Microsoft" with "HobbsSoft" and every "Windows" with "Sams". And everything works perfectly after that is done :)

    R.D. Holland

    Thursday, September 26, 2019 1:28 PM
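The constraint described above (change only the readable text, never hrefs or scripts) can be sketched without the WebBrowser control at all. The C# below is a minimal, hypothetical sketch, not the tool from this thread: Rebrander and ReplaceInText are made-up names, and it is a naive scanner rather than a real HTML parser. It copies tags, comments and the contents of script/style elements through unchanged and applies the replacements only to the text between tags. It assumes no '>' appears inside attribute values, it will miss a name that is split by inline markup (for example <b>Micro</b>soft) or written with character entities, and readable attribute text such as alt or title is left alone.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not from the thread): replaces names only in the text
// between tags. Tags (so href/src attributes), comments and the contents of
// <script>/<style> elements are copied through unchanged.
// Naive scanner, not an HTML parser: assumes no '>' inside attribute values.
public static class Rebrander
{
    public static string ReplaceInText(string html, IReadOnlyDictionary<string, string> map)
    {
        var output = new StringBuilder(html.Length);
        int pos = 0;
        while (pos < html.Length)
        {
            int tagStart = html.IndexOf('<', pos);
            if (tagStart < 0) tagStart = html.Length;

            // Readable text up to the next tag: the only place replacements happen.
            output.Append(ReplaceAll(html.Substring(pos, tagStart - pos), map));
            if (tagStart == html.Length) break;

            int tagEnd;
            if (html.Length - tagStart >= 4 && string.CompareOrdinal(html, tagStart, "<!--", 0, 4) == 0)
            {
                // Comment: copy through to the closing "-->".
                int close = html.IndexOf("-->", tagStart + 4, StringComparison.Ordinal);
                tagEnd = close < 0 ? html.Length : close + 3;
            }
            else
            {
                // Ordinary tag: copy through to the next '>'.
                int close = html.IndexOf('>', tagStart);
                tagEnd = close < 0 ? html.Length : close + 1;
            }
            string tag = html.Substring(tagStart, tagEnd - tagStart);
            output.Append(tag);
            pos = tagEnd;

            // After <script ...> or <style ...>, copy the raw contents up to the end tag.
            string rawElement = RawTextElementName(tag);
            if (rawElement != null)
            {
                int close = html.IndexOf("</" + rawElement, pos, StringComparison.OrdinalIgnoreCase);
                if (close < 0) close = html.Length;
                output.Append(html, pos, close - pos);
                pos = close;
            }
        }
        return output.ToString();
    }

    // Returns "script" or "style" if the tag opens one of those elements, otherwise null.
    static string RawTextElementName(string tag)
    {
        foreach (string name in new[] { "script", "style" })
        {
            if (tag.Length > name.Length + 1
                && tag.StartsWith("<" + name, StringComparison.OrdinalIgnoreCase)
                && !char.IsLetterOrDigit(tag[name.Length + 1]))
                return name;
        }
        return null;
    }

    // Plain ordinal string replacement for each (old, new) pair.
    static string ReplaceAll(string text, IReadOnlyDictionary<string, string> map)
    {
        foreach (KeyValuePair<string, string> pair in map)
            text = text.Replace(pair.Key, pair.Value);
        return text;
    }
}
```

Because it works on the raw file text, the doctype, head and scripts come through untouched, so the DocumentText and Save As questions never arise; the trade-off is that anything the scanner misreads is not caught by a real parser.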
  • print and save as results.

    I still do not understand why you are using Save As. As far as I understand, you can just get the HTML and write it out to a file directly.

    They want the help to show their users their company name and the product name they choose and not ours.

    I think the term is rebranding.

    Sam Hobbs

    Thursday, September 26, 2019 5:53 PM
  • I used save as just to see what would get saved. What was saved was not the same as the print preview. I didn't run print to see if it matched the preview.

    When I read DocumentText, it did not have my changes, so writing that out would be pointless. Right now I am just reading the file in with File.ReadAllLines, spinning through the lines, and doing my text replacement. Then I write the lines back out to a new file.

    I saw webBrowser.DocumentStream, but I haven't tried it yet; I figured I needed to get something done. Plus, I guessed it would just stream WebBrowser.DocumentText and I wouldn't see any changes. Still, I will try it just in case.

    I'm not sure how to "just get the HTML", modify it and write it out. That's what I was trying to do. I'll keep using the object browser (and the debugger watch window's "dynamic" view) to see how to get what I need.

    R.D. Holland

    Thursday, September 26, 2019 8:18 PM
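The line-by-line approach described above (File.ReadAllLines, replace, write to a new file) can be wrapped in a batch driver for the 10,000+ file case. This is a hedged sketch with hypothetical names (HelpBatch, ProcessFolder) that mirrors the folder structure of sourceDir under targetDir. Plain string.Replace also changes matches inside tags, hrefs and script code, which is exactly the risk raised earlier in the thread. File.ReadAllLines honors a byte order mark and otherwise assumes UTF-8, and File.WriteAllLines writes UTF-8 without a BOM and normalizes line endings, so help files in another encoding would need an explicit Encoding argument.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical batch driver (names are illustrative): mirrors every .htm/.html
// file under sourceDir into targetDir, doing plain string replacements line by
// line. Caution: string.Replace also hits hrefs, attributes and script code.
// targetDir must not be inside sourceDir (the enumeration is lazy).
public static class HelpBatch
{
    public static int ProcessFolder(string sourceDir, string targetDir,
                                    IReadOnlyDictionary<string, string> map)
    {
        int processed = 0;
        foreach (string path in Directory.EnumerateFiles(sourceDir, "*.htm*", SearchOption.AllDirectories))
        {
            // ReadAllLines detects a BOM, otherwise assumes UTF-8.
            string[] lines = File.ReadAllLines(path);
            for (int i = 0; i < lines.Length; i++)
            {
                foreach (KeyValuePair<string, string> pair in map)
                    lines[i] = lines[i].Replace(pair.Key, pair.Value);
            }

            // Keep the same relative path (and folder structure) under targetDir.
            string relative = path.Substring(sourceDir.Length)
                                  .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
            string target = Path.Combine(targetDir, relative);
            Directory.CreateDirectory(Path.GetDirectoryName(target));

            // WriteAllLines writes UTF-8 without a BOM, one Environment.NewLine per line.
            File.WriteAllLines(target, lines);
            processed++;
        }
        return processed;
    }
}
```

Writing to a separate folder rather than overwriting in place keeps the generated originals available for a diff when checking that nothing outside the readable text changed.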
  • I'm not sure how to "just get the HTML", modify it and write it out.

    In the post you are replying to, I did not say modify it and write it out, I just said write it out. It is my understanding that you have been successful at the reading and modifying parts.

    As for how to get and write out the HTML, I showed you how to do that in the post you marked and unmarked as the answer.

    Sam Hobbs

    Friday, September 27, 2019 12:51 AM
    Sorry, Sam. I unmarked it because after marking it I saw that it resulted in the same thing I had been doing. I'll go back and check it again. I do need the entire page written out: header, scripts, et al. Perhaps the body's outer HTML is the entire page I see when I open the file in an editor like Visual Studio, and it had my changes and I didn't realize it.

    R.D. Holland

    Friday, September 27, 2019 2:14 PM
  • I really doubt that the body includes the scripts, especially if the scripts are not in the body.

    I would have replied yesterday except I had sinus work done.

    Sam Hobbs

    Saturday, September 28, 2019 8:58 PM
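Sam's point holds: body.outerHTML covers only the body element, so the doctype, the head and any scripts outside the body are missing from what the snippet earlier in the thread writes. One possible workaround, not something tried in this thread, is to keep the original file text and splice the modified body back into it. The helper below is a hypothetical sketch (BodySplicer and ReplaceBody are illustrative names); it assumes the source has a single <body ...> ... </body> pair and no earlier "<body" text, for example inside a comment. The string from outerHTML is re-serialized by the browser engine, so markup inside the body may differ cosmetically from the original (tag case, attribute quoting). Another route is reading documentElement.outerHTML from the DOM, which includes the head but still omits the doctype.

```csharp
using System;

// Hypothetical helper: puts a modified <body> element (e.g. the string from
// DomDocument.body.outerHTML) back into the original page source, so the
// doctype, <head> and anything after </body> stay exactly as they were.
// Assumes one <body ...> ... </body> pair and no earlier "<body" text.
public static class BodySplicer
{
    public static string ReplaceBody(string originalSource, string newBodyOuterHtml)
    {
        const string EndTag = "</body>";
        int start = originalSource.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        int endTag = originalSource.LastIndexOf(EndTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0 || endTag < start)
            throw new InvalidOperationException("No <body> ... </body> span found.");

        int end = endTag + EndTag.Length;
        return originalSource.Substring(0, start)   // doctype, <html>, <head>, ...
             + newBodyOuterHtml                     // the edited body from the DOM
             + originalSource.Substring(end);       // whatever follows </body>
    }
}
```

In the WebBrowser tool this would be called with File.ReadAllText(theFile) as the original source and DomDocument.body.outerHTML as the new body, and the result written out with File.WriteAllText.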