mshtml parsing is very slow

  • Question

  • Hi

    I have been playing with ShDocVw and Internet Explorer instances. I develop in VB with Visual Studio.

    I use the mshtml library to parse HTML documents, using the interfaces to navigate and casting the instances into the HTML classes when necessary.

    It appears mshtml is very slow. Traversing a document of 867 nodes takes 7 seconds on an i7-8700K processor.

    Is there something wrong?

    Here is the inner subroutine that is called 867 times.

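      ' H is an import alias for the mshtml namespace; MSS.Children is my own helper.
      Public Sub DumpHElem(oHTMLElem As H.IHTMLElement)
        iElemCount += 1
        If oHTMLElem.children Is Nothing Then Exit Sub
        Dim oChildren = MSS.Children(oHTMLElem)
        If oChildren.length = 0 Then Exit Sub
        For Each oElem1 As Object In oChildren
          ' Cast each child explicitly, then recurse into it.
          DumpHElem(DirectCast(oElem1, H.IHTMLElement))
        Next
      End Sub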

    Wednesday, May 9, 2018 7:46 AM

Answers

All replies

  • Is this for a debug or a release build?

    Are you sure that other HTML parsers are faster doing the same thing?



    Sam Hobbs
    SimpleSamples.Info

    Wednesday, May 9, 2018 4:20 PM
  • This is a debug build.

    In the meantime I noticed that Option Strict was off in this project, so there were quite a few late-bound calls. I am finalizing the modifications to remove them all. This may help. I will keep you posted.
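
    For reference, a minimal sketch of what an early-bound traversal can look like once Option Strict is on. It assumes a project reference to the Microsoft.mshtml interop assembly, in which IHTMLElement.children is typed As Object and therefore has to be cast once before use. The module and function names are hypothetical, not taken from the project above.

    ```vb
    Option Strict On
    Imports mshtml

    Module EarlyBoundWalk
      ' Counts elem and all of its descendants using early-bound calls only.
      Public Function CountElements(elem As IHTMLElement) As Integer
        Dim count As Integer = 1
        ' Fetch the children collection once and cast it to its interface,
        ' so the compiler binds every later call at compile time.
        Dim kids As IHTMLElementCollection = TryCast(elem.children, IHTMLElementCollection)
        If kids Is Nothing Then Return count
        For Each child As Object In kids
          count += CountElements(DirectCast(child, IHTMLElement))
        Next
        Return count
      End Function
    End Module
    ```

    Every property read on these interfaces is still a COM call, so fetching the collection once per element instead of twice also saves some calls.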

    Wednesday, May 9, 2018 5:50 PM
  • With a release build and all late binding removed, it went down to 5 seconds.

    That is still very slow, I think. I will keep investigating to see if I can pinpoint where the time goes.
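
    One simple way to see where the time goes is to wrap individual steps in a System.Diagnostics.Stopwatch. The helper below is only a sketch; TimeIt and the variable oRoot in the usage note are hypothetical names.

    ```vb
    Option Strict On
    Imports System.Diagnostics

    Module TimingSketch
      ' Runs the supplied action and returns the elapsed wall-clock time in milliseconds.
      Public Function TimeIt(work As Action) As Long
        Dim sw As Stopwatch = Stopwatch.StartNew()
        work()
        sw.Stop()
        Return sw.ElapsedMilliseconds
      End Function
    End Module
    ```

    For example, `Dim ms As Long = TimeIt(Sub() DumpHElem(oRoot))` would time one full traversal, where oRoot stands for the document's root element. Timing the children lookup separately from the recursion should show whether the COM calls themselves dominate.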

    Wednesday, May 9, 2018 6:02 PM
  • I think mshtml is very slow when the HTML gets a little complex.

    • Marked as answer by JMB1502 Friday, May 18, 2018 6:37 AM
    Friday, May 18, 2018 6:37 AM
  • I said:

    Are you sure that other HTML parsers are faster doing the same thing?

    You said:

    I think mshtml is very slow when the HTML gets a little complex.

    Well, yes. Obviously you asked this question because you thought it is just mshtml that is slow. So you can see now why it is important to provide more information in the question. In this case, you would likely not have posted the question if you had had more information.



    Sam Hobbs
    SimpleSamples.Info

    Friday, May 18, 2018 7:25 AM