Workaround for a lazy-loading webpage

  • Question

  • Hi there! Hope you are all doing well. While writing a parser, I noticed that Yellow Pages Canada lazy-loads its search results. The page lists 84 names, but my parser scrapes only the first 40 of them. Is there a workaround for this? Any correction to my code would be a great help. Thanks in advance. Here is the code:

    http.Open "GET", "http://www.yellowpages.ca/search/si/1/Outdoor%20wedding/Edmonton", False
    http.send
    html.body.innerHTML = http.responseText
    Set topics = html.getElementsByClassName("listing__name--link jsListingName")
    
    For Each topic In topics
        Cells(x, 1) = topic.innerText
        x = x + 1
    Next topic



    • Edited by ShahinIqbal Saturday, March 11, 2017 8:38 PM
    Saturday, March 11, 2017 8:35 PM

Answers

  • I've had problems with MS Internet Controls, so I use SeleniumBasic instead. I select elements with XPath and use FirePath to build the XPath expressions. The site uses JavaScript and Ajax to fetch more of the page as you scroll down, so you need to scroll down to get all 84. Selenium can execute JavaScript, which lets you do the scrolling from code.

    Sub Test()
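      ' Requires a reference to the Selenium Type Library (SeleniumBasic).
      Dim rtn As Variant
      Dim i As Long
      Dim drv As IEDriver
      Dim eles As WebElements
      
      Set drv = New IEDriver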
      drv.Get "http://www.yellowpages.ca/search/si/1/Outdoor%20wedding/Edmonton"
      Set eles = drv.FindElementsByXPath("//div[@class = 'listing__content__wrap']")
      If eles.Count > 0 Then
        MsgBox "Element Count: " & eles.Count '40
      End If
      
      For i = 1 To 10
        rtn = drv.ExecuteScript("window.scrollBy(0,1200)", "") ' 1200 pixels
        drv.Wait (1000) ' 1 second
      Next i
      
      Set eles = drv.FindElementsByXPath("//div[@class = 'listing__content__wrap']")
      If eles.Count > 0 Then
        MsgBox "Element Count: " & eles.Count '84
      End If
    
    End Sub
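
    Ten scrolls of 1200 pixels happen to be enough for this search, but a longer result list would need more. Here is a minimal sketch of one alternative, assuming the page stops adding listings once everything is loaded: scroll to the bottom repeatedly and stop when the element count no longer grows. The names ScrollUntilLoaded, LISTING_XPATH and MAX_SCROLLS are made up for illustration, and the one-second wait and the cap of 30 scrolls are guesses, not values taken from the site.

    ```vba
    ' Requires a reference to the Selenium Type Library (SeleniumBasic).
    Sub ScrollUntilLoaded()
      Const LISTING_XPATH As String = "//div[@class = 'listing__content__wrap']"
      Const MAX_SCROLLS As Long = 30   ' assumed safety cap, not a site value
      Dim drv As IEDriver
      Dim rtn As Variant
      Dim previousCount As Long, currentCount As Long, i As Long

      Set drv = New IEDriver
      drv.Get "http://www.yellowpages.ca/search/si/1/Outdoor%20wedding/Edmonton"
      currentCount = drv.FindElementsByXPath(LISTING_XPATH).Count

      For i = 1 To MAX_SCROLLS
        previousCount = currentCount
        ' Jump to the bottom of the page so the next batch gets requested.
        rtn = drv.ExecuteScript("window.scrollTo(0, document.body.scrollHeight);", "")
        drv.Wait 1000   ' assumed to be long enough for the Ajax call to finish
        currentCount = drv.FindElementsByXPath(LISTING_XPATH).Count
        If currentCount = previousCount Then Exit For   ' nothing new was loaded
      Next i

      MsgBox "Element Count: " & currentCount
      drv.Quit
    End Sub
    ```

    If the connection is slow, one second may not be enough and the loop could stop early, so the wait is the first thing to raise when the count comes up short.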
    

     
    • Marked as answer by ShahinIqbal Monday, March 13, 2017 3:58 PM
    Sunday, March 12, 2017 6:54 PM

All replies

  • Thanks, mogulman52, for your kind reply. Your method works perfectly.
    The site actually serves the results in pages of up to 40 records per request, like this:

    First 40 records: http://www.yellowpages.ca/search/si/1/Outdoor%20wedding/Edmonton
    Next 40 records: http://www.yellowpages.ca/search/si/2/Outdoor%20wedding/Edmonton
    Next 4 records: http://www.yellowpages.ca/search/si/3/Outdoor%20wedding/Edmonton

    I also worked it out and got the same results with the XMLHTTP method. To get all the records, the HTTP request has to be made three times in a loop, once per page.


        For y = 1 To 3
            http.Open "GET", "http://www.yellowpages.ca/search/si/" & y & "/Outdoor%20wedding/Edmonton", False
            http.send
            html.body.innerHTML = http.responseText
            Set topics = html.getElementsByClassName("listing__name--link jsListingName")

            For Each topic In topics
                Cells(x, 1) = topic.innerText
                x = x + 1
            Next topic
        Next y
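
    The loop above hard-codes three pages, which matches only the 84 results of this search. Here is a rough sketch of a more general version, assuming that a page number past the last one returns no matching elements (not confirmed for this site): keep requesting pages until one comes back empty. PageUrl, ScrapeAllPages and MAX_PAGES are illustrative names, and the cap of 50 pages is an arbitrary safety limit.

    ```vba
    ' Requires a reference to the Microsoft HTML Object Library (for HTMLDocument).
    Function PageUrl(ByVal pageNumber As Long) As String
      ' URL pattern taken from the three addresses listed in this post.
      PageUrl = "http://www.yellowpages.ca/search/si/" & pageNumber & "/Outdoor%20wedding/Edmonton"
    End Function

    Sub ScrapeAllPages()
      Const MAX_PAGES As Long = 50   ' assumed safety cap, not a site value
      Dim http As Object
      Dim html As New HTMLDocument
      Dim topics As Object, topic As Object
      Dim pageNumber As Long, x As Long

      Set http = CreateObject("MSXML2.XMLHTTP")
      x = 1   ' first output row on the active sheet

      For pageNumber = 1 To MAX_PAGES
        http.Open "GET", PageUrl(pageNumber), False
        http.send
        If http.Status <> 200 Then Exit For   ' stop on any failed request
        html.body.innerHTML = http.responseText
        Set topics = html.getElementsByClassName("listing__name--link jsListingName")
        If topics.Length = 0 Then Exit For    ' assumed: an empty page marks the end

        For Each topic In topics
          Cells(x, 1) = topic.innerText
          x = x + 1
        Next topic
      Next pageNumber
    End Sub
    ```

    If the site instead repeats the last page for out-of-range numbers, the empty-page check never fires and the loop runs to MAX_PAGES, so it is worth checking what page 4 returns before relying on this.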


    • Edited by ShahinIqbal Monday, March 13, 2017 4:33 PM Correction and code block
    Monday, March 13, 2017 4:31 PM