none
Macro to transfer information from html to excel RRS feed

  • Question

  • Hello dear friends

    Macro to transfer information from html to excel

    My urls are in sheet1 and column 1
    All of them are just Instagram  . example : https://www.instagram.com/leomessi/

    I'm looking for a macros that only download line 160
    Just line 160 which contains information from the profile

    If any of the information in this line is transferred to the sid cell in sheet1 for each url They will be updated easily.
    All in one row and each part in a cell.
    I wrote the code below, but it's an amateur macros
     
    Sub get_lin160()


    Dim wb As Object
    Dim doc As Object
    Dim sURL As String
    Dim lastrow As Long
    Dim n As Integer

    lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
    For i = 2 To lastrow
    Set wb = CreateObject("internetExplorer.Application")
    sURL = Cells(i, 1)
    wb.navigate sURL
    wb.Visible = False
    While wb.Busy
    DoEvents
    Wend
    'HTML document
    Set doc = wb.document
    Cells(i, 2) = doc.body.innerText
    myarray = Split(Data, vbCrLf)
    err_clear:
    If Err <> 0 Then
    Err.Clear
    Resume Next
    End If
    wb.Quit
    Next i

    End Sub


    Thanks dear friend who helps me



    Sunday, September 9, 2018 4:24 PM

Answers

  • We have downloaded your file and run the button which found a run-time err91 problem as you mentioned. So the problem is that we're running it differently since we've been running code in debug mode before.

    Also, for the error91 issue, we are testing it and will let you know if there is any progress. It may a HTML loading problem and you can refer to this post to know it.

    I noticed that the post mentioned, "It is being caused by attempting to access the document before it's completely loaded. You're just checking ie.ReadyState."

    Issue code : 

    Name = doc.getElementsByClassName("AC5d8 notranslate")(0).innerText


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread.

    • Marked as answer by kapriano Wednesday, September 12, 2018 7:59 PM
    • Unmarked as answer by kapriano Wednesday, September 12, 2018 8:12 PM
    • Marked as answer by kapriano Thursday, September 13, 2018 7:32 AM
    Wednesday, September 12, 2018 1:07 PM
    Moderator

All replies

  • Hi kapriano,

    As I understand it, you want to get the page content for line 160 of Excel. Please try this code below:

    Sub get_lin160()
    
    
        Dim wb As Object
        Dim doc As Object
        Dim sURL As String
        Dim lastrow As Long
        Dim n As Integer
        
        lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
        For i = 2 To lastrow
            If i = 160 Then
                Set wb = CreateObject("internetExplorer.Application")
                sURL = Cells(i, 1)
                wb.navigate sURL
                wb.Visible = False
                While wb.Busy
                  DoEvents
                Wend
                'HTML document
                Set doc = wb.document
    
                Dim oDoc As HTMLDocument
    
                Set oDoc = New HTMLDocument
    
                oDoc.body.innerHTML = doc.body.innerText
    
                HtmlToText = oDoc.body.innerText
    
                Cells(i, 2) = HtmlToText
                myarray = Split(Data, vbCrLf)
                err_clear:
                If Err <> 0 Then
                  Err.Clear
                  Resume Next
                End If
                wb.Quit
            End If
        Next i
    
    End Sub

    Best Regards,

    Bruce


    Monday, September 10, 2018 10:32 AM
    Moderator
  • Hi bruce Dai oh !!! I mean line 160 from the HTML page!

    159 .....
    160 <link rel="canonical" href="https://www.instagram.com/leomessi/" /><meta content="97.7m Followers, 205 Following, 334 Posts - See Instagram photos and videos from Leo Messi (@leomessi)" name="description" />
    161 ....
    162 ....
    163 ....
    245 <script type="text/javascript">window._sharedData = {"activity_counts":{"comment_l...........true};</script>

    You have ordered rows 160   from the Excel page .

    Even with this assumption Your macro At this line ( Set oDoc = New HTMLDocument)

    The following error message is displayed:

    compile error: user- defined type not defined

    for test, insert the Instagram url  into the  sheet1 in A2  

    For all Urls in line 160  and  245  There are 5  items
    I want these items to be downloaded  sheet1. in a row and five cell .Enter the same  URL row.
    In the picture below, you can see what I mean

    {http://s9.picofile.com/file/8336922700/6789.png}

    thanks
    Monday, September 10, 2018 8:50 PM
  • Hi kapriano,

    To avoid system errors, you need add reference.

    Please add following reference:

    According to my understand. You want to acquire data from https://www.instagram.com/leomessi/ .

    You can try the following code that I created to get those values as follow:

    Sub test()
    
        Dim wb As Object
        Dim doc As Object
        Dim sURL As String
        Dim lastrow As Long
        Dim n As Integer
        Dim i As Integer
        Dim HtmlToText As String
        Dim result
        lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
        For i = 2 To lastrow
            
            Set wb = CreateObject("internetExplorer.Application")
            sURL = Cells(i, 1)
            wb.navigate sURL
            wb.Visible = False
            While wb.Busy
              DoEvents
            Wend
            'HTML document
            Set doc = wb.document
            Dim Name As Variant
            Dim Posts As Variant
            Dim Follows As Variant
            Dim Following As Variant
            Dim DivValue As Variant
            Dim DivValueSplit As Variant
            Dim DivValueResult As Variant
            Dim Biography As Variant
            
            Name = doc.getElementsByClassName("AC5d8 notranslate")(0).innerText
            Posts = doc.getElementsByClassName("g47SY")(0).innerText
            Followers = doc.getElementsByClassName("g47SY")(1).innerText
            Following = doc.getElementsByClassName("g47SY")(2).innerText
            'dd = web.document.querySelector("div.-vDIg span").innerText
            DivValue = doc.getElementsByClassName("-vDIg")(0).innerHTML
            DivValueSplit = Split(DivValue, "<br>")
            DivValueResult = DivValueSplit(1)
            j = InStr(DivValueResult, "</span>")
            Biography = Mid(DivValueResult, 7, j - 7)
            Worksheets("sheet1").Cells(i, 2) = Name
            Worksheets("sheet1").Cells(i, 3) = Followers
            Worksheets("sheet1").Cells(i, 4) = Following
            Worksheets("sheet1").Cells(i, 5) = Posts
            Worksheets("sheet1").Cells(i, 6) = Biography
            'Biography = Replace(re1, "<span>", "")
                    
            'Cells(i, 2) = HtmlToText
            ' myarray = Split(Data, vbCrLf)
    err_clear:
            If Err <> 0 Then
              Err.Clear
              Resume Next
            End If
            wb.Quit
                
        Next i
    
    End Sub

    Please Note this is only works on getting text from https://www.instagram.com/leomessi/ and put in column Biography. If you want to get Biography for other website you need individually get it from those other website.

    Best Regards,

    Bruce

    Tuesday, September 11, 2018 6:44 AM
    Moderator
  • Hi dear friend Bruce
    Thanks for your attention to me

    Tuesday, September 11, 2018 7:17 AM
  • Hi 
    I already made the settings
    Macro does not run again

    1 I am an amateur
    2 A bug in my Excel
    3 Your patience is commendable to me

    link file :{http://s8.picofile.com/file/8336955950/bruce_1.xlsm.html}


    • Edited by kapriano Tuesday, September 11, 2018 7:53 AM
    Tuesday, September 11, 2018 7:31 AM
  • Hi kapriano,

    Have you tried with this url (https://www.instagram.com/leomessi/)?

    I’ve tried from my side, and it works fine.

    Please try it again and let me know if you have further questions.

    Best Regards,

    Bruce

    Tuesday, September 11, 2018 10:36 AM
    Moderator
  • Hi  Bruce

    It's as if it's a bug of me
    I will try and send you the result
    Many thanks to the guide

    Tuesday, September 11, 2018 11:10 AM
  • Hi bruce dear

    Are you import  column  A   multiple url ?

    I do not know why it downloads the first URL  But not to lastrow in  column  A
    I even repeated a similar url , but failed . Just download the first url 
    I even removed my Excel and installed it

    Sometimes it downloads up to four urls
    But again, it encounters an error 

    err:
    1.  invalid procedure call or argument
    2.  run-time err  91
    3.  object variable or with block not set


    Is it possible for you?
    Upload a file without a bug to me.  
    and  I will be sure it's a bug of me .

    -Is it possible to different  HTML of each page ?
    -Could it be my browser ? My default is Firefox.



    thanks


    Tuesday, September 11, 2018 1:56 PM
  • Hi kapriano,

    Because the loop starts on the second line, the first URL is downloaded.

    I also appear  your second and third errors. I'm sorry that I failed to solve them, but there is no error if I debug to perform  step by step.

    You can break the point to debug the run, so that there is no error.

    The revised code is as follows:

    Sub test()
    
        Dim wb As Object
        Dim doc As Object
        Dim sURL As String
        Dim lastrow As Long
        Dim n As Integer
        Dim i As Integer
        Dim HtmlToText As String
        Dim result
        lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
        
        For i = 2 To lastrow  'Start the loop on the second row of column A. Until the last URL..
            
            Set wb = CreateObject("internetExplorer.Application")
            sURL = Cells(i, 1)
            wb.navigate sURL
            wb.Visible = False
            While wb.Busy
              DoEvents
            Wend
            'HTML document
            Set doc = wb.document
            Dim Name As Variant
            Dim Posts As Variant
            Dim Followers As Variant
            Dim Following As Variant
            Dim DivValue As Variant
            Dim DivValueSplit As Variant
            Dim DivValueResult As Variant
            Dim Biography As Variant
            
            Name = doc.getElementsByClassName("AC5d8 notranslate")(0).innerText
            Posts = doc.getElementsByClassName("g47SY")(0).innerText
            Followers = doc.getElementsByClassName("g47SY")(1).innerText
            Following = doc.getElementsByClassName("g47SY")(2).innerText
            'dd = web.document.querySelector("div.-vDIg span").innerText
            DivValue = doc.getElementsByClassName("-vDIg")(0).innerText
            
            'DivValueSplit = Split(DivValue, "<br>")
            'If UBound(DivValueSplit) = 2 Then
            '   DivValueResult = DivValueSplit(1) & DivValueSplit(2)
             '  j = InStr(DivValueResult, "</span>")
              ' Biography = Mid(DivValueResult, 7, j - 7)
            'ElseIf sURL = "https://www.instagram.com/philipplein/" Then
             ' DivValueResult = DivValueSplit(0)
              'j = InStr(DivValueResult, "</h1>")
              'Biography = Mid(DivValueResult, 19, j - 5)
            'Else
             '   DivValueResult = DivValueSplit(1)
              '  j = InStr(DivValueResult, "</span>")
               ' Biography = Mid(DivValueResult, 7, j - 7)
            'End If
            
            Worksheets("sheet1").Cells(i, 2) = Name
            Worksheets("sheet1").Cells(i, 3) = Followers
            Worksheets("sheet1").Cells(i, 4) = Following
            Worksheets("sheet1").Cells(i, 5) = Posts
            Worksheets("sheet1").Cells(i, 6) = DivValue
            'Biography = Replace(re1, "<span>", "")
                    
            'Cells(i, 2) = HtmlToText
            ' myarray = Split(Data, vbCrLf)
    err_clear:
            If Err <> 0 Then
              Err.Clear
              Resume Next
            End If
            wb.Quit
                
        Next i
    
    End Sub

    Best Regards,

    Bruce



    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread.



    Wednesday, September 12, 2018 10:41 AM
    Moderator
  • Hi Bruce dear

    very thanks

    checking

    Wednesday, September 12, 2018 10:49 AM
  • Hi kapriano,

    Could you please share your Excel (include VBA code) file for testing? However, as Bruce provide solution, it works for me. You can test it again. 

    Also, for the knowledge of VBA, you can try to understand it.

    Best Regards,

    Simon


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread.

    Wednesday, September 12, 2018 10:52 AM
    Moderator
  • Hi  simon

    pls . moment 

    thanks

    Wednesday, September 12, 2018 11:08 AM
  • Hi

    I do not know why the macros in the third url stopped with the following error

    ;;;  Name = doc.getElementsByClassName("AC5d8 notranslate")(0).innerText

    Even when the url is repeated

    {http://s9.picofile.com/file/8337073892/bruce_2.xlsm.html}

    thanks

    Wednesday, September 12, 2018 11:15 AM
  • According to Bruce's method, it's just only applies to the Instagram website, because you want to crawl the page information which must be based on the page's HTML tag to obtain the content. For a simple example, a site has this tag, but the B site does not, so you will have getting a problem when you use it. 

    You should be use Instagram website to test it. 


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread.


    Wednesday, September 12, 2018 11:23 AM
    Moderator
  • Hi 

    Yes I know
    I just enter the Insta urls
    But sometimes at the first
    url 
    Sometimes the third url stops with an error
    I said, maybe the content of the html page is different
    So I repeated the address 10 times
    Hoping to download everyone
    But not again

    thanks

    Wednesday, September 12, 2018 11:29 AM
  • Could you please share your Excel file to One Drive and give us a link? We will download it to debug the issue. 

    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread.

    Wednesday, September 12, 2018 11:33 AM
    Moderator
  • Hi dear

    {http://s9.picofile.com/file/8337073892/bruce_2.xlsm.html}

    Best Regards,



    file excel
    • Edited by kapriano Wednesday, September 12, 2018 11:39 AM
    Wednesday, September 12, 2018 11:36 AM
  • We have downloaded your file and run the button which found a run-time err91 problem as you mentioned. So the problem is that we're running it differently since we've been running code in debug mode before.

    Also, for the error91 issue, we are testing it and will let you know if there is any progress. It may a HTML loading problem and you can refer to this post to know it.

    I noticed that the post mentioned, "It is being caused by attempting to access the document before it's completely loaded. You're just checking ie.ReadyState."

    Issue code : 

    Name = doc.getElementsByClassName("AC5d8 notranslate")(0).innerText


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread.

    • Marked as answer by kapriano Wednesday, September 12, 2018 7:59 PM
    • Unmarked as answer by kapriano Wednesday, September 12, 2018 8:12 PM
    • Marked as answer by kapriano Thursday, September 13, 2018 7:32 AM
    Wednesday, September 12, 2018 1:07 PM
    Moderator
  • Hi 

    Dear Friends

    thanks for help. 

                                                                                               

    Sub testbruce3()

        Dim wb As Object
        Dim doc As Object
        Dim sURL As String
        Dim lastRow As Long
        Dim n As Integer
        Dim i As Integer
        Dim HtmlToText As String
        Dim result


        lastRow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row

        For i = 2 To lastRow  'Start the loop on the second row of column A. Until the last URL..

            Set wb = CreateObject("internetExplorer.Application")
            sURL = Cells(i, 1)
            wb.navigate sURL
            wb.Visible = False
            While wb.Busy
              DoEvents
            Wend



            'HTML document
            Set doc = wb.document
            Dim Name As Variant
            Dim Posts As Variant
            Dim Follows As Variant
            Dim Following As Variant
            Dim DivValue As Variant
            Dim DivValueSplit As Variant
            Dim DivValueResult As Variant
            Dim Biography As Variant

                Do
        If wb.readyState = READYSTATE_COMPLETE Then
            If wb.document.readyState = "complete" Then Exit Do
        End If
        Application.Wait DateAdd("s", 1, Now)
    Loop

            Name = doc.getElementsByClassName("AC5d8 notranslate")(0).innerText
            Posts = doc.getElementsByClassName("g47SY")(0).innerText
            Followers = doc.getElementsByClassName("g47SY")(1).innerText
            Following = doc.getElementsByClassName("g47SY")(2).innerText
            'dd = web.document.querySelector("div.-vDIg span").innerText
            DivValue = doc.getElementsByClassName("-vDIg")(0).innerText

            'DivValueSplit = Split(DivValue, "<br>")
            'If UBound(DivValueSplit) = 2 Then
            '   DivValueResult = DivValueSplit(1) & DivValueSplit(2)
             '  j = InStr(DivValueResult, "</span>")
              ' Biography = Mid(DivValueResult, 7, j - 7)
            'ElseIf sURL = "https://www.instagram.com/philipplein/" Then
             ' DivValueResult = DivValueSplit(0)
              'j = InStr(DivValueResult, "</h1>")
              'Biography = Mid(DivValueResult, 19, j - 5)
            'Else
             '   DivValueResult = DivValueSplit(1)
              '  j = InStr(DivValueResult, "</span>")
               ' Biography = Mid(DivValueResult, 7, j - 7)
            'End If

            Worksheets("sheet1").Cells(i, 2) = Name
            Worksheets("sheet1").Cells(i, 3) = Followers
            Worksheets("sheet1").Cells(i, 4) = Following
            Worksheets("sheet1").Cells(i, 5) = Posts
            Worksheets("sheet1").Cells(i, 6) = DivValue
            Biography = Replace(re1, "<span>", "")

            'Cells(i, 2) = HtmlToText
             'myarray = Split(Data, vbCrLf)
    err_clear:
            If Err <> 0 Then
              Err.Clear
              Resume Next
            End If
            wb.Quit

        Next i

    End Sub

    Thursday, September 13, 2018 7:36 AM
  • Thanks for your response. I'm glad to hear that. Well, I think you should streamline your code. For example, removing useless code and variables. Also, I notice that you ask the question on GitHub, and you can actually try another approach. Anyway, it's all about access to web resources, so whatever works best for you. The important thing is that you learn something. 

    So if you have any question, or update, please feel free to let us know.

    Best Regards,

    Simon


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread.

    Thursday, September 13, 2018 7:55 AM
    Moderator