Trouble parsing image links from Craigslist

  • Question

  • Hi there! I can't get my code to fetch the image links from the elements below, taken from Craigslist. This is what I tried, but it doesn't return the links. Any help would be highly appreciated.

    Set topics = html.getElementsByClassName("result-image gallery")
    For Each topic In topics
        Cells(x, 1) = topic.getElementsByTagName("img")(0).src
        x = x + 1
    Next topic

    Here is the element for that:

    <a href="/mnh/atq/6033903864.html" class="result-image gallery" data-ids="1:00l0l_auIVAPKuweh"><img alt="" class="" src="https://images.craigslist.org/00l0l_auIVAPKuweh_300x300.jpg">
    <span class="result-price">$120</span>
    </a>



    • Edited by ShahinIqbal Monday, March 13, 2017 2:43 PM correction
    Wednesday, March 8, 2017 10:31 PM

Answers

  • Finally solved it using selenium in combination with vba.

    Sub CraigslistImage()
        Dim driver As SeleniumWrapper.WebDriver
        Dim posts As Object, post As Object
        Dim i As Long    ' output row counter

        Set driver = New SeleniumWrapper.WebDriver
        ' Start a PhantomJS session on the base URL, then open the /ata listing
        driver.Start "Phantomjs", "https://newyork.craigslist.org/search"
        driver.get "/ata"

        ' Collect every element with class "swipe-wrap" and read its first <img> src
        Set posts = driver.findElementsByClassName("swipe-wrap")
        For Each post In posts
            i = i + 1
            Cells(i, 1) = post.findElementByTagName("img").getAttribute("src")
        Next post

        Set driver = Nothing: Set posts = Nothing
    End Sub

    • Marked as answer by ShahinIqbal Saturday, May 20, 2017 6:48 PM
    Monday, April 3, 2017 9:43 PM

All replies

  • What I have done in the past is extract the src link (https://images.craigslist.org/00l0l_auIVAPKuweh_300x300.jpg) and use wget (a free command-line tool) to get the image. You can call wget from VBA. Another option is to open the image URL in another browser window and then save the browser tab using Ctrl+S. I've done this using Selenium. You can use AutoIt (free) to fill in the location and click Save.

    wget --user=username --ask-password https://images.craigslist.org/00l0l_auIVAPKuweh_300x300.jpg

    You may need to use curl rather than wget due to HTTP compliance. 

    curl -o <filename-to-save-as> -u <username>:<password> <url>
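
    Calling wget from VBA could look roughly like the sketch below. It assumes wget is installed and on the PATH; the function name DownloadWithWget and the save path in the usage line are hypothetical, and the credential options are left out for brevity.

    ```vba
    Option Explicit

    ' Hypothetical helper: downloads one URL by shelling out to wget.
    ' Assumes wget is on the PATH. Returns wget's exit code (0 = success).
    Function DownloadWithWget(ByVal url As String, ByVal savePath As String) As Long
        Dim sh As Object
        Dim cmd As String

        Set sh = CreateObject("WScript.Shell")
        ' -O writes the download to the given file; both arguments are quoted
        cmd = "wget -O """ & savePath & """ """ & url & """"
        ' 0 = hidden window, True = wait until wget has finished
        DownloadWithWget = sh.Run(cmd, 0, True)
    End Function
    ```

    For example: DownloadWithWget "https://images.craigslist.org/00l0l_auIVAPKuweh_300x300.jpg", "C:\Temp\image.jpg"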

    • Edited by mogulman52 Thursday, March 9, 2017 2:01 PM
    Thursday, March 9, 2017 1:51 PM
  • Edit: Isn't it possible to parse the image links using regex in this case?



    • Edited by ShahinIqbal Monday, March 13, 2017 2:06 PM correction
    Saturday, March 11, 2017 6:59 PM
  • Are you asking if you can extract the link (the img src attribute) from the HTML using regex? Yes, you can. You can't extract the image itself using regex.
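
    A minimal sketch of the regex approach, assuming the page HTML is already in a string and the src values are double-quoted as in the element posted above. The names ExtractImageLinks and DemoExtractImageLinks are hypothetical; VBScript.RegExp is late-bound so no reference needs to be added.

    ```vba
    Option Explicit

    ' Hypothetical helper: returns the src value of every <img> tag in html.
    Function ExtractImageLinks(ByVal html As String) As Collection
        Dim re As Object, m As Object
        Dim links As New Collection

        Set re = CreateObject("VBScript.RegExp")
        re.Global = True        ' return all matches, not just the first
        re.IgnoreCase = True
        ' Capture the double-quoted value of src inside an <img ...> tag
        re.Pattern = "<img[^>]*\ssrc=""([^""]+)"""

        For Each m In re.Execute(html)
            links.Add m.SubMatches(0)   ' first capture group = the URL
        Next m

        Set ExtractImageLinks = links
    End Function

    Sub DemoExtractImageLinks()
        Dim sample As String, link As Variant, x As Long

        sample = "<a href=""/mnh/atq/6033903864.html"" class=""result-image gallery"">" & _
                 "<img alt="""" class="""" src=""https://images.craigslist.org/00l0l_auIVAPKuweh_300x300.jpg""></a>"
        x = 1
        For Each link In ExtractImageLinks(sample)
            Cells(x, 1).Value = link    ' one link per row in column A
            x = x + 1
        Next link
    End Sub
    ```

    A regex like this is brittle against markup changes (single-quoted or unquoted attributes would be missed), so treat it as a quick way to pull links rather than a general HTML parser.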
    Sunday, March 12, 2017 9:04 PM
  • Thanks, mogulman52, for your reply. I'm sorry that I wasn't explicit in my question. I never meant to scrape the images themselves, only the links. Thanks.
    Monday, March 13, 2017 1:46 PM
  • SeleniumWrapper is obsolete.  I'm surprised it worked.  Maybe PhantomJS is ok.  I use SeleniumBasic.  I think the last update was a year ago.  You can download and update drivers for Chrome and Edge.  IE works.  If you use Firefox 46.0.1 it works. 
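
    For comparison, the same loop written against SeleniumBasic might look like the sketch below. It assumes SeleniumBasic is installed with a matching ChromeDriver and that the "Selenium Type Library" reference is enabled; the sub name is hypothetical, and the "swipe-wrap" class is taken from the accepted answer, so it may not match the site's current markup.

    ```vba
    Option Explicit

    ' Hypothetical SeleniumBasic version of the accepted answer.
    ' Requires: Tools > References > "Selenium Type Library".
    Sub CraigslistImageSeleniumBasic()
        Dim driver As New Selenium.ChromeDriver
        Dim posts As Object, post As Object
        Dim i As Long

        driver.Get "https://newyork.craigslist.org/search/ata"

        ' Class name assumed from the accepted answer
        Set posts = driver.FindElementsByClass("swipe-wrap")
        For Each post In posts
            i = i + 1
            Cells(i, 1).Value = post.FindElementByTag("img").Attribute("src")
        Next post

        driver.Quit    ' close the browser and end the driver process
    End Sub
    ```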
    Monday, April 3, 2017 11:35 PM