Selective Web Scraping?

  • Question

  • Hi :)

    How can I extract the "Today's Open" value from this website (https://www.coindesk.com/price/) and place it in a label on my form (VB.NET)? I can't even find the element's ID for certain in the page's source.

    Imports System.Net
    Imports System.IO
    
    Public Class BTCPrice
    
        Private Sub BTCPrice_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            WebBrowser1.Visible = False
            WebBrowser1.Navigate("https://www.coindesk.com/price/")
            PriceValue.Text = WebBrowser1.Document.GetElementById("data").InnerText 'I know this is the wrong ID, I can't find the ID in HTML source :|
        End Sub
    End Class

    I've tried looking online, but it's a bit too complicated for me at this stage. Is my code too simple to work at all, with or without the correct ID for the price value?

    Thanks in advance! :)

    Thursday, October 26, 2017 3:21 PM
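Even with the right ID, the snippet above reads the page too early: Navigate only starts loading the page and returns immediately, so WebBrowser1.Document is not ready on the next line. Below is a minimal sketch that waits for the WebBrowser's DocumentCompleted event before reading the element, assuming the value is present in the page's HTML at all. The element ID "todays-open" and the form name BTCPriceForm are made-up placeholders, and the form builds its controls in code so the example stands on its own.

```vbnet
Imports System
Imports System.Windows.Forms

' Self-contained stand-in for the BTCPrice form; the controls are created in code
Public Class BTCPriceForm
    Inherits Form

    ' Hypothetical element ID: inspect the live page to find the real one (if any)
    Private Const PriceElementId As String = "todays-open"

    Private ReadOnly browser As New WebBrowser() With {.Visible = False, .ScriptErrorsSuppressed = True}
    Private ReadOnly priceValue As New Label() With {.AutoSize = True, .Text = "Loading..."}

    Public Sub New()
        Controls.Add(browser)
        Controls.Add(priceValue)
        AddHandler browser.DocumentCompleted, AddressOf Browser_DocumentCompleted
    End Sub

    Protected Overrides Sub OnLoad(e As EventArgs)
        MyBase.OnLoad(e)
        ' Navigate only starts the download and returns immediately
        browser.Navigate("https://www.coindesk.com/price/")
    End Sub

    Private Sub Browser_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        ' The event can fire once per frame; wait until the whole page is complete
        If browser.ReadyState <> WebBrowserReadyState.Complete Then Return

        Dim element As HtmlElement = browser.Document.GetElementById(PriceElementId)
        priceValue.Text = If(element IsNot Nothing, element.InnerText, "Element not found")
    End Sub

    <STAThread>
    Public Shared Sub Main()
        Application.EnableVisualStyles()
        Application.Run(New BTCPriceForm())
    End Sub
End Class
```

The WebBrowser control is built on Internet Explorer, so a script-heavy page may not load the same way it does in a modern browser.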

All replies

  • Be sure to check that this doesn't violate the site's terms, but assuming it doesn't, have a look at the HTML Agility Pack:

    https://htmlagilitypack.codeplex.com/

    It's quite powerful for what it does.
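A minimal sketch of the HTML Agility Pack approach, assuming the HtmlAgilityPack NuGet package is referenced. Since the element ID is unknown, it finds the value by its visible label instead: the XPath (the span right after a span whose text is "Today's Open") is a hypothetical guess at the page's markup, and the sample HTML is made up. Note that the Agility Pack only parses the HTML the server sends; if the page fills in the price with JavaScript, the value won't be there.

```vbnet
Imports System
Imports HtmlAgilityPack

Module AgilityPackSketch
    ' Hypothetical XPath: the span that follows the span labelled "Today's Open"
    Public Const TodaysOpenXPath As String = "//span[text()=""Today's Open""]/following-sibling::span[1]"

    ' Returns the decoded, trimmed text of the first node matching xpath, or Nothing
    Function ReadNodeText(html As String, xpath As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)
        If node Is Nothing Then Return Nothing
        Return HtmlEntity.DeEntitize(node.InnerText).Trim()
    End Function

    Sub Main()
        ' Made-up markup standing in for the real page
        Dim sample As String = "<div><span>Today's Open</span><span> 123.45 </span></div>"
        Console.WriteLine(ReadNodeText(sample, TodaysOpenXPath))

        ' Against the live site (no JavaScript is executed):
        ' Dim page As HtmlDocument = New HtmlWeb().Load("https://www.coindesk.com/price/")
        ' Dim node As HtmlNode = page.DocumentNode.SelectSingleNode(TodaysOpenXPath)
    End Sub
End Module
```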

    *****

    Also know this: some sites intentionally rearrange their pages periodically just to keep people from scraping their data. You could spend several hours working out how to get what you want, and the next week it won't work because they've changed things around.


    "A problem well stated is a problem half solved." - Charles F. Kettering

    Thursday, October 26, 2017 8:59 PM
  • As Frank said, check their 'Terms & Conditions' (link below) to see whether they allow scraping of their website. Many websites nowadays consider scraping a breach of their rules. If you can't tell whether it's allowed, their terms and conditions include a link they suggest you use to contact them and ask. Usually, if a site has data that they want you to be able to use from your own applications, they will provide an API for it.

    CoinDesk - "Terms & Conditions"


    If you say it can`t be done then i`ll try it

    Thursday, October 26, 2017 9:14 PM
  • @IronRazerz and Frank -

    Thanks for the warning, I'll be sure to review the site's Terms & Conditions. Afterwards, I'll check out the tool you linked. Thanks, Frank!

    Thursday, October 26, 2017 9:19 PM
  • You're welcome. :)

    *****

    IR,

    Have a look at this when you get a minute (they've moved the project's site away from CodePlex):

    http://html-agility-pack.net/?z=codeplex

    On the left side there, slowly move your mouse around. It's like interconnecting spiderwebs or something.

    Neat effect however it's done. ;-)


    Thursday, October 26, 2017 9:25 PM
  • Yeah, I saw that, really neat. Inspecting the source of that object, you can see it's done in JavaScript. Here's a similar example. Once I've mastered VB.NET (which, at this rate, will take another decade or two), I'll move on to JS.

    And this one: http://vincentgarreau.com/particles.js/

    xD

    Thursday, October 26, 2017 9:41 PM
  • Off topic....

    Hey Frank,

    Yes, it is pretty neat. I wouldn't have a clue how to do it on a web page, but in VB.NET I'm sure it wouldn't be too difficult. If Tom sees this, he will probably be trying it. 8)


    Thursday, October 26, 2017 9:49 PM
  • I bet so too. ;-)

    Thursday, October 26, 2017 9:53 PM
  • Arrd,

    I think stock quotes are always off limits. Others own them, so you can't use the data. In fact, there can be big penalties.

    PS Frank, Razerz, I see. :)

    Thursday, October 26, 2017 11:19 PM
  • Food for thought. :)

    Thursday, October 26, 2017 11:51 PM
  • https://www.coindesk.com/api/
    https://api.coindesk.com/v1/bpi/currentprice.json

    Free API, no key or registration required

    Returns this:

    {
      "time": {
        "updated": "Oct 27, 2017 00:20:00 UTC",
        "updatedISO": "2017-10-27T00:20:00+00:00",
        "updateduk": "Oct 27, 2017 at 01:20 BST"
      },
      "disclaimer": "This data was produced from the CoinDesk Bitcoin Price Index (USD). Non-USD currency data converted using hourly conversion rate from openexchangerates.org",
      "chartName": "Bitcoin",
      "bpi": {
        "USD": {"code": "USD", "symbol": "$", "rate": "5,886.6825", "description": "United States Dollar", "rate_float": 5886.6825},
        "GBP": {"code": "GBP", "symbol": "£", "rate": "4,483.5564", "description": "British Pound Sterling", "rate_float": 4483.5564},
        "EUR": {"code": "EUR", "symbol": "€", "rate": "5,058.3027", "description": "Euro", "rate_float": 5058.3027}
      }
    }

    Friday, October 27, 2017 12:29 AM
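Since this API returns JSON, a JSON parser is sturdier than cutting the text apart. A minimal sketch, assuming Json.NET (the Newtonsoft.Json NuGet package) is referenced and that the response keeps the shape shown above; ReadRate is a hypothetical helper name.

```vbnet
Imports System
Imports System.Globalization
Imports System.Net
Imports Newtonsoft.Json.Linq

Module BpiSketch
    ' Reads bpi.<currencyCode>.rate_float from the currentprice.json payload
    Function ReadRate(json As String, currencyCode As String) As Decimal
        Dim root As JObject = JObject.Parse(json)
        Dim token As JToken = root.SelectToken("bpi." & currencyCode & ".rate_float")
        If token Is Nothing Then
            Throw New ArgumentException("No rate for " & currencyCode, NameOf(currencyCode))
        End If
        Return token.ToObject(Of Decimal)()
    End Function

    Sub Main()
        Using client As New WebClient()
            Dim json As String = client.DownloadString("https://api.coindesk.com/v1/bpi/currentprice.json")
            Dim usd As Decimal = ReadRate(json, "USD")
            ' On a form this could be, e.g., PriceValue.Text = usd.ToString("N2")
            Console.WriteLine(usd.ToString(CultureInfo.InvariantCulture))
        End Using
    End Sub
End Module
```

Looking the value up by its path ("bpi.USD.rate_float") means the code keeps working no matter how much the price or the surrounding text changes.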
  • Good find

    Friday, October 27, 2017 12:35 AM
  • Thanks for the API, Devon. Out of curiosity, I've been trying to extract the current rate from it. The Agility Pack works fine, but I'm experimenting with different methods to get better. This is how far I've gotten :P

        Private Sub Scrape()
            '2 Textboxes and 1 button - first textbox has the raw scrape retrieved from the API URL, the second is the formatted scrape with just the BTC price
            Dim strURL As String = "https://api.coindesk.com/v1/bpi/currentprice.json"
    
            Dim strOutput As String = ""
    
            Dim wrResponse As WebResponse
            Dim wrRequest As WebRequest = HttpWebRequest.Create(strURL)
    
            TxtRawScrape.Text = "Extracting..." & Environment.NewLine
            wrResponse = wrRequest.GetResponse()
    
            Using sr As New StreamReader(wrResponse.GetResponseStream())
                strOutput = sr.ReadToEnd()
                sr.Close()
            End Using
    
            TxtRawScrape.Text = strOutput
            strOutput = Regex.Replace(strOutput, "**everything before current price**", "") 'I wonder how you'd program that..?
            strOutput = Regex.Replace(strOutput, "**everything after current price**", "") ' ^^
            TxtBTCPrice.Text = strOutput
    
        End Sub
    
        Private Sub BtnExtract_Click(sender As Object, e As EventArgs) Handles BtnExtract.Click 'This is the button that formats the "RawScrape"
            Scrape()
        End Sub

    Of course, as a temporary fix you could copy everything before and after the price and replace it with "". However, because of BTC's volatility the response keeps changing, so that would only work until the next update.

    Any input? Thanks in advance :)
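On the "I wonder how you'd program that" comment: rather than deleting everything before and after the price, a regular expression with a capture group can match just the value. A sketch, assuming the JSON layout shown in Devon's post (no nested braces inside the "USD" object); ExtractUsdRate is a hypothetical name, and a real JSON parser remains the sturdier choice if the format changes.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Matches "USD":{ ... "rate_float": <number> and captures only the number
    Private ReadOnly UsdRatePattern As New Regex(
        """USD""\s*:\s*\{[^}]*?""rate_float""\s*:\s*(?<rate>[0-9]+(?:\.[0-9]+)?)")

    ' Returns the USD rate, or Nothing when the pattern is not found
    Function ExtractUsdRate(json As String) As Decimal?
        Dim m As Match = UsdRatePattern.Match(json)
        If Not m.Success Then Return Nothing
        Return Decimal.Parse(m.Groups("rate").Value, CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Dim raw As String = "{""bpi"":{""USD"":{""code"":""USD"",""rate"":""5,886.6825"",""rate_float"":5886.6825}}}"
        Dim rate As Decimal? = ExtractUsdRate(raw)
        If rate.HasValue Then Console.WriteLine(rate.Value.ToString(CultureInfo.InvariantCulture))
    End Sub
End Module
```

In Scrape(), the captured value could be assigned to TxtBTCPrice.Text in place of the two Regex.Replace calls.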


    Friday, October 27, 2017 3:46 PM
  • Arrd,

    Let me make this a bit easier on you.

    First, realize that what the API Devon showed returns is JSON. Probably the best-known third-party (but free) assembly for working with JSON is Json.NET from Newtonsoft:

    https://www.newtonsoft.com/json

    JSON (JavaScript Object Notation) comes from JavaScript and is considered an alternative to XML, but in reality it has pretty much overtaken XML in many areas.

    Turning the data returned from CoinDesk into something usable gets a little involved, but I've put it together in a way that won't be so difficult on your end. I have an assembly (a .dll file) in which I've embedded the Newtonsoft assembly along with a method to read the JSON data in. That data is then put into classes that you can work with much more easily.

    First, download my new assembly from here (it's zipped up - but just one file inside):

    https://fls.exavault.com/share/view/k8i5-3e5s507x

    Extract the contents, which will be a .dll file named "CoinDesk.dll". In your program, add a reference to it; using it is then nothing more than creating an instance. Creating a new instance of the class fetches the latest data, and you can then use its properties directly.

    In the following, please note the Imports statement:

    Imports CoinDesk.Data
    
    Public Class Form1
    
        Private Sub _
            Form1_Load(sender As System.Object, _
                       e As System.EventArgs) _
                       Handles MyBase.Load
    
            Dim ud As New Updater
    
            Stop
    
        End Sub
    End Class

    Not much code, but the work is done in the .dll file. You'll notice a slight pause when you create a new instance like that, because it's downloading the JSON and deserializing it into classes inside my assembly. It only takes about a second, though.

    You can see what's in there by hovering your mouse over the variable "ud" shown above while execution is paused at the Stop statement.

    The time shown for the update is in your time zone. If you expand each currency, you'll see its properties.

    Experiment with that a bit and I think you'll find it easy to work with but if you have questions, just ask. :)
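For readers curious how the "deserializing it into classes" part can work, here is a sketch using Json.NET's JsonConvert.DeserializeObject. It is not the code inside CoinDesk.dll: the class and property names (CurrentPrice, BpiCurrency, BpiTime, ParseCurrentPrice) are hypothetical, mapped onto the JSON fields from Devon's post, and the sample JSON is trimmed from that post.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports Newtonsoft.Json

' Hypothetical classes mirroring the currentprice.json shape
Public Class BpiTime
    <JsonProperty("updatedISO")> Public Property UpdatedIso As DateTimeOffset
End Class

Public Class BpiCurrency
    <JsonProperty("code")> Public Property Code As String
    <JsonProperty("description")> Public Property Description As String
    <JsonProperty("rate_float")> Public Property Rate As Decimal
End Class

Public Class CurrentPrice
    <JsonProperty("time")> Public Property Time As BpiTime
    <JsonProperty("chartName")> Public Property ChartName As String
    ' Keyed by currency code ("USD", "GBP", "EUR", ...)
    <JsonProperty("bpi")> Public Property Bpi As Dictionary(Of String, BpiCurrency)
End Class

Module TypedSketch
    Function ParseCurrentPrice(json As String) As CurrentPrice
        Return JsonConvert.DeserializeObject(Of CurrentPrice)(json)
    End Function

    Sub Main()
        Dim json As String = "{""time"":{""updatedISO"":""2017-10-27T00:20:00+00:00""},""chartName"":""Bitcoin""," &
            """bpi"":{""USD"":{""code"":""USD"",""description"":""United States Dollar"",""rate_float"":5886.6825}}}"
        Dim price As CurrentPrice = ParseCurrentPrice(json)

        ' Show the update time in the local time zone
        Console.WriteLine(price.Time.UpdatedIso.ToLocalTime())
        For Each pair In price.Bpi
            Console.WriteLine($"{pair.Key}: {pair.Value.Rate.ToString(CultureInfo.InvariantCulture)} ({pair.Value.Description})")
        Next
    End Sub
End Module
```

Mapping "bpi" to a dictionary means a currency the sample doesn't list would still be picked up without adding a new property.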


    Friday, October 27, 2017 4:47 PM
  • Arrd - I defer to Frank's submission; all the work is done for you.

    Saturday, October 28, 2017 6:47 AM