Best way to parse through an HTTP response

  • Question

  • User-1851254046 posted

    I am capturing the response from an HTTP POST and getting back a whole bunch of HTML. That HTML contains a listing of many products and their associated info, such as quantity, price, vendor, etc. I need to pull out this data and ultimately get it into some kind of data structure. Is there anything more efficient I can do than write something that parses the HTML as one long string, starting at the beginning and moving along, looking for certain values and pulling out data as substrings?




    Monday, February 13, 2012 10:48 AM


All replies

  • User-341994687 posted

    It sounds like you are screen scraping. You could look for repeated data structures in the HTML and then parse those sections. I must admit I use screenscraper myself. There is a free version that could be handy for you if you are not using it at an enterprise level.



    Monday, February 13, 2012 11:02 AM
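The "find the repeated structure, then parse each section" idea can be sketched in Python without any scraping tool. This is a minimal sketch, not how screenscraper works: it assumes (hypothetically) that each product is rendered as a `<tr class="product">` row with one `<td>` per field in the order name, quantity, price, vendor. The `Product` class, `parse_products` function and sample markup are illustrative only. A regular expression like this holds up only while the markup stays regular and consistent; a real HTML parser is more robust.

```python
import re
from dataclasses import dataclass


@dataclass
class Product:
    """One product pulled out of the page (hypothetical field set)."""
    name: str
    quantity: int
    price: float
    vendor: str


# Hypothetical markup: each product is one <tr class="product"> row.
ROW_RE = re.compile(r'<tr class="product">(.*?)</tr>', re.DOTALL | re.IGNORECASE)
CELL_RE = re.compile(r"<td[^>]*>(.*?)</td>", re.DOTALL | re.IGNORECASE)


def parse_products(html):
    """Find every repeated product block and turn its cells into a Product."""
    products = []
    for row in ROW_RE.findall(html):
        cells = [cell.strip() for cell in CELL_RE.findall(row)]
        if len(cells) != 4:
            continue  # skip blocks that do not match the expected layout
        name, quantity, price, vendor = cells
        products.append(Product(name, int(quantity), float(price.lstrip("$")), vendor))
    return products


sample = """
<table>
  <tr class="header"><th>Name</th><th>Qty</th><th>Price</th><th>Vendor</th></tr>
  <tr class="product"><td>Widget</td><td>10</td><td>$2.50</td><td>Acme</td></tr>
  <tr class="product"><td>Gadget</td><td>3</td><td>$12.00</td><td>Globex</td></tr>
</table>
"""

for product in parse_products(sample):
    print(product)
```

The header row is skipped because only rows carrying the assumed `product` class are matched, and each match becomes a typed record instead of a loose substring.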
  • User1105131773 posted

    You could take a look at the HTML Agility Pack.


    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Monday, February 13, 2012 11:04 AM
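The HTML Agility Pack is a .NET library that loads HTML into a document tree you can query (for example with XPath), so the page is handled as structured nodes rather than one long string. The same parser-based idea can be sketched with Python's standard-library `html.parser.HTMLParser`. That is a swapped-in stand-in, not the Agility Pack's API. The `<tr class="product">` markup and the `ProductTableParser` class are hypothetical.

```python
from html.parser import HTMLParser


class ProductTableParser(HTMLParser):
    """Collects the text of each <td> inside rows whose class includes "product"."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.rows = []       # one list of cell strings per product row
        self._in_row = False
        self._in_cell = False
        self._cell = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" and "product" in classes:
            self._in_row = True
            self.rows.append([])
        elif tag == "td" and self._in_row:
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self.rows[-1].append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still counts toward the current cell.
        if self._in_cell:
            self._cell.append(data)


sample = """
<table>
  <tr><th>Name</th><th>Qty</th><th>Price</th><th>Vendor</th></tr>
  <tr class="product"><td><b>Widget</b></td><td>10</td><td>$2.50</td><td>Acme &amp; Co</td></tr>
  <tr class="product"><td>Gadget</td><td>3</td><td>$12.00</td><td>Globex</td></tr>
</table>
"""

parser = ProductTableParser()
parser.feed(sample)
parser.close()
for name, qty, price, vendor in parser.rows:
    print(name, int(qty), float(price.lstrip("$")), vendor)
```

Unlike the substring approach, the parser copes with nested tags and character entities, which is the main reason to prefer a real HTML parser such as the Agility Pack.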
  • User-1851254046 posted

    Yes Seamus, scraping I am. I do see repeated HTML structures, which should make it a little easier. Thanks.

    Monday, February 13, 2012 11:13 AM
  • User-1851254046 posted

    Thanks Simon, I will download it and see how it goes. This sounds like what I need, so thanks for the tip.

    Monday, February 13, 2012 11:15 AM