Fetching HTML data to Windows Apps

    Question

  • Hi,

    I have this wild idea..

    Is it possible to fetch data into an app from a general website that uses static ids and classes?

    For example, say I go to Bing. There is a menu at the top with strings such as Web, Images, etc. At the bottom you can see the Microsoft copyright.

    Let's say my app goes to that page, scans the HTML, and pulls out the information in that div or list.

    Is this possible? If so, how?

    Thanks!

    -Vincent

    Tuesday, December 16, 2014 7:19 PM

Answers

  • Hi Vincent,

    Here is a very simple sample to give you the basic idea:

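                // Needs: using System.Collections.Generic; using System.Text.RegularExpressions;
                // plus the namespace of your HttpClient (Windows.Web.Http or System.Net.Http).
                HttpClient httpclient = new HttpClient();
                var result = await httpclient.GetAsync(new Uri("http://www.bing.com"));
                // Read the response body as text; Content.ToString() is not guaranteed to return the markup.
                var HTMLContent = await result.Content.ReadAsStringAsync();

                List<string> list = new List<string>();
                // ".*?" stops at the first </ul>; Singleline lets "." match line breaks inside the list.
                MatchCollection mc = Regex.Matches(HTMLContent, "<ul id=\"sw_footL\">.*?</ul>", RegexOptions.Singleline);
                foreach (Match match in mc)
                {
                    list.Add(match.Value);
                }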

    However, parsing HTML with regular expressions is always a lot of work. If you would like to cut down on that work, try a third-party library that parses the HTML for you, such as HTML Agility Pack: http://htmlagilitypack.codeplex.com/
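
    As a rough illustration of the library route, the sketch below uses HTML Agility Pack to pull the text of each list item out of an element by its id. It is only a sketch under stated assumptions: the class and method names (FooterScraper, GetListItems) are made up for this example, the HTML fragment is hypothetical rather than Bing's real markup, and the package has to be installed separately (for example through NuGet).

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party package, not part of .NET

public static class FooterScraper
{
    // Returns the text of every <li> under the element with the given id,
    // or an empty list when no element with that id exists.
    public static List<string> GetListItems(string html, string elementId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var items = new List<string>();
        HtmlNode root = doc.GetElementbyId(elementId);
        if (root == null)
        {
            return items;
        }

        foreach (HtmlNode li in root.Descendants("li"))
        {
            // DeEntitize turns entities such as &amp; back into plain characters.
            items.Add(HtmlEntity.DeEntitize(li.InnerText).Trim());
        }
        return items;
    }

    public static void Main()
    {
        // Hypothetical fragment for illustration; real pages differ and change over time.
        string html = "<ul id=\"sw_footL\"><li><a href=\"#\">Privacy</a></li><li>Terms &amp; more</li></ul>";
        foreach (string item in GetListItems(html, "sw_footL"))
        {
            Console.WriteLine(item);
        }
    }
}
```

    Compared with the regular expression, this keeps working when attributes are reordered or whitespace changes, because it walks the parsed document rather than matching raw text. It still depends on the id staying the same on the site.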

    --James


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.

    • Marked as answer by Vincent Gio Wednesday, December 17, 2014 11:16 AM
    Wednesday, December 17, 2014 6:53 AM
    Moderator

All replies

  • Hi Vincent,

    Using the HttpClient class, we can fetch the whole HTML page from Bing. For instance, we can do something like this:

                HttpClient httpclient = new HttpClient();
                var result = await httpclient.GetAsync(new Uri("http://www.bing.com"));
                // Read the response body as a string so it can be searched.
                var HTMLContent = await result.Content.ReadAsStringAsync();

    Then use something like a regular expression to pick the div you want out of the HTML.

    --James


    Wednesday, December 17, 2014 1:19 AM
    Moderator
  • Hi James,

    I am a newbie with such things. Would you be so kind as to show a sample?

    -Vincent

    Wednesday, December 17, 2014 5:53 AM
  • Aha, thanks. I will look into this and hopefully get it to work in some logical way!

    Just a quick one: the items would always be added to the list in order, right? So not first, third, second, but first, second, third.

    -Vincent

    Wednesday, December 17, 2014 11:16 AM
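
    On the ordering question: Regex.Matches returns matches in the order they occur in the input (scanning left to right unless RegexOptions.RightToLeft is set), and List<T>.Add appends to the end, so the list fills in document order. A minimal sketch, assuming a made-up HTML fragment and a hypothetical helper named ExtractItems:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchOrderDemo
{
    // Returns the inner text of each <li> in the order it appears in the input.
    public static List<string> ExtractItems(string html)
    {
        var items = new List<string>();
        // Lazy ".*?" stops at the first </li>; Singleline lets "." span line breaks.
        foreach (Match m in Regex.Matches(html, "<li>(.*?)</li>", RegexOptions.Singleline))
        {
            items.Add(m.Groups[1].Value.Trim());
        }
        return items;
    }

    public static void Main()
    {
        // Hypothetical fragment for illustration only, not Bing's real markup.
        string html = "<ul id=\"sw_footL\">\n<li>First</li>\n<li>Second</li>\n<li>Third</li>\n</ul>";
        Console.WriteLine(string.Join(", ", ExtractItems(html)));
        // prints "First, Second, Third"
    }
}
```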