How to search and download something in web using vb.net?

Answers

  • HF,

    I’ve been giving this more thought. Joel has an excellent idea if you want to return HTML (for example, if you’re going to build a program that has a WebBrowser or other web-aware renderer), but I assumed all along that you wanted something that gives you more control over how the user enters and sees the data. The following is based on that assumption.

    Prologue

    I said yesterday that I found something about Faroo which, I admit, I’d never heard of before then. Faroo uses P2P, which I think is an interesting concept for a search engine. Moreover, their business model is unusual. From the FAQ section on their site:

    Our goal is not the biggest index, hoarding all the spam & irrelevant pages. Our goal is to return the most relevant pages for every query, from the most compact index possible. The more carefully you select already at crawling & index time, the less you have to index & store, and the faster you are at search time.

    We are crawling highly relevant pages first (focused crawling): The web is huge, but most of it is spam and irrelevant content. Traditional search engines filter relevant content by ranking it at search time. We brought this step forward to the crawling and index time. So while indexing less pages, they are more relevant. This allows indexing more relevant results in a shorter time. This should be considered when comparing 2 billion pages (Faroo) to 40 billion pages (Google).

    The idea is an interesting one, so I’ve put together a little program that sends the user’s search text to Faroo and displays whatever comes back. I’ll describe the process here.

    Program Overview

    Their terms of service have to be honored, and that shaped the design of this program. The terms are quite simple:

    The rate limit is 1 million queries/month, with not more than 1 query/second peaks.

    The “one query per second” limit is enforced with Threading.Thread.Sleep(ms), and all of that runs on another thread via a BackgroundWorker in this program. Their API is explained here and I chose to have the data returned as well-formed XML as explained here.
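    To make the spacing between queries concrete, here is a minimal sketch. It is not the code from my program: the module name QueryThrottle and its members are made up for illustration, and it assumes the only requirement is that two queries are never sent less than one second apart.

```vbnet
Imports System
Imports System.Threading

Module QueryThrottle
    ' Minimum spacing between queries, taken from the quoted rate limit.
    Private ReadOnly MinInterval As TimeSpan = TimeSpan.FromSeconds(1)

    ' When the previous query was sent (hypothetical bookkeeping field).
    Private lastQueryUtc As DateTime = DateTime.MinValue

    ' How long to wait before the next query may be sent.
    Function RemainingDelay(lastQuery As DateTime, currentTime As DateTime) As TimeSpan
        Dim elapsed As TimeSpan = currentTime - lastQuery
        ' If the clock moved backwards, play it safe and wait the full interval.
        If elapsed < TimeSpan.Zero Then Return MinInterval
        If elapsed >= MinInterval Then Return TimeSpan.Zero
        Return MinInterval - elapsed
    End Function

    ' Call this on the worker thread right before each request,
    ' never on the UI thread, because it blocks.
    Sub WaitForTurn()
        Dim delay As TimeSpan = RemainingDelay(lastQueryUtc, DateTime.UtcNow)
        If delay > TimeSpan.Zero Then Thread.Sleep(delay)
        lastQueryUtc = DateTime.UtcNow
    End Sub
End Module
```

    In a BackgroundWorker, WaitForTurn() would go inside the DoWork handler just ahead of each web request, so the form stays responsive while the thread sleeps.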

    The idea of my program is this: if you look at an example of the returned data in XML format here, note two distinct sections. One is “results” (exactly one per query) and the other is a collection of “result” items. In the “results” section, take note of the number of hits the query returned, shown in the value of the tag called “count”.

    The way I set this program up, I make two calls. The first asks for just one “result”, because its only purpose is to look at “results” and find that count. I then make a second call, asking it to return that number of results.

    So to clarify here, the first call just tells me how many there are (100 maximum as explained here), and the second call actually gets the return data of the search.
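    The two-call idea can be sketched roughly as follows. This is an illustration under assumptions, not the program's code: it assumes the response has a "count" element under the root "results" element with no XML namespace, and the fetch delegate is a hypothetical stand-in for whatever actually sends the query and returns the raw XML.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module TwoStepSearch
    ' Upper bound on results per query, as stated above.
    Private Const MaxResults As Integer = 100

    ' Reads the hit count from the response and clamps it to 0..MaxResults.
    ' ASSUMPTION: a <count> element exists somewhere under the root element.
    Function ReadCount(xml As String) As Integer
        Dim doc As XDocument = XDocument.Parse(xml)
        Dim countElement As XElement = doc.Descendants("count").FirstOrDefault()
        If countElement Is Nothing Then Return 0
        Dim count As Integer
        If Not Integer.TryParse(countElement.Value, count) Then Return 0
        Return Math.Max(0, Math.Min(count, MaxResults))
    End Function

    ' fetch is hypothetical: given how many results to ask for,
    ' it performs one query and returns the raw XML.
    Function Search(fetch As Func(Of Integer, String)) As String
        ' First call: only the count matters.
        Dim probeXml As String = fetch(1)
        Dim wanted As Integer = ReadCount(probeXml)
        ' Zero or one hit: the first response already holds everything.
        If wanted <= 1 Then Return probeXml
        ' Second call: the real result set.
        Return fetch(wanted)
    End Function
End Module
```

    Whatever is passed in as fetch would also need to respect the one-query-per-second limit, since the two calls happen back to back.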

    Program Layout

    The layout is fairly simple as shown here:

    The user types what to search for in the TextBox shown and then selects which of the two types of searches to perform (web or news). They then click the Search button; the search runs, and the results are displayed in a DataGridView when it completes.

    If you’ll notice in that last screenshot, there’s a dotted border around everything in the bottom of the form. That’s a Panel and I put all those controls in there so that I could programmatically set the panel to be either visible or not, depending on what the user does.

    As soon as they change the text in the TextBox or click one of the RadioButtons, the panel is hidden. It stays hidden until the search is done, and what happens next depends on the outcome: if they cancel the search midway or an error is returned, its visibility is left as False; otherwise it is made visible again.

    If they cancel it, then the text of the label in the StatusBar will show “Cancelled” and if there’s an error, a MessageBox displays the error. If neither of those happens, though, the results are shown in the DGV and when they click on a row, the lower portion (of the panel) will display the overview text of that one (if one is returned), an image (if one is returned), and the Go To Website In My Default Browser button is enabled (if a URL is returned).
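    The three outcomes described above map naturally onto what a BackgroundWorker reports in its RunWorkerCompleted event. The sketch below is one way to express that decision; the UiState class and the Describe function are hypothetical names for illustration, and the actual program may well be organized differently.

```vbnet
Imports System
Imports System.ComponentModel

Module SearchOutcome
    ' What the form should show once the worker finishes (hypothetical type).
    Class UiState
        Public PanelVisible As Boolean
        Public StatusText As String = ""
        Public ErrorText As String = Nothing
    End Class

    ' Translates the worker's completion info into the state described above.
    Function Describe(e As RunWorkerCompletedEventArgs) As UiState
        Dim state As New UiState()
        If e.Cancelled Then
            ' Cancelled: panel stays hidden, status label says so.
            state.StatusText = "Cancelled"
        ElseIf e.Error IsNot Nothing Then
            ' Error: panel stays hidden, message goes to a MessageBox.
            state.ErrorText = e.Error.Message
        Else
            ' Success: show the panel with the results.
            state.PanelVisible = True
        End If
        Return state
    End Function
End Module
```

    The form's RunWorkerCompleted handler would then set the panel's Visible property, the status label text, and show a MessageBox when ErrorText is not Nothing.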

    Program In Action

    The following screenshots show how the program works “in action”:

    When they click the Search button as shown above, that button’s text will change to show Cancel as shown below:

    You’ll also notice that the ProgressBar in the StatusBar is shown. On each of the two steps described earlier, it updates (i.e., 50%, then 100%).

    It will then show the data in the DGV:

    You’ll also notice that the topmost row has been programmatically selected thus firing the event which then displays the image, the overview text, and enables the button to go to the selected URL when clicked.

    However, there are times when only the header (Title) is returned as shown below:

    When that happens, the PictureBox will display the Faroo logo, the overview text displays an empty string, and the button to go to the URL is disabled.

    Just to show that it works, let’s now check the news for this topic:

    Program Code

    I have the program on a page of my website here and if you or anyone is interested, I’ll zip up the project folder and upload it for you to download.

    The program isn’t documented at all; if you or anyone wants me to then I will, but I don’t know if you have any interest in this so I’ll just leave it as an open offer.

    I hope you or others who may later find this thread will find this useful.  :)


    Please call me Frank :)

    Friday, November 30, 2012 12:33 AM

All replies

  • How to search and download something in web using vb.net?

    Thanks a lot.

    "What is the meaning of life? Please give me the code in VB"

    ;-)

    *****

    Actually my thoughts went to looking for an API that would do the work and some of the results were surprising. Bing for example - dang what they charge for it.

    Here's an article that I found on it with interesting information. If you'll look at the bottom of that, someone just this month showed a link to a search engine I've never heard of but it looks promising for sure. As an example, here's an XML output from one of their examples shown. I'd definitely be looking at that one, at least to get started.

    I hope it helps. :)


    Please call me Frank :)

    Wednesday, November 28, 2012 9:12 PM
  • The Google search engine puts the search words into the URL string, so you can open a web page and navigate to the URL:

    http://www.google.com/search?q=microsoft+forums&rls=com.microsoft:en-us&ie=UTF-8&oe=UTF-8&startIndex=&startPage=1

    The above search was for the words "Microsoft" and "Forums".
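    As a rough sketch of that approach in VB.NET: the module and function names below are made up for illustration, and only the q parameter from the example URL is used; the other parameters in that URL are left out because their meaning isn't explained here.

```vbnet
Imports System
Imports System.Diagnostics

Module GoogleSearchUrl
    ' Builds a Google search URL; only the q parameter from the example is used.
    ' EscapeDataString encodes spaces as %20 and escapes characters such as & and #.
    Function Build(searchText As String) As String
        Return "http://www.google.com/search?q=" & Uri.EscapeDataString(searchText)
    End Function

    ' Opens the search in the user's default browser.
    Sub OpenInBrowser(searchText As String)
        Dim info As New ProcessStartInfo(Build(searchText)) With {.UseShellExecute = True}
        Process.Start(info)
    End Sub
End Module
```

    In a Windows Forms program with a WebBrowser control, the string returned by Build could instead be passed to the control's Navigate method.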


    jdweng

    • Proposed as answer by Frank L. Smith Friday, November 30, 2012 12:34 AM
    Wednesday, November 28, 2012 10:24 PM
  • I would love to see this project, thanks Frank!
    Friday, November 30, 2012 4:16 PM
  • I would love to see this project, thanks Frank!

    I have it zipped up and uploaded:

    https://swft.exavault.com/share/view/fmu-cgr7ff6d#/%2F11-30-12

    You'll see the file there and if you click it you'll have an option to download it.

    *****

    I spoke to the CEO of the company by e-mail (he helped me work through forming the correct URL strings initially) and if you'll have a look near the bottom of this page, you'll even see that he's showing a link to this thread and a link to the code on my site.

    :)


    Please call me Frank :)

    Friday, November 30, 2012 4:29 PM
  • Thanks Frank!!!
    Friday, November 30, 2012 4:44 PM
  • Thanks Frank!!!

    I hope you find it helpful.


    Please call me Frank :)

    Friday, November 30, 2012 4:47 PM