Google search

    Question

  • I am trying to write a program that extracts the web addresses from a Google search result.

    It seems that I just can't find the right syntax to extract the links.

    I don't want to use the Google API because searches are limited to 100 per day, so I decided to write my own, but it just doesn't want to work.

    I think the problem is where I try to analyze the specific tag.

    Here is the code

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Net;
    using System.IO;
    using WebCrawler.Spider.HTML;

    namespace Google_Search
    {
        class ExtractSubPage
        {

            /// <summary>
            /// This method downloads the specified URL into a C# string.
            /// </summary>
            /// <param name="url">The URL to download.</param>
            /// <returns>The contents of the URL that was downloaded.</returns>
            public String DownloadPage(Uri url)
            {
                WebRequest http = WebRequest.Create(url);
                // Dispose the response and the reader even if reading throws.
                using (HttpWebResponse response = (HttpWebResponse)http.GetResponse())
                using (StreamReader stream = new StreamReader(response.GetResponseStream(), System.Text.Encoding.ASCII))
                {
                    return stream.ReadToEnd();
                }
            }

            /// <summary>
            /// This method is very useful for grabbing information from an
            /// HTML page. It extracts the text between two tokens; the
            /// tokens are matched case-insensitively.
            /// </summary>
            /// <param name="str">The string to extract from.</param>
            /// <param name="token1">The text, or tag, that comes before the desired text</param>
            /// <param name="token2">The text, or tag, that comes after the desired text</param>
            /// <param name="count">Which occurrence of token1 to use, 1 for the first</param>
            /// <returns>The text between the tokens, or null if either token is not found.</returns>
            public String ExtractNoCase(String str, String token1, String token2,
                int count)
            {
                // Search a lower-cased copy so that matching ignores case.
                String searchStr = str.ToLower();
                token1 = token1.ToLower();
                token2 = token2.ToLower();

                // Find the requested occurrence of token1. Starting at -1
                // makes the first search begin at index 0.
                int location1 = -1;
                do
                {
                    location1 = searchStr.IndexOf(token1, location1 + 1);
                    if (location1 == -1)
                        return null;
                    count--;
                } while (count > 0);

                // The desired text starts right after token1. token2 is
                // lower case, so look for it in the lower-cased copy too.
                location1 += token1.Length;
                int location2 = searchStr.IndexOf(token2, location1);
                if (location2 == -1)
                    return null;

                // Return the result from the original, mixed-case string.
                return str.Substring(location1, location2 - location1);
            }

            /// <summary>
            /// Process each subpage. The subpages are where the data actually is.
            /// </summary>
            /// <param name="u">The URL of the subpage.</param>
            private void ProcessSubPage(Uri u)
            {
                String str = DownloadPage(u);
                // Occurrences are counted from 1, so 1 selects the first match.
                String site = ExtractNoCase(str, "<b></td><td><a href=\"",
                    "\"", 1);
                StringBuilder buffer = new StringBuilder();
            }

            public void Process(Uri url)
            {
                String value = "";
                WebRequest http = HttpWebRequest.Create(url);
                HttpWebResponse response = (HttpWebResponse)http.GetResponse();
                Stream istream = response.GetResponseStream();
                ParseHTML parse = new ParseHTML(istream);
                int ch;
                while ((ch = parse.Read()) != -1)
                {
                    // A return value of 0 signals that a tag was parsed.
                    if (ch == 0)
                    {
                        HTMLTag tag = parse.Tag;
                        if (String.Compare(tag.Name, "a", true) == 0)
                        {
                            value = tag["href"];
                            // Skip anchors that carry no href attribute.
                            if (String.IsNullOrEmpty(value))
                                continue;
                            // Resolve relative links against the page URL.
                            Uri u = new Uri(url, value);
                            value = u.ToString();
                            ProcessSubPage(u);
                        }
                    }
                }
            }

            static void Main(string[] args)
            {
                Uri u = new Uri("http://www.google.com/#hl=en&sugexp=esqb%2Cratio%3D0%2Cdepth%3D0&cp=4&gs_id=g&xhr=t&q=adventure&qe=YWR2ZQ&qesig=NacInZOSabnoiRry0cJRiQ&pkc=AFgZ2tmOdiwaXSmX4tUAZTRp36-wRHiXJCREq-Jw_werLMPs9O3-3DANVDh5DKckz1dlrdWVaxhlwLmEsJ0A_DA04AgfrJMvvQ&pf=p&sclient=psy&site=&source=hp&pbx=1&oq=adve&aq=0p&aqi=p-p1g4&aql=t&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.&fp=6a466a86c28e7859&biw=1600&bih=799");
                ExtractSubPage parse = new ExtractSubPage();
                parse.Process(u);
            }
        }
    }
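
One thing that may be worth checking, independent of the tag handling: everything after the `#` in the address passed to `Main` is a URI fragment, and fragments are handled by the browser rather than sent to the server. The following is only a minimal sketch; it uses a shortened, hypothetical address of the same shape (not the exact one above) to show how `System.Uri` splits it.

```csharp
using System;

class FragmentDemo
{
    static void Main()
    {
        // Hypothetical, shortened stand-in for the address used in the question.
        Uri u = new Uri("http://www.google.com/#hl=en&q=adventure");

        // PathAndQuery is the part that goes into the HTTP request.
        Console.WriteLine("PathAndQuery: " + u.PathAndQuery);
        // Query holds only what follows a '?', and this address has none.
        Console.WriteLine("Query:        " + u.Query);
        // The search terms sit in the fragment, which stays on the client.
        Console.WriteLine("Fragment:     " + u.Fragment);
    }
}
```

If that is what happens here, the downloaded HTML would be the plain start page rather than a result list, so the parser would have no result links to find.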

     

    Sunday, July 17, 2011 5:25 PM

Answers

All replies

  • I always remind people that this violates Google's terms of service.

    5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.

    Please reconsider using one of Google's APIs, such as the Custom Search API.

    A friend of mine put it like this (although it was originally in reference to fast forwarding recorded television shows):  If you don't look at the ads, you're stealing the service.

    Consider complaining to Google about their rates.

    • Edited by Wyck Sunday, July 17, 2011 5:51 PM
    Sunday, July 17, 2011 5:42 PM
  • I just can't figure out how to use the Custom Search API in my code.

    • Edited by Eduard77 Sunday, July 17, 2011 5:51 PM
    Sunday, July 17, 2011 5:50 PM
  • And of course I didn't read Google's terms of service. :)
    Sunday, July 17, 2011 5:51 PM
  • Is it possible to switch to Bing? I don't know if it really helps, but maybe:

    http://www.codeproject.com/KB/IP/BingAPI.aspx


    No pressure, no diamonds.
    Sunday, July 17, 2011 5:54 PM
  • I have looked at the example and tried to use it, but it doesn't work.

    I added the web reference and everything, but the compiler reports that it cannot find the namespace:

    Error 1 The type or namespace name 'MyBingService' does not exist in the namespace 'WindowsFormsApplication1' (are you missing an assembly reference?) D:\program porci\exercitii\bots\cauta google\cauta google\Form1.cs 4 32 cauta google

    I copied the code exactly as it was there.

     

    Sunday, July 17, 2011 6:26 PM
  • Remove the using directive for "MyBingService" and use the correct namespace for your imported web service.

    But sorry, I have no experience with the Bing API; it was just an idea, and I don't know if it helps.


    Sunday, July 17, 2011 6:34 PM
  • I changed the using directives like this:

    using System;
    using System.Windows.Forms;
    using WindowsFormsApplication1;
    using cauta_google.MyBingService;

    and it seems to work, but now I receive another error:

    Error 1 'cauta_google.Form1.Dispose(bool)': no suitable method found to override D:\program porci\exercitii\bots\cauta google\cauta google\Form1.Designer.cs 14 33 cauta google

    Sunday, July 17, 2011 6:39 PM
  • Sorry, maybe I will try this on my own later, but this error does not seem to be caused by the Bing code.
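
A common cause of this compiler error (an assumption here, since the thread does not show Form1.cs) is that the two halves of the partial class no longer line up. Form1.Designer.cs overrides `Dispose(bool)`, which only compiles when the other half, with the same class name and in the same namespace, derives from `Form`. Below is a minimal sketch of a matching pair, shown in one file for brevity; the namespace `cauta_google` is taken from the error message, and the member bodies are illustrative only.

```csharp
using System;
using System.Windows.Forms;

namespace cauta_google
{
    // Half normally found in Form1.cs: it must use the same namespace
    // and class name as the designer half, and derive from Form.
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
    }

    // Half normally found in Form1.Designer.cs.
    partial class Form1
    {
        private System.ComponentModel.IContainer components = null;

        // Overrides Form's Dispose(bool); without the ": Form" half in the
        // same namespace, the compiler finds no method to override.
        protected override void Dispose(bool disposing)
        {
            if (disposing && (components != null))
            {
                components.Dispose();
            }
            base.Dispose(disposing);
        }

        private void InitializeComponent()
        {
            this.components = new System.ComponentModel.Container();
            this.Text = "Form1";
        }
    }
}
```

If sample code pasted into Form1.cs declared a different namespace, such as `WindowsFormsApplication1`, changing it back to the project's own namespace would be the first thing to try.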


    Sunday, July 17, 2011 6:49 PM
  • I am now starting to read the Bing documentation as well.
    • Marked as answer by Eduard77 Monday, July 18, 2011 6:43 PM
    Sunday, July 17, 2011 6:55 PM