goal of making a regex to an http request : aspx - c#

  • Question

  • User-458598543 posted

    Hello,

    I want to apply a regex to an HTTP request, but I get two error messages.

    The code is the following:

    protected void page_load (object sender, EventArgs e)
    {
    bt_dns_requeete.Click += new EventHandler(click_bt_dns_requeete);
    }

    static async void click_bt_dns_requeete(object sender, EventArgs e)
    {
    Regex rgx = new Regex(@"^[a-z0-9-_]*.com$");
    var client = new HttpClient();
    var result = client.GetAsync("http://" + rgx.ToString());      // error: the requested URI is not valid.
    ta_dns_requeete_reesultat.Value = result.StatusCode + "//" + rgx.ToString();     // error CS0120: An object reference is required for the non-static field, method, or property 'jicixipi.WF_dns_requeete.ta_dns_requeete_reesultat'.

    }

    Could you correct this code? Thank you for your contribution.

    M.A.

    Friday, March 12, 2021 11:11 AM

All replies

  • User753101303 posted

    Hi,

    The problem is that it doesn't make sense. You are sending a request to the http://^[a-z0-9-_]*.com$ web site?

    Do you rather want to find this pattern inside a page returned by a web site? Or, judging from your previous thread, do you still expect to be able to query the whole internet without having a list of sites?

    Edit: for the other problem, your method is static (i.e. it belongs to the class rather than to each object instance) but it tries to use an instance member. I doubt you have the knowledge needed to build a web search engine, but if you want to at least attempt building a crawler, the idea would be to start from a list of pages (maybe even just one), fetch each page, and use what is found in <a href="value"> to find other pages (possibly on other sites), which gives new pages to process, and so on. Is this what you are trying to do?
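
    The static/instance problem can be reproduced in isolation. Below is a minimal sketch, assuming a hypothetical class DnsPageSketch that stands in for the page and a hypothetical field ResultText that stands in for the server control; neither name comes from the original project.

    ```csharp
    using System;

    // Hypothetical stand-in for the page class; names are illustrative only.
    public class DnsPageSketch
    {
        // Instance member, comparable to a server control declared on the page.
        public string ResultText = "";

        // Instance handler: it runs on a specific object, so instance members are reachable.
        public void OnClick(object sender, EventArgs e)
        {
            ResultText = "clicked";
        }

        // Static handler: it belongs to the class, so there is no object to read ResultText from.
        public static void OnClickStatic(object sender, EventArgs e)
        {
            // ResultText = "clicked";   // uncommenting this line produces CS0120
        }
    }
    ```

    Applied to the code from the question, this would amount to removing the static keyword from click_bt_dns_requeete.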

    See https://en.wikipedia.org/wiki/Web_crawler
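
    The link-following step described above can be sketched as follows. This is only an illustration under stated assumptions: LinkExtractorSketch and ExtractLinks are hypothetical names, and the regular expression is deliberately simplified (it only handles double-quoted href values); a real crawler would normally use an HTML parser.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static class LinkExtractorSketch
    {
        // Simplified pattern: captures the double-quoted href value of an <a> tag.
        private static readonly Regex HrefPattern =
            new Regex("<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);

        // Returns the absolute http/https links found in the HTML of one page.
        public static List<Uri> ExtractLinks(string html, Uri pageUri)
        {
            var links = new List<Uri>();
            foreach (Match m in HrefPattern.Matches(html))
            {
                // Resolve relative links against the address of the page they came from.
                if (Uri.TryCreate(pageUri, m.Groups[1].Value, out var absolute)
                    && (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
                {
                    links.Add(absolute);
                }
            }
            return links;
        }
    }
    ```

    Each returned address would then be added to the list of pages still to fetch, skipping the ones already visited.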

    Friday, March 12, 2021 11:38 AM
  • User-458598543 posted

    Hello,

    It should be possible to make an HTTP request not on the URL address but on one or several keywords, or on an HTML tag such as <title> or <meta name="keywords">. Could you show me code that does that?

    Thank you.

    M.A.

    Friday, March 12, 2021 12:36 PM
  • User475983607 posted

    It should be possible to make an HTTP request not on the URL address but on one or several keywords, or on an HTML tag such as <title> or <meta name="keywords">. Could you show me code that does that?

    As written, your requirement is not possible because of how DNS works. The domain or IP address is always required when making an HTTP request; otherwise there is no way to reach the web server. This is a fundamental concept.

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Friday, March 12, 2021 12:40 PM
  • User-458598543 posted

    Coming back to the regex approach: could I ask, or do you have, a method to query a DNS server to obtain a list of sites by hazard?

    Friday, March 12, 2021 12:55 PM
  • User475983607 posted

    Rednuts72

    Coming back to the regex approach: could I ask, or do you have, a method to query a DNS server to obtain a list of sites by hazard?

    I'm not sure what you mean by using the word "hazard".

    How are you currently asking a DNS server for a list of sites? Do you have a list of strings, and are you asking how to search that list using a regular expression?
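
    If the starting point is a list of strings, filtering it with a regular expression before building request addresses could look like the sketch below. HostFilterSketch and BuildUris are hypothetical names, and the pattern is a variant of the one in the question, assuming the intent was "lowercase host names ending in .com": the dot is escaped so that it only matches a literal dot, and + requires at least one character before it.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static class HostFilterSketch
    {
        // Case-sensitive: only lowercase letters, digits, '_' and '-' followed by ".com".
        private static readonly Regex ComHost = new Regex(@"^[a-z0-9_-]+\.com$");

        // Keeps the host names that match the pattern and turns them into absolute URIs.
        public static List<Uri> BuildUris(IEnumerable<string> hosts)
        {
            var uris = new List<Uri>();
            foreach (string host in hosts)
            {
                // The regex is applied to each candidate name; it is never used as the address itself.
                if (ComHost.IsMatch(host)
                    && Uri.TryCreate("http://" + host, UriKind.Absolute, out var uri))
                {
                    uris.Add(uri);
                }
            }
            return uris;
        }
    }
    ```

    Each resulting Uri is a concrete address that can be passed to HttpClient.GetAsync, which is what the pattern string in the question could not be.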

    Friday, March 12, 2021 1:58 PM
  • User753101303 posted

    Rednuts72

    It should be possible to make an HTTP request not on the URL address but on one or several keywords, or on an HTML tag such as <title> or <meta name="keywords">. Could you show me code that does that?

    This is not how the web works. You send a request to a GIVEN web server, and then you can analyze the returned content to see what is found in the title or meta tags.
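
    As an illustration of "request a given page, then analyze it", here is a minimal sketch. PageTagsSketch and its methods are hypothetical names, and the patterns are simplified assumptions (for example, the meta pattern expects name="keywords" to appear before content="..." with double quotes); an HTML parser would be more robust.

    ```csharp
    using System;
    using System.Linq;
    using System.Net.Http;
    using System.Text.RegularExpressions;
    using System.Threading.Tasks;

    public static class PageTagsSketch
    {
        private static readonly Regex TitlePattern =
            new Regex(@"<title[^>]*>(.*?)</title>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

        private static readonly Regex KeywordsPattern =
            new Regex("<meta\\s+name\\s*=\\s*\"keywords\"\\s+content\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);

        // Returns the trimmed text of the first <title> element, or "" if there is none.
        public static string ExtractTitle(string html)
        {
            Match m = TitlePattern.Match(html);
            return m.Success ? m.Groups[1].Value.Trim() : "";
        }

        // Returns the comma-separated keywords of the first matching meta tag, trimmed.
        public static string[] ExtractKeywords(string html)
        {
            Match m = KeywordsPattern.Match(html);
            if (!m.Success) return new string[0];
            return m.Groups[1].Value
                .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(k => k.Trim())
                .ToArray();
        }

        // Downloads one known page and analyzes it; the address must be a concrete URI.
        public static async Task<string> FetchTitleAsync(HttpClient client, Uri pageUri)
        {
            string html = await client.GetStringAsync(pageUri);
            return ExtractTitle(html);
        }
    }
    ```

    Note that the request itself is awaited; the code in the question read StatusCode from the task returned by GetAsync instead of from the awaited response.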

    Once again, a search engine doesn't do this in "real time". It crawls through tons of web pages and stores the results of this analysis in a huge database, so that when someone asks for a site having a given keyword in its title, it can use this database rather than querying all the sites found on the internet in "real time".
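
    The "store first, query later" idea can be sketched with an in-memory index. This is a toy illustration only: TitleIndexSketch is a hypothetical name, the word splitting is naive, and a real search engine uses a far more elaborate database.

    ```csharp
    using System;
    using System.Collections.Generic;

    public class TitleIndexSketch
    {
        // Maps each word (case-insensitive) to the set of page addresses whose title contains it.
        private readonly Dictionary<string, HashSet<string>> pagesByWord =
            new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

        private static readonly char[] Separators = { ' ', '\t', ',', '.', ':', ';', '!', '?' };

        // Called once per crawled page, ahead of any search.
        public void AddPage(string url, string title)
        {
            foreach (string word in title.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            {
                if (!pagesByWord.TryGetValue(word, out var urls))
                {
                    urls = new HashSet<string>();
                    pagesByWord[word] = urls;
                }
                urls.Add(url);
            }
        }

        // Answers a query from the stored data only; no HTTP request is made here.
        public List<string> Search(string word)
        {
            return pagesByWord.TryGetValue(word, out var urls)
                ? new List<string>(urls)
                : new List<string>();
        }
    }
    ```

    The crawl fills the index with AddPage over time, and Search then answers from memory without contacting any site.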

    I see, for example, a research paper working on 24,000,000 pages that uses a 100 GB database (page content is compressed). If you start from zero and process one page per second, it would take more than six months to process them all (24,000,000 seconds is about 278 days). I previously provided a source for lists of sites you could use if you really want to test your ideas, but I doubt it's worth it, except maybe as an exercise.

    Now, if you want to do this kind of search, Google does provide a way to do that, using intitle:"we are" rather than just "we are".

    Edit: I see that Google could be running 900,000 servers (though not all for their search service). Not sure how old you are, and I don't want to discourage your interest in computing, but:
    - use, or maybe study, an open source search engine
    - meanwhile, familiarize yourself with how the web works and with programming
    - make sure you have an idea that is interesting enough, even if just for you (indexing titles and metadata is already done by all search engines)
    - start simple: for example, as pointed out previously, do a "proof of concept" on a much smaller subset to test your ideas at a small scale

    Hope this bad experience won't discourage you from learning to code ;-)

    Friday, March 12, 2021 1:59 PM
  • User-458598543 posted

    Thank you for your contribution.

    M.A.

    Friday, March 12, 2021 3:30 PM