Problem of loading a list of sites from a public HTTP server: C# / ASPX

  • Question

  • User-458598543 posted

    Good day,

    I asked this question in the Web Forms forum, and a contributor told me to choose another forum since my question is about HTTP requests, so I come to you again with my problem of loading a list of internet sites (DNS names or URLs). I understand it should be possible to query a public server of the internet network (3WC) with an HTTP request to obtain a list of DNS names or URLs, and my task is to write a function that selects a certain number of sites according to the request, for example against the page's <title>. A contributor gave me the request address http://www.w3.org/help/search?q=web+socket&search-submit, but this request does not work for me. Are there other public servers to query?

    Thank you for your participation.

    M.A.

    Thursday, March 4, 2021 6:38 AM


All replies

  • User475983607 posted

    Your question does not make sense.

    I know that it should be possible to ask a public server of internet network (3WC) with a http requested to obtain a list of dns or url

    DNS translates a host name (the domain part of a URL) to an IP address.  What is a "list of DNS" in your application?  Do you want the IP addresses for a URL?

    nslookup www.w3.org
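    The `nslookup` step above can be reproduced in code. A minimal sketch (in Python for brevity, though the thread is nominally about C#; the stdlib resolver is `socket.gethostbyname`):

```python
import socket
from urllib.parse import urlparse

def hostname_of(url: str) -> str:
    """Extract the host part of a URL, e.g. 'http://www.w3.org/x' -> 'www.w3.org'."""
    # urlparse needs a scheme or a leading '//' to recognize the host part.
    parsed = urlparse(url if "//" in url else "//" + url)
    return parsed.hostname or ""

def resolve(host: str) -> str:
    """Return one IPv4 address for the host, like `nslookup` does (needs network)."""
    return socket.gethostbyname(host)

# hostname_of("http://www.w3.org/help/search") -> "www.w3.org"
# resolve("www.w3.org") would then give the server's IP address.
```

    The same two steps exist in .NET as `System.Uri.Host` and `System.Net.Dns.GetHostAddresses`.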

    What is 3WC?  Do you mean W3C?  W3C has a site map where you can get the URLs.

    https://www.w3.org/Consortium/siteindex
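    Turning that site index page into a list of URLs is an HTML-parsing job. A sketch of the link-extraction part using only the Python standard library (the sample HTML below is made-up; in practice you would feed it the fetched page):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect href targets from <a> tags, resolved against a base URL."""
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links like "/WAI/" against the base page.
                    self.links.append(urljoin(self.base, value))

def extract_links(html: str, base: str) -> list:
    parser = LinkCollector(base)
    parser.feed(html)
    return parser.links

# In practice you would fetch the page first, e.g.:
#   html = urllib.request.urlopen("https://www.w3.org/Consortium/siteindex").read().decode()
sample = '<p><a href="/WAI/">WAI</a> <a href="https://www.w3.org/TR/">TR</a></p>'
links = extract_links(sample, "https://www.w3.org/Consortium/siteindex")
```

    In C# the equivalent would be `HttpClient` plus an HTML parser such as HtmlAgilityPack.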

    a contributor gave a request address http://www.w3.org/help/search?q=web+socket&search-submit but this request is wrong for me. is there others public server requests ?

    What is wrong?  This is a support forum; you must explain the expected results and the actual results.

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, March 4, 2021 11:51 AM
  • User753101303 posted

    Hi,

    Be as clear as possible. I first thought you wanted to crawl an existing list of sites. Do you mean that you want to find sites having something in their title tag? Just explain in plain English (did you think the W3C might have a list of all existing web sites on the internet?)

    If yes, you can try the "intitle:" prefix. See http://www.googleguide.com/advanced_operators_reference.html#intitle

    Given your other thread, having high goals is fine, but you likely far underestimate the complexity (and resources, if this is not just for a few sites but intended for the general web) of creating your own better search engine from zero (without even talking about getting others to use it).

    Edit: Google also has site: to restrict a search to a given site or domain. Or do you mean that you want to find all server names in the w3.org domain?
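    For reference, the intitle: and site: operators are just part of the query string. A sketch of building such a query URL (note that scraping Google results programmatically is against their terms of service, so this URL is for pasting into a browser):

```python
from urllib.parse import urlencode

def google_query_url(terms, intitle=None, site=None):
    """Build a Google search URL using the intitle:/site: advanced operators."""
    q = " ".join(terms)
    if intitle:
        q += f" intitle:{intitle}"
    if site:
        q += f" site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": q})

url = google_query_url(["web", "socket"], intitle="tutorial", site="w3.org")
# -> https://www.google.com/search?q=web+socket+intitle%3Atutorial+site%3Aw3.org
```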

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, March 4, 2021 1:24 PM
  • User-458598543 posted

    Hey,

    First, please excuse my misspelling of 3WC in place of W3C; second, thank you for your answers. What I want is a list of internet (web) sites with their DNS addresses. After that, I will write a function that filters the list of sites by one or several key words. The problem is where to obtain this list of sites: should I ask a public server, or is there a ".txt" or ".csv" file containing the list of internet (web) sites?
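    The filtering step described here, selecting sites from a list by keywords, is the easy part once the list exists. A sketch (the site list below is made-up sample data):

```python
def filter_sites(sites, keywords, match_all=False):
    """Keep (url, title) pairs whose title contains the keywords,
    case-insensitively; match_all=True requires every keyword to appear."""
    combine = all if match_all else any
    result = []
    for url, title in sites:
        t = title.lower()
        if combine(kw.lower() in t for kw in keywords):
            result.append((url, title))
    return result

sites = [
    ("https://www.w3.org/TR/websockets/", "The WebSocket API"),
    ("https://www.w3.org/Style/CSS/", "Cascading Style Sheets"),
]
hits = filter_sites(sites, ["websocket"])
# hits -> [("https://www.w3.org/TR/websockets/", "The WebSocket API")]
```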

    Thank you for your contribution.

    M.A.

    Thursday, March 4, 2021 1:50 PM
  • User753101303 posted

    And do you want a list of sites found in the w3.org domain, or do you want to scan the whole web?  According to https://siteefy.com/how-many-websites-are-there/ there are at least 1,000,000,000 web sites on the internet!

    A search engine finds sites by following links found in previously analyzed sites (Google does have DNS servers; I suspect they offer this service partly so that they can also "discover" new domains). Also, a webmaster can submit his own site by hand to make sure it is found. Then all this content is preprocessed, stored, etc., so that it can be used to return a search response without scanning sites in "real time". See perhaps https://moz.com/beginners-guide-to-seo/how-search-engines-operate

    If you want to find sites from the general internet, to me a realistic solution is to use an existing API such as https://www.microsoft.com/en-us/bing/apis/bing-web-search-api. It seems Google doesn't have an official API for this at this time?

    Edit: according to another source it can take between 4 days and 4 weeks for Google to find a new website. How many sites do you expect to have in the list you are looking for?
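    A sketch of what a call to the Bing Web Search API looks like, following Microsoft's published v7 endpoint and header names (you need your own subscription key from the Azure portal; the key below is a placeholder, and the request is built but not sent):

```python
import urllib.request
from urllib.parse import urlencode

def bing_search_request(query, subscription_key):
    """Build (but do not send) a Bing Web Search API v7 request.
    Authentication is a single Ocp-Apim-Subscription-Key header."""
    url = "https://api.bing.microsoft.com/v7.0/search?" + urlencode({"q": query})
    return urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": subscription_key}
    )

req = bing_search_request("web socket", "YOUR-KEY-HERE")
# Sending it would be:  json.load(urllib.request.urlopen(req))
# and the hits are under webPages.value in the JSON response.
```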

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, March 4, 2021 4:51 PM
  • User-458598543 posted

    Hi,

    For the search-engine project, I would like a (complete) list of internet (web) sites; I will do the scan with a filter afterwards. As far as I know, there are two possibilities: first, asking a public server with a special request, and second, having a list of internet (web) sites in a table such as a ".txt" or ".csv" file. I would be happy if you could point me to either or both of these methods.

    Truly yours.

    M.A.

    Thursday, March 4, 2021 6:08 PM
  • User753101303 posted

    Hmmm... Try perhaps https://whoisds.com/newly-registered-domains which provides a daily list of new domains for free (around 100,000 per day). It seems you can get a full list at https://whoisdatacenter.com/whois-database/ for quite a price, but there are also free samples by country at the end of that page.

    It should be more than enough to give a try at whatever you want to do and see if you are still within your time and resource constraints. As pointed out earlier, a search engine can also discover new sites by extracting links from already processed sites...
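    Once such a list is downloaded and unzipped, it is plain text with one domain per line, which is easy to load and pre-filter. A sketch (the `io.StringIO` sample stands in for the real downloaded file, whose exact URL sits behind the page above):

```python
import io

def load_domains(fileobj, keywords=None):
    """Read one domain per line, skipping blanks and '#' comments;
    optionally keep only domains containing one of the keywords."""
    domains = []
    for line in fileobj:
        d = line.strip().lower()
        if not d or d.startswith("#"):
            continue
        if keywords and not any(kw in d for kw in keywords):
            continue
        domains.append(d)
    return domains

# Stand-in for open("newly-registered-domains.txt"):
sample = io.StringIO("example-shop.com\nmysocketapp.net\n\nnews-site.org\n")
hits = load_domains(sample, keywords=["socket"])
# hits -> ["mysocketapp.net"]
```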

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, March 4, 2021 7:17 PM
  • User-458598543 posted

    Thank you for your contribution.

    M.A.

    Friday, March 5, 2021 2:25 AM