How can I write a regex to select the domain from a string?

  • Question

  • I am writing a regex to validate and extract the domain name from a string.

    The inputs look like this:

    http://domain.tld

    https://domain.tld

    http://www.domain.tld/anything

    domain.website

    domain.technology

    http://www.domain.tld/anything.html

    domain.co.uk, domain.in, or anything similar

    But I have a problem when the input contains a "-", as in domain-dash.com or http://www.domain.tld/anything-anything.html.

    Both of them are handled incorrectly:

    domain-dash.com is detected as dash.com

    and for http://www.domain.tld/anything-anything.html, anything.html is detected as the domain name.

    I wrote two regexes:

    (?!w{1,}\.)(\w+\.?){1,}([a-zA-Z]+)(\.\w+)

    And

    \w{3,5}\:\/\/\w+.(\w+\.\w{2,4})|\w{3,5}\:\/\/(\w+\.\w{2,4})

     

    How can I handle that?

    Wednesday, May 1, 2019 8:36 PM

Answers

  • OK guys, thanks to all of you. I used this code for my solution. It has some problems with subdomains, but it works fine!

    // Requires: using System; using System.Text.RegularExpressions;
    private static readonly Regex DomainRegex = new Regex(
        @"([a-z0-9A-Z]\.)*[a-z0-9-]+\.([a-z0-9]{2,24})+(\.co\.([a-z0-9]{2,24})|\.([a-z0-9]{2,24}))*",
        RegexOptions.Compiled);

    public (bool isDomain, string domain, string domainTLD) IsDomain(string url)
    {
        // Match once and reuse the result instead of calling IsMatch and Match separately.
        Match match = DomainRegex.Match(url);
        if (!match.Success)
        {
            return (isDomain: false, domain: "", domainTLD: "");
        }

        // Strip only a leading "www." so other occurrences inside the host are kept.
        string dom = match.Value.Trim();
        if (dom.StartsWith("www.", StringComparison.Ordinal))
        {
            dom = dom.Substring(4);
        }

        // The last dot-separated label is treated as the TLD.
        string[] labels = dom.Split('.');
        string domainTld = labels[labels.Length - 1];

        return (isDomain: true, domain: dom, domainTLD: domainTld);
    }

    Thursday, May 2, 2019 8:25 AM

All replies

  • Why aren't you just using Uri? Your regex isn't going to properly handle all the possible URL rules that are defined although it might handle the common ones. Uri just seems so much easier.

    Regarding your regex, you need to expand the set of allowed characters to include the dash (and possibly other characters). This link shows an example of what that might look like. Here's another link to a C#-specific version.
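
    To illustrate the point about the dash: \w matches only letters, digits, and the underscore, so a label such as domain-dash is cut at the hyphen. The following is a minimal sketch, not a full URL or hostname validator. The class HostRegexDemo, the method ExtractHost, and the pattern itself are hypothetical names and choices made for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HostRegexDemo
{
    // Hypothetical, simplified pattern: an optional scheme, then dot-separated
    // labels made of letters, digits, and hyphens. Not a full hostname validator.
    private static readonly Regex HostPattern = new Regex(
        @"^(?:[a-z][a-z0-9+.-]*://)?(?<host>[a-z0-9-]+(?:\.[a-z0-9-]+)+)",
        RegexOptions.IgnoreCase);

    // Returns the host part of the input, or an empty string when nothing matches.
    public static string ExtractHost(string input)
    {
        Match match = HostPattern.Match(input.Trim());
        return match.Success ? match.Groups["host"].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractHost("domain-dash.com"));                              // domain-dash.com
        Console.WriteLine(ExtractHost("http://www.domain.tld/anything-anything.html")); // www.domain.tld
    }
}
```

    Because the pattern is anchored at the start and stops at the first "/", the hyphen in the path no longer affects the result.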


    Michael Taylor http://www.michaeltaylorp3.net

    Wednesday, May 1, 2019 8:58 PM
    Moderator
  • I know that, but if the input is only a domain name like "domain.com", Uri can't parse it. My customers may enter either a URL or a bare domain, so I can't use Uri!

    Wednesday, May 1, 2019 9:53 PM
  • I don't understand why you can't use Uri. Can you provide your use case?

    Michael Taylor http://www.michaeltaylorp3.net

    Wednesday, May 1, 2019 9:56 PM
    Moderator
  • If you use this:

    var url = new Uri("domain.com");

    Uri can't parse it.

    In my case, I want to get a URL or domain name from the user and send it to a WHOIS server.


    Wednesday, May 1, 2019 10:22 PM
  • Play around with this code and see if it gives you what you want.

    static void Main ( string[] args )
    {
        var inputs = new[]
        {
            "domain.com",
            "http://www.google.com",
            "http://www.domain.tld/anything-anything.html",
            "none"
        };

        foreach (var input in inputs)
        {
            try
            {
                // UriBuilder assumes http:// when the input has no scheme.
                // The constructor itself can throw, so it belongs inside the try.
                var uri = new UriBuilder(input).Uri;
                Console.WriteLine($"Host for '{input}' is '{uri.Host}'");
            } catch (UriFormatException)
            {
                Console.WriteLine($"'{input}' is invalid");
            }
        }
    }


    Michael Taylor http://www.michaeltaylorp3.net

    Wednesday, May 1, 2019 10:41 PM
    Moderator
  • Thanks for your answer. UriBuilder is very helpful, but when the URL is like:

    http://ftp.domain.com or http://www.domain.com or http://anything.domain.com

    or http://anything.domain.co.uk

    how can I get the exact domain name without the subdomain?

    Wednesday, May 1, 2019 11:38 PM
  • Everything after the scheme and before any / is the host. If you're referring to the individual parts of the host (the subdomains) then the Host property gives you everything. If you want to pick apart the host into parts then use Split. Normally we don't need to do this so the Uri class doesn't provide that functionality directly.

    var parts = uri.Host.Split('.');
    
    You can then look at any part of the host that you need.
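
    The Split approach described here can be sketched as follows. This is only an illustration under a stated assumption: the hypothetical helper LastTwoLabels treats the last two labels as the domain, which holds for domain.com but not for suffixes such as co.uk.

```csharp
using System;

public static class HostSplitDemo
{
    // Hypothetical helper: keeps the last two dot-separated labels of a host.
    // Assumes a single-label suffix such as "com"; "co.uk" style suffixes break it.
    public static string LastTwoLabels(string host)
    {
        string[] parts = host.Split('.');
        if (parts.Length <= 2)
        {
            return host;
        }
        return parts[parts.Length - 2] + "." + parts[parts.Length - 1];
    }

    public static void Main()
    {
        var uri = new UriBuilder("http://anything.domain.com").Uri;
        Console.WriteLine(LastTwoLabels(uri.Host));                 // domain.com
        Console.WriteLine(LastTwoLabels("anything.domain.co.uk"));  // co.uk (wrong for this suffix)
    }
}
```

    The second call shows why counting labels alone is not enough when the suffix itself contains a dot.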


    Michael Taylor http://www.michaeltaylorp3.net

    Thursday, May 2, 2019 12:25 AM
    Moderator
  • Hi 

    Thank you for posting here.

    Regarding your last reply: you want to get the exact domain name without the subdomain.

    The following links could be helpful for you.

    https://stackoverflow.com/questions/4643227/top-level-domain-from-url-in-c-sharp

    How to get the top level domain in C#
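
    As a rough illustration of the idea behind those links, the sketch below checks the end of the host against a set of known multi-label suffixes before deciding how many labels to keep. Everything in it is an assumption made for the example: the class RegistrableDomainDemo, the method GetRegistrableDomain, and the three hard-coded suffixes. A real implementation would need a complete suffix source such as the Public Suffix List.

```csharp
using System;
using System.Collections.Generic;

public static class RegistrableDomainDemo
{
    // Hypothetical, tiny sample of multi-label suffixes; a real list is far longer.
    private static readonly HashSet<string> MultiLabelSuffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "co.uk", "org.uk", "com.au" };

    // Returns the suffix plus the one label before it, e.g. "domain.co.uk".
    public static string GetRegistrableDomain(string host)
    {
        string[] parts = host.Split('.');
        if (parts.Length >= 3)
        {
            string lastTwo = parts[parts.Length - 2] + "." + parts[parts.Length - 1];
            if (MultiLabelSuffixes.Contains(lastTwo))
            {
                return parts[parts.Length - 3] + "." + lastTwo;
            }
        }
        // Fall back to the last two labels, or the host itself if it has fewer.
        return parts.Length >= 2
            ? parts[parts.Length - 2] + "." + parts[parts.Length - 1]
            : host;
    }

    public static void Main()
    {
        Console.WriteLine(GetRegistrableDomain("anything.domain.co.uk")); // domain.co.uk
        Console.WriteLine(GetRegistrableDomain("ftp.domain.com"));        // domain.com
    }
}
```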

    Best Regards,

    Jack



    Thursday, May 2, 2019 2:24 AM
    Moderator