MailAddress seems to have a couple of bugs

  • Question

  • BUG1: I want to use MailAddress to validate email addresses. It handles most illegal ASCII characters correctly, but not the unquoted space character.

    Using MailAddress(string address), it takes everything before the last space and treats it as the DisplayName.

    So I use MailAddress(string address, string displayName) and pass in null for displayName. The ctor still treats everything before the last space as the DisplayName. If I explicitly pass in null for displayName, I would expect the code to treat the address parameter as purely the email address and nothing else.

    BUG2: MailAddress incorrectly parses foo."bar"@xyz.com: it treats "foo." as the DisplayName.

    BUG3: MailAddress incorrectly parses "foo".bar@xyz.com: it throws a FormatException.

    BUG4: MailAddress does not use IdnMapping to validate the domain portion of an email address. E.g., this throws an exception:

    var foo = new IdnMapping().GetAscii("bctâ\u0080\u008b.aphp.fr");

    which means that it's an invalid domain name for email addresses. However, MailAddress seems happy with that domain name:

    var foobar = new MailAddress("foo.bar@bctâ\u0080\u008b.aphp.fr", string.Empty);


    • Edited by triple_vee Thursday, December 6, 2012 12:01 AM
    • Moved by Mike Feng Thursday, December 6, 2012 11:35 AM (From:.NET Base Class Library)
    Thursday, December 6, 2012 12:01 AM
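A small harness makes BUG1 to BUG4 easy to reproduce on whichever runtime you have. This is a sketch, not a test suite: the names MailAddressRepro, Describe and DescribeHost are hypothetical, and parsing results differ between .NET Framework versions and newer .NET runtimes, so it only prints what happens instead of asserting expected values.

```csharp
using System;
using System.Globalization;
using System.Net.Mail;

// Hypothetical helper that reproduces the cases from the question.
public static class MailAddressRepro
{
    // Runs one constructor call and reports how the input was split,
    // or which exception type was thrown.
    public static string Describe(Func<MailAddress> create)
    {
        try
        {
            MailAddress a = create();
            return $"DisplayName='{a.DisplayName}' User='{a.User}' Host='{a.Host}'";
        }
        catch (Exception ex) when (ex is FormatException || ex is ArgumentException)
        {
            return ex.GetType().Name + ": " + ex.Message;
        }
    }

    // Reports whether IdnMapping accepts a host name (BUG4).
    public static string DescribeHost(string host)
    {
        try
        {
            return "IdnMapping accepts: " + new IdnMapping().GetAscii(host);
        }
        catch (ArgumentException ex)
        {
            return "IdnMapping rejects: " + ex.Message;
        }
    }

    public static void Run()
    {
        // BUG1: an unquoted space, with and without an explicit null display name.
        Console.WriteLine(Describe(() => new MailAddress("foo bar@xyz.com")));
        Console.WriteLine(Describe(() => new MailAddress("foo bar@xyz.com", null)));
        // BUG2 and BUG3: a quoted string mixed with a dot-separated atom.
        Console.WriteLine(Describe(() => new MailAddress("foo.\"bar\"@xyz.com")));
        Console.WriteLine(Describe(() => new MailAddress("\"foo\".bar@xyz.com")));
        // BUG4: a host containing the control characters U+0080 and U+008B.
        const string host = "bctâ\u0080\u008b.aphp.fr";
        Console.WriteLine(DescribeHost(host));
        Console.WriteLine(Describe(() => new MailAddress("foo.bar@" + host, string.Empty)));
    }
}
```

Calling MailAddressRepro.Run() prints one line per case, which makes it straightforward to compare behavior across framework versions.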

All replies

  • You are using characters that are not ASCII, like '\u', so that is why you are getting an exception. The other problem you are having is with special characters inside a C# string. Either you need to use a backslash before the special character or use an @ sign at the beginning of the string, like below.

    string mystring = @"bctâ\u0080\u008b.aphp.fr";

    string mystring1 = "bctâ\\u0080\\u008b.aphp.fr";


    jdweng

    Thursday, December 6, 2012 6:54 AM
  • I assure you that

    string mystring = "bctâ\u0080\u008b.aphp.fr";

    compiles just fine. Cut and paste it into a C# program and see for yourself. I use the \u syntax because these are unprintable Unicode characters.
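The three literals quoted so far are not the same string. In a regular C# literal, \uXXXX is a Unicode escape that yields a single char; in a verbatim literal (@"...") or with a doubled backslash, the backslash is kept, so \u0080 stays six separate characters. A minimal sketch that makes the difference visible (LiteralDemo and CodePoints are hypothetical names):

```csharp
using System;
using System.Linq;

// Hypothetical demo comparing the three string literals quoted in the thread.
public static class LiteralDemo
{
    // Regular literal: each \uXXXX escape becomes a single UTF-16 char.
    public const string Escaped = "bctâ\u0080\u008b.aphp.fr";
    // Verbatim literal: backslashes are literal, so "\u0080" stays six chars.
    public const string Verbatim = @"bctâ\u0080\u008b.aphp.fr";
    // Doubled backslash in a regular literal: same text as the verbatim one.
    public const string DoubleBackslash = "bctâ\\u0080\\u008b.aphp.fr";

    // Lists each char as U+XXXX so unprintable characters become visible.
    public static string CodePoints(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Run()
    {
        foreach (var s in new[] { Escaped, Verbatim, DoubleBackslash })
        {
            Console.WriteLine($"Length {s.Length}: {CodePoints(s)}");
        }
    }
}
```

The escaped form is 14 chars long and contains U+0080 and U+008B; the other two are 24 chars long and identical to each other, so they never put control characters into the host name at all.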

    Non-ASCII characters are legal in the hostname portion of email addresses. See

    http://en.wikipedia.org/wiki/Internationalized_domain_name

    However, the unprintable ones are not, so I'm saying that IdnMapping is doing the right thing by throwing an exception, and MailAddress should as well if the hostname is not legal. In other words, it should delegate hostname validation to IdnMapping. Just as an FYI, the GetAscii() and GetUnicode() methods on IdnMapping implement the IDNA ToASCII and ToUnicode operations, which use the Punycode encoding referenced in the above Wikipedia article.

    Thursday, December 6, 2012 7:28 AM
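Until MailAddress does this itself, the hostname check can be delegated to IdnMapping as an extra step after parsing. A minimal sketch, assuming a host is acceptable whenever IdnMapping.GetAscii converts it without throwing (GetAscii throws ArgumentException for input it cannot convert); HostCheck, ToAsciiHostOrNull and IsAcceptable are hypothetical names:

```csharp
using System;
using System.Globalization;
using System.Net.Mail;

// Hypothetical validator: MailAddress parses, IdnMapping vets the host.
public static class HostCheck
{
    // Returns the ASCII (Punycode) form of the host, or null if IdnMapping rejects it.
    public static string ToAsciiHostOrNull(string host)
    {
        try
        {
            // A fresh instance per call: IdnMapping instance members are not documented as thread-safe.
            return new IdnMapping().GetAscii(host);
        }
        catch (ArgumentException)
        {
            return null;
        }
    }

    // True only if MailAddress parses the input AND IdnMapping accepts its host.
    public static bool IsAcceptable(string input)
    {
        try
        {
            var address = new MailAddress(input);
            return ToAsciiHostOrNull(address.Host) != null;
        }
        catch (FormatException)
        {
            return false; // MailAddress rejected the syntax
        }
        catch (ArgumentException)
        {
            return false; // null or empty input
        }
    }
}
```

With this wrapper, an address is accepted only when both checks pass, so a host that IdnMapping refuses is rejected even if MailAddress alone would have let it through.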
  • I know Unicode characters are legal in email addresses; email addresses with Chinese names are used all the time. I agree with most things you said. The GetAscii() method should fail when a Unicode character is in the string. The MailAddress class does not have a TryParse() method; it just accepts a string and doesn't verify the contents of the string. It just treats the string like any other string. Not doing the parsing is, I think, not necessarily a bug. It is a limitation of the method. The method should accept a new parameter type (instead of string) for a valid email address.

    jdweng

    Thursday, December 6, 2012 7:48 AM
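The missing TryParse() can be approximated with a wrapper that turns the constructor's exceptions into a bool. A minimal sketch; MailAddressParser.TryParseAddress and its requireBareAddress flag are hypothetical, not framework members (later .NET versions, starting with .NET 5, added a built-in MailAddress.TryCreate, which did not exist at the time of this thread). The flag rejects inputs that MailAddress splits into a display name plus an address, which is the BUG1 symptom:

```csharp
using System;
using System.Net.Mail;

// Hypothetical TryParse-style wrapper around the MailAddress constructor.
public static class MailAddressParser
{
    // Returns false instead of throwing. When requireBareAddress is true, the input
    // must parse to exactly its own Address with no display name split off.
    public static bool TryParseAddress(string input, bool requireBareAddress, out MailAddress result)
    {
        result = null;
        if (string.IsNullOrWhiteSpace(input))
        {
            return false; // the constructor would throw for null or empty input
        }
        try
        {
            var parsed = new MailAddress(input);
            bool hasExtraParts = !string.IsNullOrEmpty(parsed.DisplayName)
                || !string.Equals(parsed.Address, input, StringComparison.Ordinal);
            if (requireBareAddress && hasExtraParts)
            {
                return false;
            }
            result = parsed;
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```

For example, TryParseAddress("\"Foo\" <foo@xyz.com>", true, out _) returns false because a display name was split off, while the same call with false succeeds and exposes DisplayName "Foo".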
  • Joel,

    First of all, the GetAscii() method should only fail with very specific Unicode characters. Read the Wikipedia article or look at the RFC. For legal Unicode characters, it performs an encoding called Punycode. So for a domain like this:

    汉字/漢字.com

    IdnMapping.GetAscii() returns

    xn--/-471bb364ohid.com

    Try it for yourself in C#. In the example in my original post, I gave a specific example of a "bad" hostname that causes GetAscii() to throw a legitimate exception, but MailAddress does not.

    Second of all, MailAddress certainly does do parsing. Use ILSpy to see what it does. It parses character by character. It will throw an exception if it finds illegal ASCII characters in the local part of the email address such as ' ', '"', '(', ')', etc. unless the entire local part is quoted per the RFC. In fact, the code tries to parse out the display name, the local part, and the hostname and does a reasonable (if imperfect) job of enforcing the RFC. Try to create an email address without an '@' in the string if you don't believe me.

    I'm not sure why you're trying to refute the legitimate bugs I documented when you don't even attempt to write the simplest code to verify. It's quite annoying.

    Thursday, December 6, 2012 8:42 AM
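The Punycode conversion can be checked in both directions: GetAscii produces the "xn--" form and GetUnicode decodes it again. A minimal sketch (PunycodeRoundTrip and RoundTrip are hypothetical names); bücher.de is used because its ASCII form, xn--bcher-kva.de, is a widely documented example:

```csharp
using System;
using System.Globalization;

// Hypothetical round-trip check for IdnMapping.
public static class PunycodeRoundTrip
{
    // Encodes a Unicode host name to ASCII ("xn--" labels), then decodes it again.
    public static (string Ascii, string Unicode) RoundTrip(string host)
    {
        var idn = new IdnMapping();
        string ascii = idn.GetAscii(host);      // throws ArgumentException for invalid hosts
        string unicode = idn.GetUnicode(ascii); // decodes the Punycode labels
        return (ascii, unicode);
    }

    public static void Run()
    {
        foreach (var host in new[] { "bücher.de", "汉字漢字.com" })
        {
            var (ascii, unicode) = RoundTrip(host);
            Console.WriteLine($"{host} -> {ascii} -> {unicode}");
        }
    }
}
```

A host that GetAscii rejects never reaches GetUnicode, so the same call doubles as a validity check.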
  • A string consists of a mixture of one- and two-byte characters. If you have a Chinese name with four Chinese letters (each letter a Unicode character) followed by a RETURN, the string will be 9 bytes (2*4 + 1). The RETURN is only one byte. There is no checking in the string class for valid characters because the string class doesn't parse the data. Methods that use the string class can check for valid characters. Since MailAddress can use Unicode characters, it doesn't do the checking.

    I agree that the MailAddress class should be checking for valid email addresses according to the RFC specification, but it isn't. The RFC specifications are often vague in their requirements because one software vendor may allow an option and another may not. The RFC specification, if you read it carefully, probably says what is allowable. It doesn't say what is NOT allowable. So UNIX operating systems may block certain email addresses while Apple may not. In such cases, should the Microsoft .NET library stop people from sending messages to Apple email addresses?


    jdweng

    Thursday, December 6, 2012 10:53 AM
  • Hi Triple_vee,

    Welcome to the MSDN Forum.

    I have moved this thread to the System.Net forum for better support.

    Thank you for your understanding and support.

    Best regards,


    Mike Feng
    MSDN Community Support
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Thursday, December 6, 2012 11:33 AM
  • I'm asking you nicely to please stop replying to this thread. You continue to pile on incorrect information. I have notified the moderators.

    You have a fundamental misunderstanding of how strings are represented in .NET.

    A string in .NET always consists of a sequence of 2-byte chars, regardless of whether a char represents an ASCII character or not. And GetAscii() returns a string consisting of 2-byte chars, each of which contains an ASCII code point. It's true that ASCII characters can be represented by a single byte, but .NET chooses to represent an ASCII character with a char. Each ch

    And I assure you that MailAddress is checking for illegal characters, just not all of them. Try creating a MailAddress object with the string "()@xyz.com". And the RFC spec certainly does say what is not allowable. Why don't you read the RFC? Oh, heck, you won't read it... you'll just reply with some incorrect nonsense.

    And what's this nonsense about Apple? I assure you that every email address ever created by Apple customers through their hosted email services (e.g., me.com, icloud.com) can be sent to using the .NET libraries.


    • Edited by triple_vee Friday, December 7, 2012 7:25 AM
    Thursday, December 6, 2012 4:18 PM
  • I have worked on military contracts and have read many specifications in my lifetime, including many of the IEEE and RFC specifications. I'm a very realistic engineer and understand the legal as well as the economic decisions that are made as part of engineering. The RFC specifications are quite often written after two companies have already written software and neither wants to change it. So the IEEE makes compromises to allow both companies' software to work. My example about Apple was just a hypothetical example.

    You at least agree with me that a string class does have a mixture of one- and two-byte characters. Without knowing the encoding in use, the string class can't really determine whether a character is valid or not. You are assuming that the string class can magically determine the type of encoding. It can't. A check would need to be performed as part of an interface with another class. There is a very large number of Unicode characters that are acceptable in email addresses, and not performing the check is, I think, not necessarily a bug but rather an engineering decision.


    jdweng

    Thursday, December 6, 2012 4:40 PM
  • What are you talking about?! I'm quoting you directly...

    "If you have a Chineese name with four chinesse letters (each letter a unicode character) followed by a RETURN the string will be 9 bytes (2*4 +1).  the RETURN is only one byte. "

    You made the assertion that the string is 9 bytes. A .NET string can never have 9 bytes. It will always have an even number of bytes because a char is 2 bytes. Furthermore, a RETURN control character is two bytes. 

    Thursday, December 6, 2012 6:01 PM
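The byte-count question can be settled by measuring. A minimal sketch (StringSizes and Sample are hypothetical names) that takes four CJK characters followed by a RETURN and reports the char count, the UTF-16 size (the layout .NET strings use in memory), and the UTF-8 size for comparison:

```csharp
using System;
using System.Text;

// Hypothetical demo: char count vs. encoded byte counts for the same string.
public static class StringSizes
{
    // Four CJK characters (all in the Basic Multilingual Plane) followed by a RETURN.
    public const string Sample = "汉字漢字\r";

    public static void Run()
    {
        // Length counts UTF-16 code units (chars), not bytes.
        Console.WriteLine($"Chars: {Sample.Length}");
        // Encoding.Unicode is UTF-16LE: 2 bytes per char, including the '\r'.
        Console.WriteLine($"UTF-16 bytes: {Encoding.Unicode.GetByteCount(Sample)}");
        // UTF-8 is variable-width: 3 bytes per CJK character here, 1 byte for '\r'.
        Console.WriteLine($"UTF-8 bytes: {Encoding.UTF8.GetByteCount(Sample)}");
    }
}
```

This reports 5 chars, 10 UTF-16 bytes and 13 UTF-8 bytes. Differing byte widths only appear once the string is encoded with a variable-width encoding such as UTF-8.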
  • Regarding the "odd number" of bytes above: that figure is merely a character count, and the count in bytes is even.

    For a detailed look at how strings are represented as bytes, and how byte lengths depend on the encoding, see

    http://msdn.microsoft.com/en-us/library/ds4kkd55.aspx

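    using System;
    using System.Text;

    public static class EncodingSamples { // class name is illustrative
       public static void PrintCountsAndBytes( String s, Encoding enc )  {
          // Display the name of the encoding used.
          Console.Write( "{0,-30} :", enc.ToString() );
          // Display the exact byte count.
          int iBC  = enc.GetByteCount( s );
          Console.Write( " {0,-3}", iBC );
          // Display the maximum byte count.
          int iMBC = enc.GetMaxByteCount( s.Length );
          Console.Write( " {0,-3} :", iMBC );
          // Encode the entire string.
          byte[] bytes = enc.GetBytes( s );
          // Display all the encoded bytes.
          PrintHexBytes( bytes );
       }

       // Prints each byte as two hex digits, or "<none>" for an empty array.
       public static void PrintHexBytes( byte[] bytes )  {
          if ( bytes == null || bytes.Length == 0 ) { Console.WriteLine( "<none>" ); return; }
          foreach ( byte b in bytes ) Console.Write( "{0:X2} ", b );
          Console.WriteLine();
       }
    }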





    Monday, December 17, 2012 8:42 AM