Subject encoding on SmtpClient/MailMessage

    Question

  • Hi,

    I am trying to send emails that contain non-ASCII characters using the SmtpClient and MailMessage classes.

    I am using an external mailing service (MailChimp) and some of my emails have been rejected by their SMTP server. I have contacted them and this is what they replied:

    It appears the subject line is being Base64 encoded and then Quoted-Printable encoded, which generally should be fine, but one of the characters is being broken across two lines. When your subject lines are a bit longer, they are broken onto two lines in order to be processed correctly. When using UTF-8 quoted-printable in a subject line, character strings aren't supposed to be broken between lines. Instead, a line should be shortened so that the full character string remains together. In this case, that's not happening, so the string of characters that represents a single character is being broken across multiple lines, and therefore isn't validly UTF-8 quoted-printable encoded.

    The problematic subject is the following:

    Subject: XXXXXXX - 5 personnes vous ont nommé guide

    Which is, in UTF-8/Base64:

    Subject: WFhYWFhYWCAtIDUgcGVyc29ubmVzIHZvdXMgb250IG5vbW3DqSBndWlkZQ==

    Because that header would exceed a certain maximum length (I am unsure whether it is the Quoted-Printable encoding and its limit of 76 characters per line, or the SMTP header limit), after encoding and splitting, the header becomes:

    Subject: =?utf-8?B?WFhYWFhYWCAtIDUgcGVyc29ubmVzIHZvdXMgb250IG5vbW3D?=
     =?utf-8?B?qSBndWlkZQ==?=

    Apparently this causes an issue when decoding (because the first line cannot be decoded to a valid string). I am not sure I fully understand the problem, and I have the following questions:

    • Why is the ?utf-8?B? part repeated? Shouldn't the QP encoding happen before splitting the line and thus its header shouldn't be repeated?
    • After QP-decoding, shouldn't we end up with a valid 1-line Base64 string?
    • There is a space at the start of the second line which is outside of the QP encoding, could this be the problem?
    • Is the encoder broken, or is it the decoder?

    Also note that some other SMTP servers will accept this message, though that does not mean it is valid.

    As a workaround, I have tried disabling the Base64 encoding, which apparently is unnecessary, however the MailMessage class has a BodyTransferEncoding property that controls this encoding, but only for the body part of the message. No property seems to control the "transfer" encoding of the subject.

    Any help appreciated.

    Thanks,
    Xavier

    [Cross-posted from StackOverflow]


    Wednesday, April 17, 2013 1:20 AM

Answers

All replies

  • Hi Xavier,

    I am moving your question to the appropriate forum for better responses.


    Bob Shen
    MSDN Community Support | Feedback to us
    Develop and promote your apps in Windows Store
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Thursday, April 18, 2013 9:57 AM
  • Hi,

    Thanks for your post.

    I am trying to involve someone familiar with this topic to further look at this issue. There might be some time delay. Appreciate your patience.

    Best Regards.


    Haixia

    Friday, April 19, 2013 1:37 AM
  • Hi Haixia and thanks for your reply.

    I am looking forward to their response.

    Wednesday, April 24, 2013 10:07 PM
  • FYI, according to RFC 2822 Section 2.1.1, the 78-characters-per-line limit is only a recommendation motivated by display concerns (the RFC spells out the reasoning behind it), while the hard limit that the recipient's email client must be able to handle is 998 characters per line.

    Therefore, IMO, it's the encoder's problem.

    By the way, if your email client handles break characters correctly, try inserting soft breaks between display characters and see whether that works around your problem.

    Or better, do the line breaking and Base64 encoding yourself and set the result as the subject field inside your application.

    EDIT: I think you can set the SubjectEncoding property, which has been available since .NET 2.0.

    • Edited by cheong00 Thursday, April 25, 2013 4:08 AM
    Thursday, April 25, 2013 3:59 AM
  • Thanks for your reply, cheong.

    Correct me if I'm wrong but I think that 3 encodings are applied to the subject, in order:

    • UTF-8
    • Base64
    • Quoted Printable

    As far as I can tell the SubjectEncoding property only controls the first encoding (UTF8).

    The Quoted Printable encoding also has a limit set to 76 characters per line, so I guess it is possible that the line break comes from the QP encoding. If that was the case then each line would then be under the SMTP header limit which is 78, thus no further line-breaking would occur.

    Friday, April 26, 2013 4:02 AM
  • Base64 and Quoted-Printable are irrelevant here because they are just ways to ensure that the subject line does not have bits discarded during transfer. (Traditional mail transfer assumes 7-bit characters, so using any "high-ASCII range" characters is considered risky.) Base64 encoding is indicated by the "B" after "?utf-8?"; if it were Quoted-Printable, that would have been a "Q". See Section 4 of RFC 2047.

    Actually, if you look at the encoding rules in Section 5 of RFC 2047, you'll see that mixing encoded and non-encoded characters in a subject line is legal.

    Therefore the only relevant information is the expected character encoding when the subject line is decoded, and that's what the SubjectEncoding property controls.
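
    To make the "=?charset?encoding?text?=" layout described above concrete, here is a minimal sketch that pulls the charset, the B/Q letter and the payload out of each encoded-word in a header value. The class name EncodedWordInspector is hypothetical (it is not part of System.Net.Mail), and the regular expression is a simplification of the RFC 2047 grammar, not a full parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of System.Net.Mail: lists the charset,
// encoding letter and payload of each encoded-word found in a header value.
public static class EncodedWordInspector
{
    // Simplified form of: =?charset?encoding?encoded-text?=
    private static readonly Regex Pattern =
        new Regex(@"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=");

    public static List<(string Charset, char Encoding, string Text)> Parse(string headerValue)
    {
        var words = new List<(string Charset, char Encoding, string Text)>();
        foreach (Match m in Pattern.Matches(headerValue))
        {
            words.Add((m.Groups[1].Value,
                       char.ToUpperInvariant(m.Groups[2].Value[0]),
                       m.Groups[3].Value));
        }
        return words;
    }
}
```

    Passing the two-line Subject value from the original question to EncodedWordInspector.Parse should yield two entries, both with charset "utf-8" and encoding 'B', which is why the "?utf-8?B?" prefix appears once per line.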

    Friday, April 26, 2013 4:29 AM
  • Ok, so after digging into the code of the SmtpClient classes, this is what actually happens:

    • When the subject encoding is set to either UTF-8 (default), UTF-32, Unicode or BigEndianUnicode AND the subject contains non-ASCII characters, the subject will be encoded in Base64.
    • Otherwise the subject is encoded in Quoted Printable

    The issue I am experiencing comes from the Base64-encoded string being broken in the middle of a non-ASCII character, so that the resulting Base64 lines do not decode to valid UTF-8 strings by themselves.
    I have changed the subject encoding to ISO-8859-1 and the subject is now encoded in QP.

    The encoded subject is:

    Subject: =?iso-8859-1?Q?XXXXXXX_=2D_5_personnes_vous_ont_nomm=E9_guide?=
     =?iso-8859-1?Q?_=28Test=29?=

    You can see here that the special character é has been encoded as =E9. Now the question is: will the same problem occur if I change the string so that the é appears right at the line break?

    Subject: =?iso-8859-1?Q?XXXXXXX_=2D_5_personnes_vous_ooooooooont_nomm=E9?=
     =?iso-8859-1?Q?_guide_=28Test=29?=

    Still good. Let's add one more character:

    Subject: =?iso-8859-1?Q?XXXXXXX_=2D_5_personnes_vous_oooooooooont_nomm?=
     =?iso-8859-1?Q?=E9_guide_=28Test=29?=

    You can see that the =E9 is now fully on the next line, which is the correct behavior: the line break should not be placed in the middle of an encoded character.

    So switching to ISO-8859-1 solves the problem for me, but the issue still stands with UTF-8 and possibly other encodings.
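
    For reference, the workaround described above could look roughly like this. It is a sketch only: the class name Latin1SubjectExample and its Build method are made up for illustration, and it assumes every character of the subject exists in ISO-8859-1 (anything outside that character set cannot be represented and would be lost).

```csharp
using System.Net.Mail;
using System.Text;

// Hypothetical wrapper around the workaround: declare the subject as
// ISO-8859-1 so that, per the observations above, it is Q-encoded
// rather than Base64-encoded.
public static class Latin1SubjectExample
{
    public static MailMessage Build(string from, string to, string subject, string body)
    {
        return new MailMessage(from, to)
        {
            Subject = subject,
            Body = body,
            // Only suitable when the subject fits in ISO-8859-1.
            SubjectEncoding = Encoding.GetEncoding("iso-8859-1")
        };
    }
}
```

    The resulting message can then be passed to SmtpClient.Send(MailMessage) instead of using the four-string overload.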

    Friday, April 26, 2013 5:05 AM
  • Did you try letting the .NET runtime do the Base64 encoding and line breaking for you?

    Section 5 of RFC 2047 explicitly demands: "The 'encoded-text' in each 'encoded-word' must be well-formed according to the encoding specified."

    AFAIK the .NET runtime has been following this RFC well, so I believe you should not experience the problem you originally had. (And since .NET stores strings as UTF-16 internally, it has no reason to unintentionally break a character between lines.)

    Friday, April 26, 2013 5:27 AM
  • I have only used all the default parameters and encodings, not encoding anything by myself.

    Here are the complete steps to a repro:

    Use tcptrace to act as a proxy to your SMTP server, so we can inspect packets.

    Send an email using the following code:

    var client = new SmtpClient("localhost", 587);
    
    client.Credentials = new NetworkCredential("...", "...");
    client.Send("test2@test.com", "test2@test.com",
        "XXXXXXX - 5 personnes vous ont nommé guide", "Test");

    Captured packet:

    EHLO ...
    AUTH login ...
    MAIL FROM:<test2@test.com>
    RCPT TO:<test2@test.com>
    DATA
    MIME-Version: 1.0
    From: test2@test.com
    To: test2@test.com
    Date: 26 Apr 2013 15:53:56 +1000
    Subject: =?utf-8?B?WFhYWFhYWCAtIDUgcGVyc29ubmVzIHZvdXMgb250IG5vbW3D?=
     =?utf-8?B?qSBndWlkZQ==?=
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: quoted-printable
    
    Test
    
    .

    Note how decoding WFhYWFhYWCAtIDUgcGVyc29ubmVzIHZvdXMgb250IG5vbW3D results in a partial UTF-8 character, whereas decoding the full string without the line break, in the same decoder, works perfectly:

    WFhYWFhYWCAtIDUgcGVyc29ubmVzIHZvdXMgb250IG5vbW3DqSBndWlkZQ==
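
    The observation above can be checked with a strict UTF-8 decoder. The following is a minimal sketch; ChunkCheck and TryDecode are hypothetical names, and the only library behavior relied on is that UTF8Encoding constructed with throwOnInvalidBytes: true throws a DecoderFallbackException on malformed input.

```csharp
using System;
using System.Text;

// Hypothetical helper: Base64-decodes a payload and reports whether the
// resulting bytes are valid UTF-8 on their own.
public static class ChunkCheck
{
    // Strict decoder: throws on malformed UTF-8 instead of substituting U+FFFD.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static bool TryDecode(string base64, out string text)
    {
        try
        {
            text = StrictUtf8.GetString(Convert.FromBase64String(base64));
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = string.Empty;
            return false;
        }
    }
}
```

    With this helper, each of the two payloads from the captured Subject header fails on its own (the first ends with the lead byte 0xC3, the second starts with the continuation byte 0xA9), while the full single-line Base64 string decodes to the original subject.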

    Friday, April 26, 2013 6:00 AM
  • That is legal. The encoder can break lines wherever it likes, as long as each piece is enclosed in a complete "encoded-word".

    See Section 8 of RFC 2047 for an example. The email client has to concatenate the pieces back into a whole line before Base64-decoding it.

    Friday, April 26, 2013 6:09 AM
  • I looked at Section 8 of RFC 2047, and I do not agree that this is the same. It is legal to break the line, but it should not be broken in the middle of a character. If you are referring to this example:

    Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
        =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

    Then it is different, because both

    SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=

    and

    dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==

    can be decoded by themselves.


    Friday, April 26, 2013 6:31 AM
  • Agreed. They should have made it

    =?utf-8?B?WFhYWFhYWCAtIDUgcGVyc29ubmVzIHZvdXMgb250IG5v?=
    =?utf-8?B?bW3DqSBndWlkZQ==?=

    instead.

    Consider filing a bug here if you haven't already done so.
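
    As an illustration of the split suggested above, here is a rough sketch of an encoder that only breaks between whole characters. SafeSubjectEncoder and its maxBytesPerWord parameter are hypothetical and this is not how SmtpClient works internally; the caller has to pick a byte budget that keeps each encoded-word and each header line within the RFC 2047 length limits.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: splits text into "=?utf-8?B?...?=" encoded-words
// without ever cutting a character (text element) in half.
public static class SafeSubjectEncoder
{
    public static List<string> Encode(string text, int maxBytesPerWord)
    {
        var words = new List<string>();
        var chunk = new List<byte>();

        // Text elements keep surrogate pairs and combining sequences together.
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(text);
        while (elements.MoveNext())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(elements.GetTextElement());

            // Start a new encoded-word before the budget would be exceeded.
            if (chunk.Count > 0 && chunk.Count + bytes.Length > maxBytesPerWord)
            {
                words.Add(Wrap(chunk));
                chunk.Clear();
            }
            chunk.AddRange(bytes);
        }

        if (chunk.Count > 0)
        {
            words.Add(Wrap(chunk));
        }
        return words;
    }

    private static string Wrap(List<byte> bytes)
    {
        return "=?utf-8?B?" + Convert.ToBase64String(bytes.ToArray()) + "?=";
    }
}
```

    Calling SafeSubjectEncoder.Encode on the subject from the original question with a budget of 33 bytes reproduces the two encoded-words shown above. How to get SmtpClient to send such a pre-encoded value verbatim is a separate question that this sketch does not address.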



    • Edited by cheong00 Friday, April 26, 2013 7:14 AM
    Friday, April 26, 2013 7:12 AM
    • Marked as answer by Xavier Poinas Saturday, April 27, 2013 10:58 AM
    Saturday, April 27, 2013 12:11 AM
  • I had a similar issue attempting to render &#233; in the subject line. Within the body things worked fine, but the subject line wouldn't render correctly.

    This seemed to work for me:

    var raw = HttpUtility.HtmlDecode("...html encoded junk...");

    var msg = new MailMessage();

    msg.SubjectEncoding = new System.Text.UnicodeEncoding();

    msg.Subject = raw;

    Wednesday, June 05, 2013 2:35 AM
  • Xavier's issue is that the line split in the subject occurs in the middle of the bytes of a single Unicode character, so that's a separate matter.
    Wednesday, June 05, 2013 4:10 AM
  • Hi,

    I have checked the response from the product group related to the issue you have reported via

    https://connect.microsoft.com/VisualStudio/feedback/details/785710/mailmessage-subject-incorrectly-encoded-in-utf-8-base64 . Unfortunately, the product group is not able to take this bug at this time as per the response for it.

    If you still observe the issue and would like us to investigate it further and take it up with the product group again, I would recommend that you open a new support case, as it will enable us to provide a more in-depth level of support.

    Please visit the link below to see the various paid support options that are available to better meet your needs.
    http://support.microsoft.com/default.aspx?id=fh;en-us;offerprophone

    Thanks for your patience and cooperation.

    Regards,

    Brij Raj Singh  |  Microsoft Developer Support - Messaging & Collaboration
    Brijs Blogging... Looking Beyond the Obvious

    Monday, June 17, 2013 4:51 PM
  • Hi

    I am not sure I understand why I need to reopen an issue. The last response from Microsoft on the already opened issue was:

    Thank you for your feedback, we are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly.

    Currently I have no need for the issue to be resolved as I am using the work-around I have posted along with the issue on Connect.

    Still, the issue exists and the work-around may not always be applicable (for example, when the UTF-8 encoding is required).

    Feel free to either correct it or leave it as-is.

    Cheers

    Xavier

    Wednesday, June 19, 2013 1:28 AM