Random mail lost with Database mail and non-Microsoft SMTP servers

  • Question

  • The following facts were tested and reproduced in 3 completely different environments, with SQL 2005, 2008, 2008R2, 2012 and 2016.


    • When DBmail sends email to Postfix (2.10.1 and 3.1.2), the SMTP servers report several “lost connection after CONNECT” errors, and a percentage of them result in lost email.
    • When DBmail sends email to a non-Microsoft email relay (http://emailrelay.sourceforge.net/) installed on a Windows server, the relay reports event ID 1002, “winsock select error: 10053”.
    • A packet capture shows a specific behavior from DBmail that we did not see with other systems. DBmail initiates one or more TCP connections, depending on the number of mails to send. Then, within microseconds or milliseconds, it sends a FIN/ACK followed by a RST/ACK on some of those TCP connections, which Postfix and the majority of SMTP daemons interpret as a lost connection. The TCP connections that do not get a FIN/ACK complete normally.
    • When DBmail talks to a Microsoft SMTP server, it begins the same way, including the FIN/ACK on some of the TCP connections, but the SMTP conversation continues from a different TCP source port and carries on with the sequence numbers of the original connection. So Microsoft SMTP does not drop the whole SMTP conversation when the initial TCP connection is dropped.
    • So for some SMTP conversations, the source port changes during the conversation, usually just after the 220 welcome message, which causes a lost connection on the majority of SMTP services.
    • SQL Database Mail does not report any errors, even when mails are lost.

    I always assumed that, by design, an SMTP conversation runs over a single TCP connection, as described in RFC 821.
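    For comparison, here is a minimal Python sketch of what "one SMTP conversation, one TCP connection" looks like with a conventional client. It uses Python's smtplib, not Database Mail, against a toy in-process server; the server, the addresses and the function names are all hypothetical and only serve to count how many TCP connections one message uses.

```python
import smtplib
import socket
import threading
from email.message import EmailMessage


def serve_smtp(listener, peers):
    """Toy SMTP server (hypothetical): logs each client (ip, port) in `peers`."""
    while True:
        try:
            conn, peer = listener.accept()
        except OSError:
            return  # listener was closed
        peers.append(peer)
        with conn, conn.makefile("rb") as reader:
            conn.sendall(b"220 toy ESMTP\r\n")
            in_data = False
            for raw in reader:
                line = raw.strip()
                if in_data:
                    # Inside DATA: swallow lines until the lone "." terminator.
                    if line == b".":
                        in_data = False
                        conn.sendall(b"250 queued\r\n")
                elif line[:4].upper() == b"DATA":
                    in_data = True
                    conn.sendall(b"354 end with <CRLF>.<CRLF>\r\n")
                elif line[:4].upper() == b"QUIT":
                    conn.sendall(b"221 bye\r\n")
                    break
                else:
                    # EHLO, MAIL FROM, RCPT TO: accept everything.
                    conn.sendall(b"250 ok\r\n")


def send_one_mail():
    """Send one message; return (client ports seen by the server, client's own port)."""
    listener = socket.create_server(("127.0.0.1", 0))
    peers = []
    threading.Thread(target=serve_smtp, args=(listener, peers), daemon=True).start()

    msg = EmailMessage()
    msg["From"] = "sql@example.test"
    msg["To"] = "dba@example.test"
    msg["Subject"] = "single-connection check"
    msg.set_content("hello")

    port = listener.getsockname()[1]
    with smtplib.SMTP("127.0.0.1", port, local_hostname="localhost", timeout=5) as client:
        local_port = client.sock.getsockname()[1]
        client.send_message(msg)
    listener.close()
    return [peer[1] for peer in peers], local_port
```

    With this client, the server should see exactly one connection, and its source port should stay the same from the 220 greeting to QUIT. That is the behavior Postfix appears to expect.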

    Right now the only workaround we have found is to configure DBmail to send emails to an IIS SMTP service, which relays them to our main Postfix server. We would like to get rid of the middleman if possible.

    Can the EXE behind DBmail (Databasemail.exe) be replaced by something more standard in terms of SMTP conversation?


    Any help or suggestions are welcome.

    Monday, October 17, 2016 3:26 PM

All replies

  • No, it is configured to use the databasemail.exe application.

    BUT what you need to do is to disable the accounts that dbmail uses. Then you can write a custom application to read the mail directly from the sysmail_mailitems table in the msdb database and then send it to the postfix server.
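    A rough Python sketch of such a custom sender, under stated assumptions: the column names (recipients, copy_recipients, subject, body, body_format, from_address) are what I would expect in msdb.dbo.sysmail_mailitems and should be checked against your version, address lists are assumed to be semicolon-delimited, and the helper names are made up for illustration. Fetching the rows and marking them as sent is left out.

```python
import smtplib
from email.message import EmailMessage


def split_addresses(value):
    """Split a semicolon-delimited address string (assumed storage format)."""
    return [addr.strip() for addr in (value or "").split(";") if addr.strip()]


def build_message(row, default_sender):
    """Turn one queued row (a dict keyed by column name) into an EmailMessage."""
    msg = EmailMessage()
    msg["From"] = row.get("from_address") or default_sender
    msg["To"] = ", ".join(split_addresses(row.get("recipients")))
    cc = split_addresses(row.get("copy_recipients"))
    if cc:
        msg["Cc"] = ", ".join(cc)
    msg["Subject"] = row.get("subject") or ""
    # body_format is assumed to hold "TEXT" or "HTML".
    subtype = "html" if (row.get("body_format") or "").upper() == "HTML" else "plain"
    msg.set_content(row.get("body") or "", subtype=subtype)
    return msg


def relay(messages, host, port=25):
    """Send all messages over one SMTP session, i.e. one TCP connection."""
    with smtplib.SMTP(host, port, timeout=30) as client:
        for msg in messages:
            client.send_message(msg)
```

    Something like relay([build_message(row, "sql@example.test") for row in rows], "postfix.example.test") would then push a batch to Postfix; both the host name and the sender address are placeholders.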

    The messages you are getting do look like connection issues. How stable is the link between your SQL Server and postfix?

    Monday, October 17, 2016 3:39 PM
  • The link is very stable, and I reproduced the issue in 3 different locations with completely different networking.

    Our Postfix server receives mail from several hundred systems and sends over half a million mails per month, and the only systems with which I have had the lost connection issue are the SQL Servers that use DBmail.

    The problem is really easy to reproduce anywhere.

    Monday, October 17, 2016 4:27 PM
  • Well, you can rule out the network then. You should be able to look in the failed email view and see whether any characteristic of the outgoing mail gets it rejected, e.g. being over a certain size.

    select len(body), file_attachments from msdb.dbo.sysmail_faileditems

    Monday, October 17, 2016 4:31 PM
  • 10053 means:

    The error code 10053 indicates that an established connection was aborted by the software in your host machine.

    Can you change your Database Mail server to do retries and enable verbose logging? Right-click Database Mail in Management Studio, select Configure Database Mail, then select View or change system parameters.

    Monday, October 17, 2016 4:36 PM
  • No errors are seen from SQL's point of view, regardless of the logging level.

    Error 10053 is on a Windows SMTP host, and it is the equivalent of the lost connection seen on Postfix. Both are caused, as you said, by software, but in this case the software that is aborting the connection is Database Mail.

    We can clearly see it in a network capture, which shows DBmail making the TCP connection and then asking to close it with the FIN/ACK and RST packets.
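    To make that pattern easy to spot in a larger capture, something like the following Python sketch could be used. It assumes the client-side packets have already been exported as (timestamp, source port, flags) tuples; that record format, the function name and the 50 ms window are hypothetical choices, not something taken from any capture tool.

```python
from collections import defaultdict


def find_aborted_connections(packets, window=0.05):
    """Return client source ports where a FIN is followed by a RST within `window` seconds.

    `packets` is an iterable of (timestamp_seconds, client_source_port, flags),
    with flags written like "SYN", "FIN/ACK" or "RST/ACK" (hypothetical format).
    """
    by_port = defaultdict(list)
    for ts, port, flags in packets:
        by_port[port].append((ts, set(flags.upper().split("/"))))

    aborted = []
    for port, events in by_port.items():
        fin_at = None
        for ts, flags in sorted(events, key=lambda event: event[0]):
            if "FIN" in flags and fin_at is None:
                fin_at = ts  # first FIN sent by the client on this port
            elif "RST" in flags and fin_at is not None and ts - fin_at <= window:
                aborted.append(port)
                break
    return sorted(aborted)
```

    Source ports returned by this function are the candidates to compare against the "lost connection" entries on the SMTP server side.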

    For any SMTP server other than Microsoft's, closing the TCP connection also means closing the SMTP session, so the SMTP servers are just following the instructions that DBMail gives them. But as I described earlier, when the SMTP server is Microsoft's, DBMail may still close the initial TCP connection, but it continues to send the mail data over another TCP connection with a different source port, using the same sequence numbers as the original TCP connection. So Microsoft SMTP does not close the entire SMTP session when the initial TCP connection is closed.

    I think this problem goes deeper, as it occurs at the level of the TCP/IP stack that Database Mail uses.

    Monday, October 17, 2016 5:05 PM
  • You might want to consider opening up a support incident with CSS. I am not sure what part of the world you are in, but in the US we go here.

    https://msdn.microsoft.com/en-us/library/bb266240.aspx?f=255&MSPPError=-2147217396

    Monday, October 17, 2016 5:10 PM
  • DBMail is a very basic implementation of a mail queue to send SMTP mail.  It has no real configurations or options to change its operation.

    I suggest you log a bug on https://connect.microsoft.com/SQLServer/.

    Monday, October 17, 2016 5:19 PM
  • Yes, it would be a good idea to open a ticket with Microsoft, but as the problem is easy to reproduce with all versions of SQL, I was expecting that someone else had also experienced the issue.


    Tuesday, October 18, 2016 12:32 PM
  • I have used DBMail with several SMTP servers and have not experienced the issues you describe. 

    The most common reason for SMTP disconnects is the quota or SPAM settings on the SMTP server itself.  Most SMTP servers have a setting on how many connections/emails can be sent per hour before "blacklisting" the server.  Are you sure you are not simply running into a problem with those settings?

    Tuesday, October 18, 2016 4:28 PM
  • Hi Ali,

    “winsock select error: 10053”:

    https://www.pingman.com/kb/pdf-39.html


    Please click Mark As Answer if my post helped.

    Tuesday, October 18, 2016 4:40 PM
  • We had been running our setup for 2 years before someone notified us that some emails did not reach their recipients, so the percentage lost is small. It was during this investigation that we found the "lost connection" message, which does not always result in a lost email. By itself, this message just means that the client at the other end is gone or asked to close the connection, so the message is normally not an issue. This is why we did not bother with that message in the past.

    But when we analyzed all those "lost connection" messages, among the hundreds of systems that use our Postfix servers, only the SQL Servers with DBmail cause them. This is where I think I found a link between those messages and the fact that we lose some mails.
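    For anyone who wants to run the same kind of check, here is a small Python sketch that counts "lost connection" entries per client IP and SMTP stage. The regular expression reflects the usual Postfix smtpd log wording as I understand it, and the function name is made up for illustration; adjust both to your own logs.

```python
import re
from collections import Counter

# Matches e.g. "lost connection after CONNECT from host[192.0.2.10]" and
# "lost connection after DATA (2048 bytes) from host[192.0.2.10]".
LOST = re.compile(
    r"lost connection after (\S+)(?: \([^)]*\))? from ([^\s\[]+)\[([^\]]+)\]"
)


def lost_connections_by_client(lines):
    """Count 'lost connection' log entries per (client IP, SMTP stage)."""
    counts = Counter()
    for line in lines:
        match = LOST.search(line)
        if match:
            stage, _host, ip = match.groups()
            counts[(ip, stage)] += 1
    return counts
```

    Feeding it the lines of the mail log (for example from open("/var/log/maillog"), a path that varies by distribution) and sorting the result with most_common() shows which clients produce the messages.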

    We have reproduced the problem in 2 different labs with clean "out of the box" installations of SQL and a Postfix server on CentOS. There was no antivirus, no firewall, no quota management and so on: a fresh installation with nothing special.

    As this was a lab, there was no mail volume, and only DBmail was sending basic mails of a few bytes, so we can exclude a load, performance or size issue, and we still got the "lost connection" message. But as I said, not every lost connection causes a lost mail. The mail loss usually occurs more often with bigger mails, which are still small, with an average of 2MB.

    Maybe other people have the same problem but never noticed it.

    At this point, my personal conclusion, based on the network captures, is that for some SMTP conversations DBmail closes the initial TCP connection and tries to continue on a new one. Microsoft SMTP does not mind that, but Postfix and the others I tested do. My understanding of RFC 821 is that an SMTP conversation is a single TCP connection, and all my network captures tend to confirm that. If I am right, this makes DBmail not 100% compliant with the SMTP standard.


    Tuesday, October 18, 2016 5:20 PM
  • If you expect that someone else is in the same situation, you should make sure it is a bug. Why not report it at https://connect.microsoft.com/SQLServer/ as Tom suggested? Maybe Microsoft will have some better ideas.
    Wednesday, October 19, 2016 9:57 AM
  • Yes, this is probably what I will do. Thanks all. If Microsoft finds something I will post it here.
    Wednesday, October 19, 2016 12:40 PM
  • You said there were no errors. But I am curious: does sysmail_faileditems show errors or retries?

    Wednesday, October 19, 2016 12:52 PM
  • No, that table is empty.
    Wednesday, October 19, 2016 3:44 PM
  • Thanks Ali.
    Thursday, October 20, 2016 2:23 AM
  • I came across this post while troubleshooting SQL database mail connectivity issues with a custom SMTP Server. Our custom SMTP server works with every mail client we've ever tested (until now I suppose); I could not figure out why on earth database mail was creating multiple TCP sessions and then sending RSTs. There's not much out there about this issue, thank you for your post.
    Tuesday, October 30, 2018 6:08 PM