none
Showstopper: BIG problems with Outlook MAPI performance after Windows 10 Creator Update (1703) RRS feed

  • Question

  • Hi guys,

    We have Outlook Add-ins which worked perfectly and VERY performant under Windows 10.
    After Windows 10 Creator Update ALL users complain that performance decreased dramatically.

    I started researching with debuggers / profilers and and at the last I came to several 
    MAPI methods which takes the most time.
    It's MAPIAllocateMore / MAPIAllocateBuffer / MAPIFreeBuffer.
    It looks like these methods are used actively in IMessage::CopyTo functionality when 
    MSG-file is read.

    We used MFCMapi sources (great work) as example and we read MSG-files with these calls - it short extract

                    IMessage* pMessage;
                    HRESULT hr = piFolder->CreateMessage(NULL, 0L, &pMessage);

                    hr = ::StgOpenStorage(
                                   msgFilePath,
                                   NULL,
                                   STGM_TRANSACTED | STGM_READWRITE,
                                   NULL,
                                   0,
                                   &pStorage);

                    ULONG ulFlags = 0;
                    hr = ::OpenIMsgOnIStg(NULL,
                                   MAPIAllocateBuffer,
                                   MAPIAllocateMore,
                                   MAPIFreeBuffer,
                                   pMalloc,
                                   NULL,
                                   pStorage,
                                   NULL, 0, ulFlags, &pIMsg);

                    SizedSPropTagArray (1, excludeTags);
                    excludeTags.cValues = 1;
                    excludeTags.aulPropTag[0] = PR_ACCESS;

                    hr = pIMsg->CopyTo(0, NULL,
                                   (LPSPropTagArray)&excludeTags,
                                   NULL, NULL,
                                   (LPIID)&IID_IMessage,
                                   pMessage, 0, NULL);

                    pIMsg->Release();
                    pStorage->Release();

                    hr = pMessage->SaveChanges(KEEP_OPEN_READWRITE);

    and profiler shows 99% time is in pIMsg->CopyTo and further profiler shows ~90% 
    in MAPIAllocateMore / MAPIAllocateBuffer / MAPIFreeBuffer.

    So I created simplest test example and run it on 2 systems

    - Win10 64-bit (Release 1511) (Version 10.0.10586) with Outlook 2016 x32
    - Win10 64-bit (Release 1703) (Version 10.0.15063) with Outlook 2016 x64 - it's with Creator Update

    here my test code:

    void TestMAPIAllocateBuffer()
    {
                    MAPIINIT_0 mapiInit = { MAPI_INIT_VERSION, 0 };
                    HRESULT hr = ::MAPIInitialize(&mapiInit);

                    BYTE* lpb = NULL;
                    MY_LOG("START MAPIAllocateBuffer for 10 000 elements");
                    for (int i = 0; i < 10000; i++)
                    {
                                   hr = ::MAPIAllocateBuffer(255, (void**)&lpb);
                                   MY_LOG(" -> i=%d hr=0x%x", i, hr);
                                   if (lpb != NULL)
                                   {
                                                   MAPIFreeBuffer(lpb);
                                                   lpb = NULL;
                                   }
                    }
                    MY_LOG("END MAPIAllocateBuffer for 10 000 elements");
    }

    on Win10CU with Outlook 2016 x64 - it takes 21 sec
    on Win10 with Outlook 2016 x32 - it takes < 1 sec

    Everyone can try this simple code.
    Can someone help / explain what happenned with MAPI performance after Win10 Creator Update?
    And what to do now - how to fix / workaround problem?
    It's very BIG problem because ALL users with Windows10 Creator Update clashed with it.
    Please, help.

    Just in case - in profiler stack on Win10CU I see that MAPIAllocateBuffer uses children calls from these DLLs
    DLLs on Win10CU
                    MAPI32.dll
                                  File Version    - 1.0.2536.0
                                   Product Version - 10.0.15063.0
                    RPCRT4.dll
                                   File Version    - 10.0.15063.0
                                   Product Version - 10.0.15063.0
                    COMBASE.dll
                                   File Version    - 10.0.15063.296
                                   Product Version - 10.0.15063.296
                    USER32.dll
                                   File Version    - 10.0.15063.0
                                   Product Version - 10.0.15063.0

    Also I see incredible long call stack on Win10CU for MAPIAllocateBuffer (I'm not sure if everything correct here - I use free profiler VerySleepy)
    But possible it helps - here in % we see time which takes every child call

    MAPIAllocateBuffer - 32774 calls
    |- FGetComponentPath 99.66% MAPI32
       |-_RoGetActivationFactory 23.91% combase
         |- WinRTGetActivationFactoryOfOutofprocClass 96.42% combase
            |- CClientContextActivator::GetClassObject 97.42% combase
               |- CRpcResolver::GetClassObject 99.91% combase
                  |- ObjectStubless 97.91% combase
                     |- ObjectStublessClient 100.00% combase
                        |- NdrpClientCall3 99.86% RPCRT4
                           |- NdrExtpProxySendReceive 92.93% combase
                              |- CSyncClientCall::SendReceive 99.98% combase
                                 |- ClassicSTAThreadSendReceive 100.00% combase
                                    |- CSyncClientCall::SendReceive2 99.91% combase
                                       |- ThreadSendReceive 99.52% combase
                                          |- ModalLoop 56.33% combase
                                             |- CCliModalLoop::BlockFn 98.27% combase
                                                |- RealMsgWaitForMultipleObjectsEx 97.78% USER32 - !!! common Exclusive time 44.76%
                                          |- CAsyncCall::CallI_RpcSend 41.94% combase
                                             |- I_RpcSend 99.97% RPCRT4 - !!! common Exclusive time 34.75%
       |- ObjectStubless 19.76% combase

    Friday, September 8, 2017 11:56 AM

All replies

  • In your TestMAPIAllocateBuffer, you call MAPIInitialize but don't call MAPIUninitialize. That's a leak and in MAPI, leaks manifest as hangs/delays on shutdown. What happens if you add MAPIUninitialize to the function to properly free up MAPI? 

    Friday, September 8, 2017 3:19 PM
  • (Properly signing in this time...)

    Also, since you don't have symbols, we can't trust any function names you pulled for MAPI32. MAPIAllocateBuffer and FGetComponent path are both public exports, so both will show up randomly on unresolved stacks. Right now, I'm not convinced there's any evidence of a problem with MAPIAllocateBuffer.

    What happens when you test CopyTo under the same conditions that you tested MAPIAllocateBuffer? What does the timing difference look like for CopyTo?


    Friday, September 8, 2017 3:56 PM
    Answerer
  • @Valery68 - can you pause the execution when running under the debugger and look at the call stack? The offsets relative to the function names should at least give an idea whether the function names are correct.

    Dmitry Streblechenko (MVP)
    http://www.dimastr.com/redemption
    Redemption - what the Outlook
    Object Model should have been
    Version 5.5 is now available!

    Friday, September 8, 2017 4:22 PM
  • As a side note, IMessage::CopyTo for the standalone MSG files returned by OpenIMsgOnIStg is broken (at least in the older versions of Outlook - I have not tried the latest): it does not remap the named properties correctly. I had to create my own function that works around that by explicitly copying all the properties, recipients, and attachments.

    Dmitry Streblechenko (MVP)
    http://www.dimastr.com/redemption
    Redemption - what the Outlook
    Object Model should have been
    Version 5.5 is now available!

    Friday, September 8, 2017 4:25 PM
  • Stephen, thank you for your attention to this problem.
    I thought you are responsible only for Exchange MAPI but not Outlook MAPI so I did not post this question to your blog.
    MAPIUninitialize does not matter in this example because it just test code which, I hope, will be run by someone
    from Microsoft on both systems to see that problem really exists. Time that I wrote in my question - it's between

    MY_LOG("START MAPIAllocateBuffer for 10 000 elements");
    and
    MY_LOG("END MAPIAllocateBuffer for 10 000 elements");
    It's just entries in my log-file which I see on Win10CU like

    08.09.2017 11:45:20 :Info: START MAPIAllocateBuffer for 10 000 elements
    08.09.2017 11:45:41 :Info: END MAPIAllocateBuffer for 10 000 elements

    and on Win10 without CU

    08.09.2017 11:47:15 :Info: START MAPIAllocateBuffer for 10 000 elements
    08.09.2017 11:47:15 :Info: END MAPIAllocateBuffer for 10 000 elements

    As for CopyTo testing - it was first where I found problem.
    Our Add-in read / write many MSG-files - it can be several thousands file to read on Addin start (Above you can see short extract code which we use to load MSG-files to some Outlook folder).
    So when customers started complains on performance after Creator Update I researched this part of code.
    I also implemented test program where I read the same MSG-file (complex enough with embedded attachs) in cycle 100 times and insert items in Outlook folder
    I described all my researches in this thread (my 2 last posts are important)
    https://social.technet.microsoft.com/Forums/windowsserver/en-US/1a5d1757-d445-42b5-8ea4-2ce790e7bd1d/mapi-calls-in-outlook-addin-became-much-slower-after-win10-creator-update-release-1703?forum=outlook
    So time difference on the same systems is

    - Win10 64-bit (Release 1511) (Version 10.0.10586) with Outlook 2016 x32 - ~5 sec
    - Win10 64-bit (Release 1703) (Version 10.0.15063) with Outlook 2016 x64 - ~120 sec

    After this I started profiling my test app and AQtime and VerySleep profilers showed that main time is taken these 3 methods - MAPIAllocateMore / MAPIAllocateBuffer / MAPIFreeBuffer.
    So I wrote this simplest app which calls only MAPIAllocateBuffer / MAPIFreeBuffer 10 000 times and I see this time difference that I described in my question above.
    Friday, September 8, 2017 8:05 PM
  • Also, what does  MY_LOG() do? What happen if you comment out that call in the middle of the loop?

    Dmitry Streblechenko (MVP)
    http://www.dimastr.com/redemption
    Redemption - what the Outlook
    Object Model should have been
    Version 5.5 is now available!

    Friday, September 8, 2017 8:23 PM
  • Hi Dmitry,

    MY_LOG() method just write log-entry in my file - it looks like

    08.09.2017 11:45:20: Info: START MAPIAllocateBuffer for 10 000 elements

    I do it just to see the time which this cycle takes and if MAPIAllocateBuffer return success.
    If I remove MY_LOG() from cycle - nothing changes in principal.
    Important is that I run the SAME code on 2 systems - Win10 and Win10CU - and I see so big difference in time.
    And more important that this difference see our customers.

    As for to look at the call stack - I run this test app in WinDbg and go through the calls via F11.
    But it's unreal to show here all stack - there are very much child calls.
    I can show small extract

    OLXTestTool1!TestMAPIAllocateBuffer+0x79 - here I call MAPIAllocateBuffer
     MAPI32!MAPIAllocateBuffer+0x38
      MAPI32!FGetComponentPath+0x15f8
       MAPI32!FGetComponentPath+0xf6b - here I see many other child calls like

        combase!CoIncrementMTAUsage+0x25
    and
        combase!RoGetActivationFactory - also big method which many other child calls
    and
        PCRT4!IUnknown_Release_Proxy
         combase!ObjectStublessClient32+0x10a30

    This small stack extract corresponds with profiler stack in my question.
    But I think it's simpler to run this small piece of code on your side to see that problem exists.
    Important is that system must be Win10 with Creator Update.
    I suspect that problem appears not obligatory in MAPI32.dll but possible in some other module after Creator Update. It can be COMBASE.dll or RPCRT4.dll because I see that main time is spent in these modules.

    Saturday, September 9, 2017 9:21 PM
  • Offset 0x15f8 (5624) is probably too high to really be FGetComponentPath. You might want to create a crash dump (you can do that from the Task Manager) and open a support case so that somebody from Microsoft who has access to the Office pdb files could take a look.

    Dmitry Streblechenko (MVP)
    http://www.dimastr.com/redemption
    Redemption - what the Outlook
    Object Model should have been
    Version 5.5 is now available!

    Saturday, September 9, 2017 11:47 PM
  • Hi Stephen, Dmitry

    I continue researching on MAPI performance on Win10CU because it's vital for our apps.
    I downloaded MS PDB-files from https://developer.microsoft.com/en-us/windows/hardware/download-symbols - Windows 10 and Windows Server 2016 – April 2017
    and I debug MAPIAllocateBuffer in WinDbg
    I'm not expert in debugging in disassembler code so possibly my research is not completely correct. But maybe it will help you to dig deeper.
    The most calls from MAPI32!MAPIAllocateBuffer is more or less clear and I see that they do not have big stack of child calls - for example MSMAPI32!MAPIAllocateBuffer and olmapi32!MAPIAllocateBuffer.
    But there is method which has unreal deep stack of child calls - MAPI32!GetMMGADll.
    I did not manage to go though all this stack but I selected some methods with the most complex of sequence of calls.
    I commented them below (it is not complete stack - just the main calls in my opinion) - maybe you try to research more?


    MAPI32!MAPIAllocateBuffer
    |-ntdll!RtlEnterCriticalSection
    |-MAPI32!GetPolicyProxyDll
      |-MAPI32!AppModelPolicy_GetPolicy_Internal
      |-MAPI32!TlgWrite
      |-MAPI32!GetMMGADll - unreal long stack with thousands child calls from combase
        |-combase!CoIncrementMTAUsage
        |-combase!WindowsCreateStringReference
        |-combase!RoGetActivationFactory - this call from GetMMGADll also has huge stack of calls in comparison with others
          |-combase!RoActivateInstance+0x4200
          |-combase!RoActivateInstance+0x3860
          |-combase!RoActivateInstance+0x4120
          |-combase!CoGetTreatAsClass+0x240 - this call from RoGetActivationFactory also has huge stack of calls in comparison with others

            in some moment I see this stack started from combase!CoGetTreatAsClass+0x36d
            |-combase!Ordinal153+0x4ed5
              |-combase!Ordinal153+0x12cd
                |-combase!ObjectStublessClient32+0xfc32
                  |-combase!NdrExtStubInitialize+0x1b6e
                    |-RPCRT4!NdrpClientCall3+0xc6c
                      |-RPCRT4!NdrpClientUnMarshal+0x1e1
                        |-RPCRT4!NdrPointerUnmarshall+0x167
                          |-RPCRT4!NdrPointerUnmarshall+0x26b
                            |-combase!CoAddRefServerProcess+0xc2d
                              |-combase!CoGetMarshalSizeMax+0x920
                                 |-AppvIsvSubsystems64!VirtualizeCurrentProcess+0xe2b30
                                    |-combase!CoCreateInstance+0xc3
                                      |-combase!RoActivateInstance+0x90ab
                                        |-combase!RoActivateInstance+0x85b6
                                          |-combase!DllGetClassObject+0x46
                                            |-RPCRT4!NdrDllGetClassObject+0x59
                                              |-combase!NdrOleDllGetClassObject+0x16b
                                                |-combase!NdrOleDllGetClassObject+0x6d7
                                                  |-ucrtbase!memcmp+0xc6

      |-MAPI32!TlgWrite
      |-MAPI32!GetProxyDllEx
    |-KERNELBASE!GetProcAddress (for MAPIAllocateBuffer)
    |-ntdll!RtlLeaveCriticalSection
    |-MSMAPI32!MAPIAllocateBuffer
      |-olmapi32!MAPIAllocateBuffer
    |-ntdll!RtlEnterCriticalSection
    |-ntdll!RtlLeaveCriticalSection

    And one more question.
    We have our class

    class COLXBinary : public SBinary

    We use it as SBinary in MAPI calls and we implemented it just to have
    operator=, operator!=, operator LPCTSTR and methods Serialize(CArchive& ar) and Deserialize(CArchive& ar).
    When we create this class in constructor from SBinary we use MAPIAllocateBuffer (and MAPIFreeBuffer in destructor accordingly)

    COLXBinary::COLXBinary(const SBinary& data)
    {
      if(data.cb > 0 && data.lpb)
      {
        HRESULT hr = ::MAPIAllocateBuffer(data.cb, (void**)&lpb);
        if(SUCCEEDED(hr))
        {
          memcpy(lpb, data.lpb, data.cb);
          cb = data.cb;
        }
      }
    }

    So my question is - is it obligatory to use MAPIAllocateBuffer / MAPIFreeBuffer OR it will also work for MAPI calls if we just use operators new / delete

    COLXBinary::COLXBinary(const SBinary& data)
    {
      if(data.cb > 0 && data.lpb)
      {
        lpb = new BYTE[data.cb]; // and delete[] lpb; in destructor
        memcpy(lpb, data.lpb, data.cb);
        cb = data.cb;
      }
    }
    Monday, September 11, 2017 5:31 PM
  • Your own allocator (new / delete) would work just fine. Eben a pointer to a static buffer would work

    There is no requirement to use the MAPI allocator to store your data.


    Dmitry Streblechenko (MVP)
    http://www.dimastr.com/redemption
    Redemption - what the Outlook
    Object Model should have been
    Version 5.5 is now available!

    Monday, September 11, 2017 5:41 PM
  • Thanks Dmitry,

    Then it will improve our performance. I already tested it.
    BUT problem with MAPIAllocateBuffer on Win10CU exists and I can't avoid this call everywhere.
    It would be good if Stephen pays attention on this problem more closely...


    Monday, September 11, 2017 6:06 PM
  • You might want to open a support case with MS (you get 3 or 4 free support cases a year if you have an MSDN subscription) - this way your problem will definitely get looked at.

    Dmitry Streblechenko (MVP)
    http://www.dimastr.com/redemption
    Redemption - what the Outlook
    Object Model should have been
    Version 5.5 is now available!

    Monday, September 11, 2017 10:53 PM
  • I got a case on this and definitely reproduced something - 52 s vs 25 ms for 10000 MAPIAllocateBuffer. I'm analyzing now.
    Tuesday, September 12, 2017 9:45 PM
    Answerer
  • Cool! I am really curious what can possibly cause this..

    Dmitry Streblechenko (MVP)
    http://www.dimastr.com/redemption
    Redemption - what the Outlook
    Object Model should have been
    Version 5.5 is now available!

    Tuesday, September 12, 2017 9:50 PM
  • Found the bug. It's in Windows. I happen to know a few devs who worked on the feature, so I've sent the report over to them along with notes on how to fix it.

    BTW - the perf hit here is on every export from mapi32.dll.

    Tuesday, September 12, 2017 10:29 PM
    Answerer
  • Ouch! Is there any way to mitigate this (load some other dll, patch a particular Win API function etc.)?

    Dmitry Streblechenko (MVP)
    http://www.dimastr.com/redemption
    Redemption - what the Outlook
    Object Model should have been
    Version 5.5 is now available!

    Tuesday, September 12, 2017 10:37 PM
  • Great!

    Stephen, thank you for your work because we really have big head-ache with this problem. Please let us know what to change after problem is resolved.


    Monday, September 18, 2017 11:08 AM
  • So - as I found earlier, the problem is in Windows. The Windows devs did some digging and think they may have already fixed this problem via some perf work they did a few months ago. The big question then is how/when you get the fix. I'm still chasing that down.
    Monday, September 18, 2017 6:14 PM
  • Hi Stephen,
    do you have some news concerning this performance problem in Windows for mapi32.dll? How can we fix / workaround it?
    It's very important for us...
    Monday, October 2, 2017 6:18 AM
  • Hi Stephen,
    I'm writing again because it's very important for us and our customers.
    Can we download somewhere new DLLs which fix this performance problem?
    Or you can point us some people who we can contact to resolve it faster?
    Thursday, October 5, 2017 6:50 AM
  • Hey- I just wanted to close the loop on this. We did identify the performance problem in 1703. It was due to work that has already been further refined in 1709. We're working on backporting the fixes from 1709 to 1703 and expect them to ship in January. 1709 already has the fixes, so that's an option if you need the fix now.
    Friday, November 3, 2017 3:35 PM
  • Hey- I just wanted to close the loop on this. We did identify the performance problem in 1703. It was due to work that has already been further refined in 1709. We're working on backporting the fixes from 1709 to 1703 and expect them to ship in January. 1709 already has the fixes, so that's an option if you need the fix now.

    Good day

    I'm also a .NET developer and use the MAPI methods to create emails within our existing applications.  Since the W10 Fall update (1709), we too are unable to send emails within our applications.  Error 26 - Not Supported.

    Stumbled on this feed while scouring the internet trying to find answers.

    My dev machine updated to 1703, followed directly by 1709.  So can't comment which update caused the issue, I can only say that I still have the issue on 1709.

    Reading through this feed, am I correct in saying that it is a MS issue and the fix will land in Jan 2018? 

    Any advice will be appreciated.

    BTW.  Dmitry, you Sir, are a legend.  Used your Redemption product in the past ;-)

    Regards

    Wayne

    Thursday, November 30, 2017 6:51 AM
  • Hey- I just wanted to close the loop on this. We did identify the performance problem in 1703. It was due to work that has already been further refined in 1709. We're working on backporting the fixes from 1709 to 1703 and expect them to ship in January. 1709 already has the fixes, so that's an option if you need the fix now.

    Apologies for resurrecting an old thread but we are seeing similar issues with mapi32 on 1703. Did the fixes in 1709 ever get 'backported' and if so do you know what the KB is?

    Many thanks.

    Tuesday, September 18, 2018 3:42 PM