CreateFileMapping; possible memory leak?

  • Question

  • My program uses CreateFileMapping so multiple processes can access a common data base.  The program runs fine and produces the correct answers.  But when all is finished, sometimes the Windows OS has crashed.  Other programs won't load, and the Power | Restart button doesn't work.  I have to run shutdown.exe and then cold start the PC.

    I call UnmapViewOfFile for each mapping I create, and then close the handle.  Here is the code.  Did I leave out anything?

    #pragma optimize("g", off )
    // core 0 comes here to get addresses of arrays for other cores

    void preparederivmap( int &npas )    // create mapping of data for other cores
    {
        int size = 15001*412 + 128;

        TCHAR derivfilename[]= TEXT("SYNOPSYS_DERIV_DATA");

        hMapping = CreateFileMapping( NULL ,nullptr, PAGE_READWRITE, 0, (size*sizeof(double)), derivfilename );
        if (hMapping == NULL ) {
            panic( "Cannot open mapping file" );
        }

        double *deriv = static_cast<double*> (MapViewOfFile(hMapping, FILE_MAP_ALL_ACCESS, 0, 0, 0 ));
        if ( !deriv ) {
            panic( "Cannot do MapView" );
        }
        pderiv = deriv;

        size = 401;
        TCHAR delqfilename[]= TEXT("SYNOPSYS_DELQ_DATA");

        hMapping2 = CreateFileMapping( NULL ,nullptr, PAGE_READWRITE, 0, (size*sizeof(double)), delqfilename );
        if (hMapping2 == NULL ) {
            panic( "Cannot open mapping file" );
        }

        double *delq = static_cast<double*> (MapViewOfFile(hMapping2, FILE_MAP_ALL_ACCESS, 0, 0, 0 ));
        if ( !delq ) {
            panic( "Cannot do MapView" );
        }
        pdelq = delq;

        TCHAR scdrfilename[]= TEXT("SYNOPSYS_SCDR_DATA");

        hMapping3 = CreateFileMapping( NULL ,nullptr, PAGE_READWRITE, 0, (size*sizeof(double)), scdrfilename );
        if (hMapping3 == NULL ) {
            panic( "Cannot open mapping file" );
        }

        double *scdr = static_cast<double*> (MapViewOfFile(hMapping3, FILE_MAP_ALL_ACCESS, 0, 0, 0 ));
        if ( !scdr ) {
            panic( "Cannot do MapView" );
        }
        pscdr = scdr;

        SYNORUN( deriv, delq, scdr, &npas );    // returns only when all passes are done
        return;
    }


    void killderivmap( )    // delete mapping of data for other cores
    {

        UnmapViewOfFile( pderiv );
        UnmapViewOfFile( pdelq );
        UnmapViewOfFile( pscdr );

        CloseHandle( hMapping );
        CloseHandle( hMapping2 );
        CloseHandle( hMapping3 );

    }

    Saturday, May 16, 2020 1:12 PM

Answers

  • I suggest you change your code to unconditionally unmap non-null views and close valid handles upon process termination.

    I don't know what this accomplishes -

       if ( dsMultiMe > 0 ) {        // higher cores 

    I suggest you remove the shared memory cleanup from this conditional test and use something like the following for each mapped view and related file mapping object handle -

    if(pMappedView)
      UnmapViewOfFile(pMappedView);
    
    if(hFileMappingObject)
      CloseHandle(hFileMappingObject);

    • Marked as answer by DonDilworth Thursday, June 4, 2020 11:59 AM
    Thursday, June 4, 2020 11:33 AM

All replies

  • My program uses CreateFileMapping so multiple processes can access a common data base.  The program runs fine and produces the correct answers.  But when all is finished, sometimes the Windows OS has crashed.  Other programs won't load, and the Power | Restart button doesn't work.  I have to run shutdown.exe and then cold start the PC.

    I call UnmapViewOfFile for each mapping I create, and then close the handle.  Here is the code.  Did I leave out anything?

    ........


        hMapping = CreateFileMapping( NULL ,nullptr, PAGE_READWRITE, 0, (size*sizeof(double)), derivfilename );

    For what it's worth, you are calling CreateFileMapping incorrectly.  If you want shared memory backed by the page file then the first parameter should be INVALID_HANDLE_VALUE, not NULL.
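
    A minimal sketch of the corrected call, assuming a hypothetical helper name createPageFileMapping that is not part of the posted program.  It passes INVALID_HANDLE_VALUE so the mapping is backed by the page file, and mirrors the posted call by passing 0 for the high half of the size:

```cpp
#ifdef _WIN32   // Win32-only sketch; the guard keeps other platforms from compiling it
#include <windows.h>

// Hypothetical helper, not from the posted program: create a named,
// page-file-backed mapping of `bytes` bytes (sizes below 4 GB only).
// Returns NULL on failure; GetLastError() then holds the reason.
inline HANDLE createPageFileMapping(const wchar_t* name, DWORD bytes)
{
    return CreateFileMappingW(
        INVALID_HANDLE_VALUE,   // no disk file: back the mapping with the page file
        nullptr,                // default security attributes
        PAGE_READWRITE,
        0,                      // maximum size, high 32 bits
        bytes,                  // maximum size, low 32 bits
        name);
}
#endif
```

    If a mapping with that name already exists, CreateFileMapping returns a handle to the existing object and GetLastError() reports ERROR_ALREADY_EXISTS.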

    In any event, Windows will clean up all resources and close handles when a process terminates.

    And there is nothing about the description of the problem to indicate that CreateFileMapping is responsible.

    It's not clear what you mean by "the Windows OS has crashed."  Did it give you a BSOD? 

    What does "other programs won't load" mean?  Are there error messages?  Event log entries?  How can you run shutdown.exe?

    You might post some more details about your environment (e.g., Windows version, Antivirus, etc.) in the hope that someone else has experienced a similar problem.


    Saturday, May 16, 2020 2:37 PM
  • I tried your suggestion, INVALID_HANDLE_VALUE, and ran the program, looping 25 times running 16 processes each time.  Same results.

    If I open Thunderbird, I get a black screen.  If I copy some text from Word, when I paste it somewhere nothing shows up.  No error messages.

    The Event Viewer shows an error 455 ESENT, but that was from the previous session.

    I have Windows 10, 2.68 GHz CPU with 56 cores, NVIDIA Quadro P4 graphics, Windows updated yesterday.

    I have SUPERAntispyware free edition, and Malwarebytes Premium.

    I run shutdown.exe from an icon I created on the desktop.  Very handy.

    What else can I tell you?

    Saturday, May 16, 2020 4:43 PM
  • Which Windows 10 release (e.g., 1607, 1709, 1803, 1809, 1903, 1909) are you running?

    Have you looked in Task Manager to make sure that all of the processes you create have terminated?

    Any known problems with the graphics driver you are using?

    Tried running without AV software?

    Saturday, May 16, 2020 5:01 PM
  • Windows version 10.0.18363.

    Yes, all processes have terminated.  No known problems with graphics.  AV?  Will test.

    Saturday, May 16, 2020 5:33 PM
  • I have tested it with both AVs turned off.  Same crash.  What do you suggest now?
    Saturday, May 16, 2020 6:58 PM
  • Did you install any Windows Updates for the system this month?

    Saturday, May 16, 2020 7:04 PM
  • Yes.  Windows updates itself regularly, and did so yesterday.
    Saturday, May 16, 2020 7:17 PM
  • Well, if everything was fine until you installed updates you can try uninstalling them to see if a misbehaving update is the culprit.

    Or wait to see if any other suggestions are posted.

    Saturday, May 16, 2020 8:33 PM
  • This problem has been going on for many months.  Reverting is not going to help.
    Sunday, May 17, 2020 11:15 AM
  • Interesting news:  I ran the same program data sequence on a different PC (generic, 8-core, W10) and there was no failure.  Running it on the Dell (56 cores, W10) it fails every time.  The last failure shows this in the event viewer:

    svchost (10032,R,98) TILEREPOSITORYS-1-5-18: Error -1023 (0xfffffc01) occurred while opening logfile C:\WINDOWS\system32\config\systemprofile\AppData\Local\TileDataLayer\Database\EDB.log.

    Does that tell you anything?  Or should I call Dell service?

    Sunday, May 17, 2020 11:38 AM
  • I'm not a technician but the described issues sound like a system problem to me.

    The particular event log entry doesn't mean much to me, but S-1-5-18 is the SID for the SYSTEM account.

    You can try asking questions in the Microsoft user forums (answers.microsoft.com) since this doesn't seem like a development issue.

    You'll probably get advice like "run the system file checker" and so forth.

      https://support.microsoft.com/en-us/help/929833/use-the-system-file-checker-tool-to-repair-missing-or-corrupted-system

    Calling Dell is up to you.  You can always call them after checking the advice available from other online references that you search.

    Sorry that I can't be more helpful.

    • Marked as answer by DonDilworth Sunday, May 17, 2020 12:46 PM
    • Unmarked as answer by DonDilworth Tuesday, June 2, 2020 4:24 PM
    Sunday, May 17, 2020 11:53 AM
  • Actually, that is helpful, thank  you.

    And I have to update my previous note: I ran the sequence on the second PC (which had not failed) and after several iterations it too failed the same way.  So it's not a Dell problem.  I'll see what I can find out from Microsoft.

    Sunday, May 17, 2020 12:25 PM
  • More info that might be useful:   I see this message in the Event Viewer.  Could this point to the problem?

    The required buffer size is greater than the buffer size passed to the Collect function of the "C:\Windows\System32\perfts.dll" Extensible Counter DLL for the "LSM" service. The given buffer size was 14128 and the required size was 35432.

    If so, is there anything in my software that would cause or cure it?

    Tuesday, June 2, 2020 4:26 PM
  • I have no idea if this error is at all related to your issues.

    For suggestions on how to handle the error, see

    https://answers.microsoft.com/en-us/windows/forum/all/event-1020-windows-10/9f1103be-d81f-4912-a9e3-4c1cf6026039

    Tuesday, June 2, 2020 5:27 PM
  • It may not be related.  I did the operations on your link, and nothing changes.

    Is there any utility that will monitor the system memory and flag when a program tries to clobber something else?  That seems to be what is happening.  But the Event Viewer displays no error messages from that timeframe.

    Tuesday, June 2, 2020 6:38 PM
  • Have you tried testing after starting Windows in safe mode?

    https://support.microsoft.com/en-us/help/12376/windows-10-start-your-pc-in-safe-mode

    Tuesday, June 2, 2020 6:59 PM
  • That's a great idea.  I tried it, ran the program, and got the same crash.

    The crash depends on how many times I run a certain feature.  That feature cycles a number of times, usually 10, and each cycle creates the four CreateFileMapping files.  So if I ask for 24 processes, creating and destroying the files 10 times, and then repeat that whole sequence a second time, the crash happens.

    If I ask for 16 processes, I can run the procedure four times, maybe more, with no crash.

    I will look into some of the other parameters in the CreateFileMapping operation.  The files can be very large, and that could be the problem.

    Tuesday, June 2, 2020 7:26 PM
  • The usage of CreateFileMapping that you posted is for shared memory that is backed by the system's page file.  The posted code indicates that the maximum size of a file mapping object is a little less than 50 MB.  I wouldn't think that creating 4 such file mapping objects should be a problem.
    Tuesday, June 2, 2020 7:41 PM
  • I tried the

    SIZE_T GetLargePageMinimum();

    feature, but couldn't get the include files right.  That would let me use the SEC_LARGE_PAGES option.

    But my largest file size is only 6.18 mb, which works out to 12.36 mb of double-precision floating-point numbers.  The other two files are much smaller.

    The fact that I can run this a few times with no trouble--but it fails if run too many times--says that maybe the mapped files are not released properly.  When the process completes, I use

        UnmapViewOfFile( pderiv );
        UnmapViewOfFile( pdelq );
        UnmapViewOfFile( pscdr );

        CloseHandle( hMapping );
        CloseHandle( hMapping2 );
        CloseHandle( hMapping3 );

    to close them down.  Is there anything else I should do to make them release whatever resources they have grabbed?  If they are still hanging around, then too many of them would indeed be a problem.

    It is also possible I have the wrong values for the file size.

            int size = 15001*412 + 128;
            hMapping = CreateFileMapping( INVALID_HANDLE_VALUE ,nullptr, PAGE_READWRITE, 0, (size*sizeof(double)), derivfilename );

    Perhaps I don't have the low and high orders of a DWORD constructed properly.  Do you see an error there?


    • Edited by DonDilworth Tuesday, June 2, 2020 8:02 PM new data
    Tuesday, June 2, 2020 7:52 PM
  • Unmapping a view and closing the related file mapping object handle should be enough.  And as I pointed out earlier, Windows will release all resources upon process termination and you previously said that all of the processes you create that use the shared memory do terminate.

    Tuesday, June 2, 2020 8:01 PM
  • It is also possible I have the wrong values for the file size.

            int size = 15001*412 + 128;
            hMapping = CreateFileMapping( INVALID_HANDLE_VALUE ,nullptr, PAGE_READWRITE, 0, (size*sizeof(double)), derivfilename );

    Perhaps I don't have the low and high orders of a DWORD constructed properly.  Do you see an error there?


    This is the code that requests a maximum size of slightly more than 47 MB.  And the usage looks correct to me.
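
    On the question about the high and low DWORDs: the posted call passes 0 for the high half and the byte count for the low half, which is fine as long as the byte count fits in 32 bits.  A minimal sketch of the split, assuming a hypothetical helper splitMappingSize (not from the posted program) that does the multiplication in 64 bits so a larger array could not silently overflow an int:

```cpp
#include <cstdint>

// Hypothetical helper, not from the posted program.
struct MappingSize {
    std::uint32_t high;  // would be passed as dwMaximumSizeHigh
    std::uint32_t low;   // would be passed as dwMaximumSizeLow
};

// Byte count for `elements` doubles, split into the two 32-bit halves
// that CreateFileMapping takes.
inline MappingSize splitMappingSize(std::uint64_t elements)
{
    const std::uint64_t bytes = elements * sizeof(double);
    return { static_cast<std::uint32_t>(bytes >> 32),
             static_cast<std::uint32_t>(bytes & 0xFFFFFFFFu) };
}
```

    For the array in this thread, splitMappingSize(15001ull * 412 + 128) gives a high half of 0 and a low half of 49,444,320 bytes, which is about 47.2 MiB.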
    Tuesday, June 2, 2020 8:12 PM
  • Well, I am out of ideas.  Something is clearly wrong.  Are there any utilities that can check for writing into system files that should be protected?  What else could be going on?

    The Task Manager | Performance shows moderate and fairly constant CPU and memory usage.  There are no errors listed in Event Viewer.  This is spooky.  If it were 31 October I could accept it, but not today.

    Tuesday, June 2, 2020 8:26 PM
  • Maybe the higher processes are not exiting correctly and not releasing resources.  Here is the code that gets called when they are finished:

    int CSYNOPSYSApp::ExitInstance()
    {
        if ( dsMultiMe > 0 ) {        // higher cores don't save other stuff
            HideApplication();
            CloseAllDocuments( TRUE );
            free ( crypticbuffer );  // from malloc
            delete myMf;
            delete m_pMainWnd;

            return CWinApp::ExitInstance();
        }

    Is there something else should go in there?

    Tuesday, June 2, 2020 8:31 PM
  • Does each process that intends to use shared memory call CreateFileMapping/MapViewOfFile for the same named objects and UnmapViewOfFile/CloseHandle when finished?
    • Edited by RLWA32 Tuesday, June 2, 2020 8:50 PM
    Tuesday, June 2, 2020 8:50 PM
  • No.  Each process calls OpenFileMapping() and MapViewOfFile() to access the files created by process 0.  The higher processes do not unmap anything when they exit.  The mapped files are left there for when process 0 has to run another set of data and spawn another set of processes.  The new processes therefore continue to use the same mapped files.  That saves time since the mapping is only done once.  When process 0 finally exits, then it unmaps and closes the handles.

    Should the other processes unmap anything?  Wouldn't that delete the files I want to share on the next set of data?

    A related question:  The size of my largest array is 6180540 doubles, or over 49 mb.  Shouldn't I use SEC_LARGE_PAGES ?

    If so, I have to call GetLargePageMinimum, but that requires an extra dll that I don't know how to access.  Please advise.

    Wednesday, June 3, 2020 3:54 PM
  • As I said before, the shared memory created by your code is backed by the paging file.  You are NOT creating new files in the file system.

    As a practical matter there is little difference between calling OpenFileMapping and CreateFileMapping for an already existing named file mapping object.  Both functions will return a handle to the existing object.

    The system maintains reference counts on the objects associated with shared memory.  Your code should call UnmapViewOfFile and CloseHandle in each process that has used the shared memory when the process terminates.  These functions decrement the reference counts.  The file mapping object will not be closed by the system until the last reference to it has been released.
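
    One way to make sure every process drops its references is to tie the view and the handle to an object's lifetime.  A minimal sketch, assuming a hypothetical wrapper class SharedView that is not part of the posted program and that views the shared memory as an array of doubles:

```cpp
#ifdef _WIN32   // Win32-only sketch; the guard keeps other platforms from compiling it
#include <windows.h>

// Hypothetical RAII wrapper: opens an existing named mapping and maps a view
// of the whole object. The destructor unmaps the view and closes the handle,
// which releases this process's references to the mapping.
class SharedView {
public:
    explicit SharedView(const wchar_t* name)
    {
        handle_ = OpenFileMappingW(FILE_MAP_ALL_ACCESS, FALSE, name);
        if (handle_)
            view_ = MapViewOfFile(handle_, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    }
    ~SharedView()
    {
        if (view_)   UnmapViewOfFile(view_);
        if (handle_) CloseHandle(handle_);
    }
    SharedView(const SharedView&) = delete;
    SharedView& operator=(const SharedView&) = delete;

    bool    ok()   const { return view_ != nullptr; }
    double* data() const { return static_cast<double*>(view_); }

private:
    HANDLE handle_ = nullptr;
    void*  view_   = nullptr;
};
#endif
```

    Releasing a view in one process does not remove the shared memory for the others; the mapping stays alive as long as any process, such as process 0, still holds a handle or a view.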

    You don't need to use large pages, and 50MB is not a significant amount of virtual address space.

    Wednesday, June 3, 2020 5:40 PM
  • A new crash, and this time the EV shows something:

    The required buffer size is greater than the buffer size passed to the Collect function of the "C:\Windows\System32\perfts.dll" Extensible Counter DLL for the "LSM" service. The given buffer size was 31728 and the required size was 36720.

    I don't know if this is related, but the time is about right.

    It was followed a few minutes later with

    Faulting application name: SYNOPSYS200v15.exe, version: 1.0.0.1, time stamp: 0x5ed7d7c0
    Faulting module name: SYNOPSYS200v15.exe, version: 1.0.0.1, time stamp: 0x5ed7d7c0
    Exception code: 0xc0000005
    Fault offset: 0x002cfe2b
    Faulting process id: 0x4b20
    Faulting application start time: 0x01d639c8de4d3812
    Faulting application path: C:\SYNOPSYSV15\Release\SYNOPSYS200v15.exe
    Faulting module path: C:\SYNOPSYSV15\Release\SYNOPSYS200v15.exe
    Report Id: 18e94d9e-44dc-4861-b084-119c0d195237
    Faulting package full name:
    Faulting package-relative application ID:

    How do I diagnose this?

    Wednesday, June 3, 2020 5:40 PM
  • Exception C0000005 is an Access Violation.

    These are typically caused by dereferencing a NULL pointer or using a pointer that contains an invalid address.

    Wednesday, June 3, 2020 5:46 PM
  • I never would have thought of that, thank you.

    So I added some code to the ExitInstance()

    int CSYNOPSYSApp::ExitInstance()
    {
        if ( dsMultiMe > 0 ) {        // higher cores don't save other stuff

        UnmapViewOfFile( pderiv );
        UnmapViewOfFile( pdelq );
        UnmapViewOfFile( pscdr );

        CloseHandle( hMapping );
        CloseHandle( hMapping2 );
        CloseHandle( hMapping3 );



            HideApplication();
            CloseAllDocuments( TRUE );
            free ( crypticbuffer );  // from malloc
            delete myMf;
            delete m_pMainWnd;

            return CWinApp::ExitInstance();
        }

    But the problem didn't go away.

    It depends on the number of processes I run.  If I ask for 64 and cycle the program 10 times, it creates and exits from each of them 10 times, or 640 total spawns.  That always crashes.

    If I ask for 32 processes and cycle it 10 times, I can do that whole sequence nine times with no crash.  On the 10th run, it crashes.  Does that tell you anything?

    I can ask for 16 processes, and it is still good after 12 iterations.  I did not test more than that.

    Wednesday, June 3, 2020 7:00 PM
  • Since this is not converging, let's start from the other end.  What kind of screwup would make the Power | Restart gadget stop working?  Answer that and we have a clue.  Then we look for how my code could have done just that.  The mouse still works; I can click the Start button and the Power button.  But after the crash, the Restart option is unresponsive.

    Do you have enough insight into Windows to clear that up?

    Wednesday, June 3, 2020 8:08 PM
  • Since this is not converging, let's start from the other end.  What kind of screwup would make the Power | Restart gadget stop working?  Answer that and we have a clue.

    Sorry, but my psychic powers are not up to this task. 

    Wednesday, June 3, 2020 9:41 PM
  • Well, a first step would be to obtain a dump from the crash and use the debugger to examine the call stack.

    Does that help you?

    Wednesday, June 3, 2020 9:50 PM
  • I'm not sure what this means.  My program does not crash.  Windows keeps running.  But some features of Windows no longer work.  There are no errors in the Event Viewer.  So what can I dump and look at, and how would I know if something is amiss?

    A colleague suggested that the page file gets corrupted.  Does that suggest an approach to find a lead?

    Thursday, June 4, 2020 11:09 AM
  • Did you change your code so that upon termination EVERY process that uses the shared memory calls UnmapViewOfFile to unmap its view and CloseHandle on its file mapping object handle?
    Thursday, June 4, 2020 11:18 AM
  • Yes, I did.  Every process, when it terminates, does this:

    int CSYNOPSYSApp::ExitInstance()
    {
        if ( dsMultiMe > 0 ) {        // higher cores don't save other stuff

            if ( pderiv != NULL ) {
                UnmapViewOfFile( pderiv );
                UnmapViewOfFile( pdelq );
                UnmapViewOfFile( pscdr );

                CloseHandle( hMapping );
                CloseHandle( hMapping2 );
                CloseHandle( hMapping3 );
            }

            HideApplication();
            CloseAllDocuments( TRUE );
            free ( crypticbuffer );  // from malloc
    //        endLens();            // release this license; no; let core 0 do it
            delete myMf;
            delete m_pMainWnd;

            return CWinApp::ExitInstance();
        }

    Thursday, June 4, 2020 11:22 AM
  • I suggest you change your code to unconditionally unmap non-null views and close valid handles upon process termination.

    I don't know what this accomplishes -

       if ( dsMultiMe > 0 ) {        // higher cores 

    I suggest you remove the shared memory cleanup from this conditional test and use something like the following for each mapped view and related file mapping object handle -

    if(pMappedView)
      UnmapViewOfFile(pMappedView);
    
    if(hFileMappingObject)
      CloseHandle(hFileMappingObject);

    • Marked as answer by DonDilworth Thursday, June 4, 2020 11:59 AM
    Thursday, June 4, 2020 11:33 AM
  • My comment about obtaining a dump was targeted to your post about your application crashing with an Access Violation.

    If you have every process that uses shared memory properly unmap its views and close its handles I would think that would address the possibility of a problem with the page file.

    Thursday, June 4, 2020 11:44 AM
  • Fantastic!  I ran a sequence that always failed, and it worked!  I think you found the solution.  Never would have guessed.  Good job.
    Thursday, June 4, 2020 12:01 PM
  • It's nice to hear some good news.  I've got my fingers crossed that nothing else pops up. :)
    Thursday, June 4, 2020 12:07 PM
  • Something did.

    I can run my example with 64 cores and cycle accessing the mapped files four times.  No problem.  But it fails after the fifth cycle.  Each cycle reads and writes the mapped files 5 times.

    But if I run data that reads and writes 10 times per cycle, it works once and fails on the second cycle.

    And if I read and write 64 times each cycle, it fails after the first one.

    This makes no sense at all.  I see nothing in the code that would affect the mapped files.  They are read and written more times, but that should cause no problem.  Yet it does.  Ever more weird.

    Thursday, June 4, 2020 3:06 PM
  • To make things clear, there are four nested loops.

    1. Loop I times

       2. loop J times

         3. loop N*64/M times

            4. loop N times

    The first pass in loop 1 sets up file mapping via CreateFileMapping.

    Loop 2 spawns M processes.  Each one runs OpenFileMapping once to get the handles, then starts loop 3, which reads and writes to the mapped arrays N*64/M times and loops N times, each one reading and writing as above.  As each pass of loop 2 finishes, all processes unmap the files, close the handles, and terminate, except process 0.  Then loop 2 loops again, repeating the above, spawning and so on.  When loops 2, 3, and 4 are done, loop 1 starts the next iteration, which repeats all that a total of I times.

    Here are the results:

    I = 4

    J = 10

    M = 64

    N = 5

    I = 4; no problem.  I = 5 crashes Windows


    I = 1

    J = 10

    M = 64

    N = 10

    I = 1; no problem.  I = 2 crashes Windows

    The second case spawns 128 processes in total, while case 1 spawns 320.  Yet case 2 crashes before case 1.  The difference seems to be the value of N, which governs the number of reads and writes to the mapped files.  Why should that make a difference?


    • Edited by DonDilworth Thursday, June 4, 2020 4:00 PM clarification
    Thursday, June 4, 2020 3:13 PM
  • When you say "fail" are you referring to the same anomalous behavior previously described or is this something different?

    Are you creating/opening file mapping objects and mapping views inside or outside the nested loops?

    Is it possible that you are overwriting variables that hold handles or mapped views during loop execution?

    I assume that when you say "read and write the mapped files" you are talking about only using the mapped views as pointers to memory.  Is that correct?

    I suggest you add code to test the return values from UnmapViewOfFile and CloseHandle to ensure that they are succeeding.
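
    A minimal sketch of such a check, assuming a hypothetical helper releaseMapping that is not part of the posted program.  Writing to stderr is only a placeholder for whatever logging the program uses:

```cpp
#ifdef _WIN32   // Win32-only sketch; the guard keeps other platforms from compiling it
#include <windows.h>
#include <cstdio>

// Hypothetical helper: unmap a view and close its mapping handle if they are
// set, report any failure with GetLastError(), and reset both so that a
// second call is harmless. Returns true if every call that was made succeeded.
template <class T>
inline bool releaseMapping(T*& view, HANDLE& mapping)
{
    bool ok = true;
    if (view) {
        if (!UnmapViewOfFile(view)) {
            std::fprintf(stderr, "UnmapViewOfFile failed: %lu\n", GetLastError());
            ok = false;
        }
        view = nullptr;
    }
    if (mapping) {
        if (!CloseHandle(mapping)) {
            std::fprintf(stderr, "CloseHandle failed: %lu\n", GetLastError());
            ok = false;
        }
        mapping = nullptr;
    }
    return ok;
}
#endif
```

    With the names from the posted code, the cleanup would become releaseMapping( pderiv, hMapping ) and likewise for the other two pairs.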


    • Edited by RLWA32 Thursday, June 4, 2020 3:48 PM
    Thursday, June 4, 2020 3:48 PM
  • I create the file mapping only once.  Then the spawned processes open the mapped files and read and write many times. The handles are initialized by the OpenFileMapping() and pointers to the mapped files are defined then.  Those pointers are sent as arguments to Fortran programs that do the reading and writing to the files.  They are used as the names of arrays with the same dimensions as on the created files.

    By "fail", I mean that after my code finishes all its cycles, if I try to open Thunderbird, I see a black screen.  If I close that and try to open it again, there is no response.  And the Start | Power | Restart gadget is unresponsive.  Also, the copy/paste process in Word then does nothing.  I have to do a cold boot. 

    I will test the return values.  Good idea.

    Thursday, June 4, 2020 4:21 PM
  • I create the file mapping only once.  Then the spawned processes open the mapped files and read and write many times. The handles are initialized by the OpenFileMapping() and pointers to the mapped files are defined then.  Those pointers are sent as arguments to Fortran programs that do the reading and writing to the files.  They are used as the names of arrays with the same dimensions as on the created files.

    I don't use Fortran and am not familiar with the details of how you would interoperate with it from C/C++.  However, the address of shared memory that is obtained in a process by using OpenFileMapping/MapViewOfFile is a virtual address that is only valid within that process.

    So are the Fortran programs different processes from the ones that obtain the shared memory address?

    Thursday, June 4, 2020 4:31 PM
  • The spawned processes open the mapped files, then send them to Fortran as arguments to a subroutine call.  That subroutine reads and writes those data as though they were ordinary Fortran data arrays.  All the processes write to the same arrays, and when they have done their work, the Fortran does a lot of processing of the data.  When the Fortran has completed its work and the other processes terminate, it loops and spawns again, the new processes open the mapped files again, and the job continues as above.

    I added a test to the  unmap and closehandle code; there were no failures.  But the Windows crash is still there.

    Thursday, June 4, 2020 4:50 PM
  • If I understand you correctly, CreateFileMapping/OpenFileMapping/CloseHandle and MapViewOfFile/UnmapViewOfFile occur only once per process for each file mapping object that the code uses.

    At this point I suggest you create as simple a demo as possible to reproduce the use (reading and writing) of shared memory with multiple processes/iterations using minimal C/C++ code.

    If you can reproduce the problem with a minimal C/C++ demo you can use it to open a support case with Microsoft.

    If you can't reproduce the problem then it's possible the issue is related to how the Fortran code is manipulating the shared memory.

    At this point I just don't know what else to try.

    Thursday, June 4, 2020 5:04 PM
  • Are you absolutely certain that there is no reading/writing of shared memory outside the bounds of the maximum size that is specified when the file mapping object is created?
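
    One way to rule this out is to check each access range against the mapped size before the arrays are handed to the numerical code.  A minimal sketch, assuming a hypothetical helper rangeFitsMapping (not from the posted program) that counts in elements of double:

```cpp
#include <cstddef>

// Hypothetical helper: true if elements [first, first + count) of a double
// array lie entirely inside a mapping of `mappedBytes` bytes.
inline bool rangeFitsMapping(std::size_t first, std::size_t count,
                             std::size_t mappedBytes)
{
    const std::size_t capacity = mappedBytes / sizeof(double);
    // Written this way so that first + count cannot overflow.
    return first <= capacity && count <= capacity - first;
}
```

    For the 401-element mappings in the posted code, rangeFitsMapping( 0, 401, 401 * sizeof(double) ) is true, while a range of 402 elements is rejected.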
    Thursday, June 4, 2020 5:08 PM
  • Not quite.  The CreateFileMapping happens only once.

    The OpenFileMapping happens with every spawned process.  Each of those unmaps and closes the handles when it terminates.

    Simple case?  That's a tall order.  The code in question is somewhere around half a million lines.  I would be happy to send a package to Microsoft, under an NDA, but I doubt they'd be interested.

    The Fortran code has declared array sizes, the same as the sizes of the mapped files, and the debugger would scream if those were exceeded.  I have attached some of the spawned processes as a test so the debugger could flag errors there.  None were found.

    This is beginning to look like a Windows 10 glitch.  Do you have a contact there who is willing to listen?

    Thursday, June 4, 2020 5:14 PM
  • I don't have any contacts within Microsoft.

    Does the problem manifest with both debug and release builds of your executables?

    Thursday, June 4, 2020 6:31 PM
  • Yes.  Same crash from running the same data.

    Is there anything in the spawn process that looks suspicious?  Can I try other modes, if there are any?

    void myspawn( int i, int j, int k, char* path )
    {
        CString imsg;
        CString jmsg;
        CString kmsg;
        CString tmsg;
        CString lmsg;
        int l;

    /* task numbers in j:
        1 DSEARCH Q
        2 GSEARCH
        3 MC
        4 REDO DSEARCH Q
        5 ZSEARCH Q
        6 REDO ZSEARCH Q
        7 AEI
        8 AAA
        9 ADA
        10 SYNO | spawn
    */
    //    qnx_spawn_options.priority = 20;    // this won't work

        imsg.Format( "%d", i );        // is new core number
        jmsg.Format( "%d", j );        // is task number
        kmsg.Format( "%d", k );        // is network flag
        tmsg.Format( "%d", frametop );    // data set by core 0
        lmsg.Format( "%d", frameleft );
            
        l = spawnl( _P_NOWAIT, path, "SYNOPSYS200v15.exe", imsg, jmsg, kmsg, tmsg, lmsg, NULL );
        if ( l < 0 ) {
            errno_t err = 0;

            _get_errno( &err );
            switch (err) {
            case 0:
                AfxMessageBox( "TOO MANY ARGUMENTS" );
                wndout( "TOO MANY ARGUMENTS\r\n" );
                break;
            case 1:
                AfxMessageBox( "INVALID MODE" );
                wndout( "INVALID MODE\r\n" );
                break;
            case 2:
                AfxMessageBox( "EXECUTABLE FILE NOT FOUND" );
                wndout( "EXECUTABLE FILE NOT FOUND\r\n" );
                break;
            case 3:
                AfxMessageBox( "NOT EXECUTABLE" );
                wndout( "NOT EXECUTABLE\r\n" );
                break;
            case 4:
                AfxMessageBox( "NOT ENOUGH MEMORY" );
                wndout( "NOT ENOUGH MEMORY\r\n" );
                break;
            }
        }
        corehandle[i-1] = l;
    }


    Thursday, June 4, 2020 8:43 PM
  • I expect the compiler is complaining that you should call _spawnl instead of spawnl.

    The return value from spawnl should be typed as an intptr_t instead of an int.  This makes a difference if you are working with 64 bit code.

    Check the values you are using for error numbers in the switch statement.  I think some of them are incorrect.  Anyway, you should use the symbolic names like E2BIG, EINVAL and so forth instead of hard-coded integer values.

    So much for ministerial details.

    The asynchronous version of spawnl returns a handle to the newly created process.  I don't know the purpose of this -

    corehandle[i-1] = l;

    But the kernel's process object is not destroyed until the handle that your program has received is closed.  So I suggest you call CloseHandle on the return value from spawnl.
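
    A sketch of those two points, assuming a hypothetical helper spawnDetached (not from the posted program) and a child that takes a single argument.  It keeps the return value as an intptr_t and closes the handle right after the spawn; if the handle is still needed, for example to terminate the child later, the CloseHandle call would move to the point where it is no longer needed:

```cpp
#ifdef _WIN32   // Win32-only sketch; the guard keeps other platforms from compiling it
#include <windows.h>
#include <process.h>
#include <cstdint>

// Hypothetical helper: start a child without waiting for it, then release our
// handle to it. Closing the handle does not stop the child; it only lets the
// kernel free the process object once the child has exited.
// Returns false if the spawn failed (errno is then set by _spawnl).
inline bool spawnDetached(const char* path, const char* arg0, const char* arg1)
{
    // nullptr terminates the variable argument list.
    const intptr_t h = _spawnl(_P_NOWAIT, path, arg0, arg1, nullptr);
    if (h == -1)
        return false;
    CloseHandle(reinterpret_cast<HANDLE>(h));
    return true;
}
#endif
```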

    Friday, June 5, 2020 12:51 AM
  • Interesting suggestions.

    The symbolic names are mysterious to me, and I don't know their mnemonics.  None of those errors have ever been triggered, so the point is moot.

    corehandle[] is how my program keeps track of what processes have been spawned.  Then, if the user kills the whole job while they are still running, it can close the handle then.

    I never thought of closing the handles while the processes are still running.  Won't that kill them?  I want all processes to be running in parallel.

    Friday, June 5, 2020 12:59 PM
  • No, closing the handle will not kill the child process.
    Friday, June 5, 2020 1:24 PM
  • Okay, I added code to close the handles.  But it still crashes the same way.  I have sent a report to the Windows Feedback hub.  I don't see anything else we can do at this point, and thank you for your many wise words.

    Now it's up to Microsoft.

    Friday, June 5, 2020 2:06 PM
  • Are you able to test on Windows 8.1, 8 or 7?

    It might be instructive to see if this is just a Win 10 problem.

    Friday, June 5, 2020 2:42 PM
  • I don't have a working Windows 7 or 8 PC anymore.  Sorry.
    Friday, June 5, 2020 2:44 PM
  • I tried your suggestion about CloseHandle.  The code below crashes with a message

    Cannot create a file when that file already exists.

    And this is on the CloseHandle!  It is not supposed to create anything.  What's going on?

    void myspawn( int i, int j, int k, char* path )
    {
        CString imsg;
        CString jmsg;
        CString kmsg;
        CString tmsg;
        CString lmsg;
        int l;
        bool retlog;

    /* task numbers in j:
        1 DSEARCH Q
        2 GSEARCH
        3 MC
        4 REDO DSEARCH Q
        5 ZSEARCH Q
        6 REDO ZSEARCH Q
        7 AEI
        8 AAA
        9 ADA
        10 SYNO | spawn
    */
    //    qnx_spawn_options.priority = 20;    // this won't work

        imsg.Format( "%d", i );        // is new core number
        jmsg.Format( "%d", j );        // is task number
        kmsg.Format( "%d", k );        // is network flag
        tmsg.Format( "%d", frametop );    // data set by core 0
        lmsg.Format( "%d", frameleft );
            
        HANDLE hd;
        l = _spawnl( _P_NOWAIT, path, "SYNOPSYS200v15.exe", imsg, jmsg, kmsg, tmsg, lmsg, NULL );
        if ( l < 0 ) {
            errno_t err = 0;

            _get_errno( &err );
            switch (err) {
            case E2BIG:
                AfxMessageBox( "TOO MANY ARGUMENTS" );
                wndout( "TOO MANY ARGUMENTS\r\n" );
                break;
            case EINVAL:
                AfxMessageBox( "INVALID MODE" );
                wndout( "INVALID MODE\r\n" );
                break;
            case ENOENT:
                AfxMessageBox( "EXECUTABLE FILE NOT FOUND" );
                wndout( "EXECUTABLE FILE NOT FOUND\r\n" );
                break;
            case ENOEXEC:
                AfxMessageBox( "NOT EXECUTABLE" );
                wndout( "NOT EXECUTABLE\r\n" );
                break;
            case ENOMEM:
                AfxMessageBox( "NOT ENOUGH MEMORY" );
                wndout( "NOT ENOUGH MEMORY\r\n" );
                break;
            }
        }
        corehandle[i-1] = l;

        hd = (HANDLE) corehandle[i-1];        // this crashes.  Don't know why.
        retlog = CloseHandle(hd);
        if ( retlog ) {
            flasherror();
        }
    }

    void flasherror( void )
    {
        const char* lpMsgBuf;
    //    LPVOID lpMsgBuf;    // this is the example, but it doesn't work!
        FormatMessage(
            FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM,
            NULL,
            GetLastError(),
            MAKELANGID( LANG_NEUTRAL, SUBLANG_DEFAULT ),
            (LPTSTR) &lpMsgBuf,
            0,
            NULL
        );
        
        MessageBox( NULL, lpMsgBuf, "Error Message Flagged", MB_OK|MB_ICONINFORMATION );
    //    LocalFree( lpMsgBuf );    /ditto!
    }

    Tuesday, June 30, 2020 6:41 PM
  •     hd = (HANDLE) corehandle[i-1];        // this crashes.  Don't know why.

        retlog = CloseHandle(hd);
        if ( retlog ) {
            flasherror();
        }
    }

    void flasherror( void )
    {
        const char* lpMsgBuf;
    //    LPVOID lpMsgBuf;    // this is the example, but it doesn't work!
        FormatMessage(
            FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM,
            NULL,
            GetLastError(),
            MAKELANGID( LANG_NEUTRAL, SUBLANG_DEFAULT ),
            (LPTSTR) &lpMsgBuf,
            0,
            NULL
        );
        
        MessageBox( NULL, lpMsgBuf, "Error Message Flagged", MB_OK|MB_ICONINFORMATION );
    //    LocalFree( lpMsgBuf );    /ditto!
    }

    This code mistakenly calls the flasherror function when CloseHandle succeeds!  CloseHandle returns nonzero on success and 0 on failure, so the test should be on !retlog.  Any error message that you retrieve after a successful call is irrelevant, since it relates to some unknown function that previously set the thread's last error code.

    As for the flasherror function itself, include the flag FORMAT_MESSAGE_IGNORE_INSERTS.  Also, by commenting out the call to LocalFree you are creating a memory leak, because FORMAT_MESSAGE_ALLOCATE_BUFFER makes FormatMessage allocate the buffer on your behalf.

    Tuesday, June 30, 2020 9:18 PM
  • When I do it this way:

    void flasherror( void )
    {
    //    const char* lpMsgBuf;
        LPVOID lpMsgBuf;    // this is the example, but it doesn't work!
        FormatMessage(
            FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
            NULL,
            GetLastError(),
            MAKELANGID( LANG_NEUTRAL, SUBLANG_DEFAULT ),
            (LPTSTR) &lpMsgBuf,
            0,
            NULL
        );
        
        MessageBox( NULL, lpMsgBuf, "Error Message Flagged", MB_OK|MB_ICONINFORMATION );
        LocalFree( lpMsgBuf );    //ditto!
    }

    Error    1    error C2664: 'MessageBoxA' : cannot convert parameter 2 from 'LPVOID' to 'LPCSTR'    c:\synopsysv15\csubs.cpp    836    1    SYNOPSYS200

    Wednesday, July 1, 2020 11:54 AM
  • If you want to use a void* pointer variable to hold the pointer returned by FormatMessage then you should cast the pointer to const char* (or LPCSTR) in the call to MessageBoxA.

    FormatMessage resolves to either FormatMessageW for a Unicode build or FormatMessageA for a non-Unicode build.  So using LPTSTR in a call to FormatMessage allows for the proper resolution of the data type during pre-processing.

    It's not a good idea to mix generic data types like LPTSTR with code that requires a particular character set, like MessageBoxA.

    Wednesday, July 1, 2020 12:28 PM
  • Okay, I use const char*, and it compiles.

    But the underlying problem is something else.  Here are the symptoms:

    This is a report about a bug in Windows 10.

    My program uses CreateFileMapping() to map three files to the page file.

    Then it launches many new processes, with _spawnl().

    Each process accesses the mapped files via MapViewOfFile().

    Each process reads and writes to those files many times.

    When each process finishes, it does UnmapViewOfFile() and CloseHandle().

    The calculations are all correct, there are no error messages, the debugger finds no problems, and there are no errors listed in the Event Viewer.

    But after running this program, Windows has become corrupted.  If I launch Thunderbird, I see a black screen.  If I close and launch it again, there is no response. The Task Manager says that TB is still running but disabled.

    The cut and paste option in Word no longer works.  The Start | Power | Restart sequence does nothing.  I have to do a cold reboot.

    After the reboot, Windows works normally.

    Here's the weird part:  When I run my code, I can adjust the number of times the processes read and write to the mapped files.  If I do it a few times, no problem.  If I do it hundreds of times, the problem shows up.  But in both cases, the files are mapped only once.  The difference is in how many times they are read and written, and also how many processes are spawned and terminated.  The Task Manager shows memory usage of about 14% at most.

    This is a serious problem.  I hope somebody can figure out what is the cause and the cure.

    Wednesday, July 1, 2020 12:44 PM
  • I still think the next step is for you to contact Microsoft directly and open a support case.

    Maybe you can work with them to provide them what they need to help you while also protecting your intellectual property.

    But you'll never know unless you try.

    Best of luck to you.

    Wednesday, July 1, 2020 1:13 PM
  • I have tried.  Many times.  I can get their chat person online, explain the problem, and he says he will promote the issue to a higher level and they will contact me.

    It has never happened.

    Does Microsoft care?

    Wednesday, July 1, 2020 1:20 PM
  • I'm not talking about end-user support which is what I expect you are encountering with their online chat.

    I'm talking about professional support for business developers.

    Check out https://support.microsoft.com/en-us/hub/4343728/support-for-business

    Wednesday, July 1, 2020 1:48 PM
  • Today I have new information.  The program runs fine as long as all the spawned processes remain running.  I can end them with the Task Manager and all is still well.  The OS still runs as it should.

    But if my program ends the spawned processes itself, the crash occurs.  I suspect a memory leak, but don't know how to find it.  When each process is supposed to terminate, I unmap the files with cleancores().  Then the process returns to my override of the run() loop, which ends it via

        myMf->PostMessage( WM_QUIT );    // tell it to stop now
        return CWinApp::Run();

    Is there any other way to terminate the processes in code that will release all resources?  That might do the trick.


    void cleancores( void )
    {
        bool ret;
        if (pderiv) {
            ret = UnmapViewOfFile( pderiv );
            if (ret == 0)     AfxMessageBox( "Unmap failure" );    
        }
        if (pdelq) {
            ret = UnmapViewOfFile( pdelq );
            if (ret == 0)     AfxMessageBox( "Unmap failure" );    
        }
        if (pscdr) {
            ret = UnmapViewOfFile( pscdr );
            if (ret == 0)     AfxMessageBox( "Unmap failure" );    
        }

        if (hMapping) {
            ret = CloseHandle( hMapping );
            if (ret == 0)     AfxMessageBox( "CloseHandle failure" );    
        }
        if (hMapping2) {
            ret = CloseHandle( hMapping2 );
            if (ret == 0)     AfxMessageBox( "CloseHandle failure" );    
        }
        if (hMapping3) {
            ret = CloseHandle( hMapping3 );
            if (ret == 0)     AfxMessageBox( "CloseHandle failure" );    
        }

        pderiv = 0;
        pdelq = 0;
        pscdr = 0;
        hMapping = 0;
        hMapping2 = 0;
        hMapping3 = 0;
    }

    Thursday, August 6, 2020 12:41 PM
  • When each process is supposed to terminate, I unmap the files with cleancores().  Then the process returns to my override of the run() loop, which ends it via

        myMf->PostMessage( WM_QUIT );    // tell it to stop now
        return CWinApp::Run();

    Is there any other way to terminate the processes in code that will release all resources?  That might do the trick.


    void cleancores( void )
    {
        bool ret;
        if (pderiv) {
            ret = UnmapViewOfFile( pderiv );
            if (ret == 0)     AfxMessageBox( "Unmap failure" );    
        }
        if (pdelq) {
            ret = UnmapViewOfFile( pdelq );
            if (ret == 0)     AfxMessageBox( "Unmap failure" );    
        }
        if (pscdr) {
            ret = UnmapViewOfFile( pscdr );
            if (ret == 0)     AfxMessageBox( "Unmap failure" );    
        }

        if (hMapping) {
            ret = CloseHandle( hMapping );
            if (ret == 0)     AfxMessageBox( "CloseHandle failure" );    
        }
        if (hMapping2) {
            ret = CloseHandle( hMapping2 );
            if (ret == 0)     AfxMessageBox( "CloseHandle failure" );    
        }
        if (hMapping3) {
            ret = CloseHandle( hMapping3 );
            if (ret == 0)     AfxMessageBox( "CloseHandle failure" );    
        }

        pderiv = 0;
        pdelq = 0;
        pscdr = 0;
        hMapping = 0;
        hMapping2 = 0;
        hMapping3 = 0;
    }

    This sounds like a problem.

    First, it is highly unusual to override MFC's CWinApp(Ex)::Run function.  And if you want to cleanly end an MFC application it is common to send or post the WM_CLOSE message to the application's main frame window (this is what task manager does).

    The usual sequence for a windows application is that the WM_CLOSE message causes DestroyWindow to be called and the message handler for WM_DESTROY calls PostQuitMessage thus causing the message loop to terminate and the application to exit.

    Sunday, August 9, 2020 10:41 AM
  • Breaking news:  I rewrote the code so it didn't use memory mapping, but instead did IPC via some disk files.  The program runs correctly -- and it still crashes the same way!  So memory mapping is not the culprit. 

    I ran some tests, running the Fortran code for many cycles, and determined that the crash occurred at cycle 9 of a given subroutine.  If I break with the debugger at cycle 10, the old crash symptoms show up again and I have to reboot Windows.

    But if I break at cycle 9, continue, and then break at 10, there is no crash.  Just breaking and continuing at that cycle avoids the crash!

    Can anyone think of a reason for this weird behavior?  It looks like a Windows bug, perhaps something about paging.  Can anyone think of a solution?
    Sunday, August 9, 2020 1:37 PM
  • That stopping the code and then continuing seems to work suggests that you have some kind of race condition between processes.
    Sunday, August 9, 2020 5:54 PM
  • That is a cogent suggestion--but a data race would hang the program or produce wrong calculations, which is not happening.

    Tell me this: is there anything my program could do that would crash the Windows OS?  I thought it was pretty well protected.  If I knew what could be doing it, I would know where to look.

    Monday, August 10, 2020 12:30 PM
  • Since you are probably not a hacker, it is most likely that anything you do to crash Windows is the result of undefined behavior.  One possible cause is overrunning allocated memory.  Another is that one process is interrupted in the middle of an update and a second process reads the partially updated data.
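    The second scenario, a reader seeing a half-finished update, can be illustrated in miniature.  This is only a sketch under loud assumptions: it uses two threads in one process in place of two processes, and std::mutex in place of a cross-process lock.  SharedPair, writer, count_torn_reads and run_demo are hypothetical names.  A std::mutex does not work across processes; between processes on Windows a named mutex from CreateMutex is one option that plays the same role.

```cpp
#include <functional>
#include <mutex>
#include <thread>

// Two values that must always change together (stand-in for shared arrays).
struct SharedPair {
    std::mutex lock;
    long a = 0;
    long b = 0;   // invariant: a == b whenever the lock is free
};

// Writer: every update touches both fields while holding the lock.
void writer(SharedPair& s, int updates)
{
    for (int n = 0; n < updates; ++n) {
        std::lock_guard<std::mutex> guard(s.lock);
        ++s.a;
        ++s.b;
    }
}

// Reader: counts how often it observes a half-finished update (a != b).
long count_torn_reads(SharedPair& s, int reads)
{
    long torn = 0;
    for (int n = 0; n < reads; ++n) {
        std::lock_guard<std::mutex> guard(s.lock);
        if (s.a != s.b) ++torn;
    }
    return torn;
}

// Run the writer on a second thread while this thread reads.
long run_demo(int updates, int reads)
{
    SharedPair s;
    std::thread w(writer, std::ref(s), updates);
    long torn = count_torn_reads(s, reads);
    w.join();
    return torn;
}
```

    With the lock held on both sides, run_demo should report no torn reads.  Removing either lock_guard turns the code into a data race, which is undefined behavior and may or may not show up in any given run.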

    Can you add debugging printouts with a time stamp to the processes, to get an accurate chronological history?  That might help you debug.

    It's not a solution but a workaround might be to put a delay in the process to simulate the break you describe above.  Something like "if this is iteration 9 sleep for 2 seconds".
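    The timestamped printouts suggested above could look something like this.  It is a sketch, not the poster's code: format_trace_line, now_ms and trace are hypothetical names, and the core and iteration numbers are assumed to be available wherever the call is made.  The wall clock is used so that lines written by different processes can be merged into one history afterwards.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Build one log line from a timestamp in milliseconds, the core number,
// the iteration number and a free-form note.
std::string format_trace_line(long long ms, int core, int iteration,
                              const std::string& note)
{
    return std::to_string(ms) + " ms core=" + std::to_string(core) +
           " iter=" + std::to_string(iteration) + " " + note;
}

// Current wall-clock time in milliseconds, as counted by system_clock.
long long now_ms()
{
    using namespace std::chrono;
    return duration_cast<milliseconds>(
               system_clock::now().time_since_epoch()).count();
}

// Write one timestamped line and flush it immediately, so the line
// reaches the file even if the machine becomes unusable afterwards.
void trace(std::FILE* out, int core, int iteration, const std::string& note)
{
    std::fprintf(out, "%s\n",
                 format_trace_line(now_ms(), core, iteration, note).c_str());
    std::fflush(out);
}
```

    Each process would open its own log file and call trace at the start and end of the suspect loop, for example trace( log, 3, 9, "enter loop" ).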

    Monday, August 10, 2020 10:33 PM
  • While the forum community tries to be helpful, at this point it's still a guessing game.

    I assume that you are working on commercial software since earlier you mentioned non-disclosure agreements.

    That being the case, I'll again repeat my suggestion that you contact Microsoft for professional developer support.

    Tuesday, August 11, 2020 1:23 PM
  • The idea of putting a delay in the loop in question has merit, and I tried it.  A delay of five seconds prevented the crash in one case, but a repeat run crashed in spite of it.  And of course it slows down what should be a fast process, so in a way I'm glad it is not the answer.
    Tuesday, August 11, 2020 2:42 PM
  • I have sent notes to Microsoft, without results.  If you know a contact there who might be able to diagnose the issue, please send me the link.  It seems to be over the head of many folks besides me.

    Tuesday, August 11, 2020 2:44 PM
  • Take a look at the options here -

    https://support.microsoft.com/en-us/help/4341255

    Tuesday, August 11, 2020 2:52 PM
  • I appreciate the suggestion, but the link leads to a sales pitch, not a person.  I found a Contact Us link, and searched on "crashing".  The link has 2,360,000 entries.

    Do you know how I can contact a live person?

    Tuesday, August 11, 2020 3:15 PM
  • Did you choose the Microsoft Professional Support (pay-per-incident) link?
    Tuesday, August 11, 2020 3:31 PM
  • I'm not sure where to go.  Microsoft will support C++, but the code in question is mostly Fortran.  Is there a help desk for that language?
    Tuesday, August 11, 2020 5:27 PM
  • I note that the pay option no longer supports Visual Studio 2010.  So the next step is for me to upgrade.  Perhaps that is long overdue!
    Tuesday, August 11, 2020 7:54 PM
  • Visual Studio 2010 reached end of life and went out of support in July 2020.
    Wednesday, August 12, 2020 11:29 AM
  • Right you are.  The home office is planning to upgrade.  Then we'll see what happens.
    Wednesday, August 12, 2020 1:42 PM