Shared memory with shared events

  • Question

  • Basically, I now have a fully working shared memory driver that maps memory to user space. I have searched a lot about this but couldn't find an actual code example, so I thought I would ask here because it's the perfect place to ask.

    What I want to do is this: I want to communicate between my user-mode app and my driver with shared events so I can send and receive data. For example, I want to send a struct (a read-memory request) from user mode to kernel mode via ReadFile, WriteFile and events, but I don't want to use any IOCTL code. Is that even possible? I haven't found any example of doing that. If it isn't clear what I want, I basically want to do something similar to kernelbhop, but with only events. I would like to see code examples or functions that I can use, because that's the only way I can understand how things work.

    Saturday, March 9, 2019 11:49 PM

All replies

  • I've done this a few times for clients, and it is quite complicated.

    You will communicate using your own command requests, analogous to IRPs. You will need at least 3 queues (doubly-linked lists): a free queue, a submission queue (app->kernel-mode), and a response queue (kernel-mode->app), and optionally another set of queues if you want to originate requests from kernel-mode->app. Each queue will have an event that is signaled when a request is placed on the queue. A kernel-mode thread would wait on the event indicating that a request was placed on the submission queue, process the request, and then put the request (with whatever data and completion status are appropriate) on the response queue and signal the "response ready" event that a user-mode thread is waiting on.

    I should point out that this can lead to some serious security vulnerabilities if you aren't very careful.

    As far as I know, there aren't any examples of this.

     -Brian


    Azius Developer Training www.azius.com Windows device driver, internals, security, & forensics training and consulting. Blog at www.azius.com/blog

    Sunday, March 10, 2019 1:35 AM
    Moderator
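
    For readers following along, here is a minimal sketch of the three-queue layout described above. It is not code from anyone in this thread and not a driver: it is portable C++ in which a std::condition_variable stands in for each queue's event and a plain function stands in for the kernel-mode thread. Every name in it (Request, SlotQueue, Channel, service_one, round_trip) is made up for illustration. Requests live in a fixed pool and the queues carry slot indices rather than pointers, on the assumption that the two sides of a real shared section see it at different virtual addresses.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical request record, loosely analogous to an IRP.
struct Request {
    std::uint32_t opcode = 0;
    std::uint32_t status = 0;
    std::uint64_t payload = 0;
};

// A queue of slot indices plus an "event" that is signaled on every insert.
class SlotQueue {
public:
    void push(std::uint32_t slot) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(slot);
        }
        cv_.notify_one();  // stand-in for signaling the queue's event
    }
    std::uint32_t pop_wait() {  // stand-in for waiting on the queue's event
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::uint32_t slot = q_.front();
        q_.pop_front();
        return slot;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::uint32_t> q_;
};

constexpr std::uint32_t kSlots = 8;

struct Channel {
    std::array<Request, kSlots> pool{};
    SlotQueue free_q, submit_q, response_q;
    Channel() {
        for (std::uint32_t i = 0; i < kSlots; ++i) free_q.push(i);  // all slots start free
    }
};

// "Driver" side: take one submitted request, process it, post the response.
inline void service_one(Channel& ch) {
    std::uint32_t slot = ch.submit_q.pop_wait();
    assert(slot < kSlots);  // a real driver must reject, not assert on, a bad index
    Request& r = ch.pool[slot];
    r.payload *= 2;  // placeholder "work"
    r.status = 1;    // placeholder completion status
    ch.response_q.push(slot);
}

// "App" side: grab a free slot, submit it, wait for the response, recycle the slot.
inline Request round_trip(Channel& ch, std::uint64_t value) {
    std::uint32_t slot = ch.free_q.pop_wait();
    ch.pool[slot].opcode = 1;
    ch.pool[slot].status = 0;
    ch.pool[slot].payload = value;
    ch.submit_q.push(slot);
    service_one(ch);  // in a real design this runs on a separate kernel-mode thread
    std::uint32_t done = ch.response_q.pop_wait();
    Request result = ch.pool[done];
    ch.free_q.push(done);
    return result;
}
```

    Calling round_trip(ch, 21) on a fresh Channel walks one slot through free -> submission -> response -> free. The security warning above applies directly: in a real section the other side can scribble on indices and request fields at any time, so everything read from it needs validating.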
  • I've done this, but I used IOCTLs with METHOD_NEITHER. A 4-core system can do over 1,000,000 IOCTLs per second if you design things right. Basically you use the 4 IOCTL parameters directly, i.e. pass the offset of the memory and the length you wish to read or write.

    As Brian points out, there can be serious security holes with shared memory, but you can get some incredible performance. On one effort where I was one of two developers, we got to 750,000 network messages a second before the hardware locked up (we proved we could get to 1,000,000 if they fixed the hardware).


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Sunday, March 10, 2019 1:47 AM
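
    One thing the "pass the offset and the length" approach needs on the receiving side is a bounds check before the shared memory is touched. The sketch below is an assumption about how such a check could look, in portable C++ rather than driver code; range_in_region and read_from_region are hypothetical helpers, not part of any Windows API. The comparison is arranged so that offset + length is never computed and therefore cannot wrap around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true only if [offset, offset + length) lies inside a region of
// region_size bytes. Written without computing offset + length, so a huge
// offset or length cannot overflow and slip past the check.
inline bool range_in_region(std::uint64_t offset, std::uint64_t length,
                            std::uint64_t region_size) {
    if (length == 0) return false;           // nothing to transfer
    if (offset > region_size) return false;  // start is past the end
    return length <= region_size - offset;   // safe: offset <= region_size here
}

// Copies length bytes starting at offset out of the region, but only after
// the range has been validated against both the region and the output buffer.
inline bool read_from_region(const unsigned char* region, std::size_t region_size,
                             std::uint64_t offset, std::uint64_t length,
                             unsigned char* out, std::size_t out_size) {
    assert(region != nullptr && out != nullptr);
    if (!range_in_region(offset, length, region_size)) return false;
    if (length > out_size) return false;
    std::memcpy(out, region + offset, static_cast<std::size_t>(length));
    return true;
}
```

    With a 4096-byte region, (0, 4096) is accepted while (4095, 2) and (UINT64_MAX, 2) are rejected.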
  • I thought I could do something like this: open the mapped section, then use WriteFile to send a string like "read_memory". The IRP_MJ_WRITE handler then checks whether the string equals "read_memory" and, if it does, calls my read memory function. I think using a mutex to wait for all of this processing is much simpler than using events (correct me if I'm wrong), but to be honest I really want to do this with events.
    Sunday, March 10, 2019 11:22 AM
  • Well, I really don't want to use IOCTL codes, because I have seen so many examples of that and I wanted to create something new. The problem is that I can't find any code examples.
    Sunday, March 10, 2019 11:23 AM
  • The only case I encountered with events was such a mess that I had been brought in to help with it. I ripped it all out, including the shared memory, and still achieved the performance the device required.

    This is kernel code. You shouldn't worry about what is "new"; your primary goal should be something that works well.


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Sunday, March 10, 2019 11:52 AM
  • Thanks for the advice. However, I'm still searching for code that does this and couldn't find any. Why has no one ever released it? I have seen shared events on the OSR forums, but when I downloaded the source I saw that they use DeviceIoControl. Is there anything that can replace DeviceIoControl so I can do this without it?

    Sunday, March 10, 2019 12:45 PM
  • Why are you hung up on replacing DeviceIoControl?   It is fast, and allows data to be passed as part of the event.  


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Sunday, March 10, 2019 1:28 PM
  • Yeah, I know that, but I'm trying to do everything in shared memory without even touching IOCTL code.
    Sunday, March 10, 2019 2:39 PM
  • In that case I can't help you. Bottom line is you are trading speed and reliability for doing something "different". That may be fine in art, but when a single mistake in a kernel driver will crash the system, that is a pretty stupid idea. Part of the reason you have not found an example is that most of the code is either buggy or works only for a specific design; if someone else picks up the sample, the odds are they are not going to understand the subtlety of the code and the fragility of the algorithm.


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Sunday, March 10, 2019 2:52 PM
  • Yeah, I know it's going to be messy and unstable, but I will still give it a try. Thanks for your help :) By the way, I found https://github.com/mq1n/EasyRing0. It sends a string and compares it in its IRP_MJ_WRITE handler; if it is exactly the same string as the one in the user-mode buffer, it calls the read-shared-memory function. So I guess I can use that to communicate with my driver: trigger ReadSharedMemory to read the request passed from user mode, then do whatever I want (call another function, for example). Then I can add a mutex object to wait between operations so they don't overlap. Would that work, performance-wise?
    Sunday, March 10, 2019 3:00 PM
  • So you are going to use Write, which has significantly higher overhead than IOCTL (since IOCTL is easy to do in FastIo), to manage shared memory communication, which opens a big security hole and so is only used by application/driver pairs that really need very fast communication. In other words, you are throwing away the reasons for shared memory by using a slow I/O channel to manage it?


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Sunday, March 10, 2019 3:07 PM
  • Well yeah, I'm doing it right now and I hope I get it to work. I will also post code here if I encounter any problems with it, and I'm sure that I will :)
    Sunday, March 10, 2019 5:12 PM
  • @Brian Catlin @Don Burn [Windrvr] So this is what I have done so far. It's just messy code, and I'm having some problems with it.

    This is what I am doing in my user-mode app:

    typedef struct KM_READ_REQUEST
    {
        ULONG ProcessId;

        UINT_PTR Address;
        UINT_PTR Size;
        void* Output;

    } KM_READ_REQUEST, *PKM_READ_REQUEST;


    // hDriver and PID are globals set up elsewhere in the app
    template <typename type>
    type RPM(UINT_PTR ReadAddress)
    {
        if (hDriver == INVALID_HANDLE_VALUE) {
            return {};
        }

        KM_READ_REQUEST ReadRequest{};

        type response{};

        ReadRequest.ProcessId = PID;
        ReadRequest.Address = ReadAddress;
        ReadRequest.Size = sizeof(type);
        ReadRequest.Output = &response;

        // i need to return response;

        // OpenFileMappingA returns NULL on failure, not INVALID_HANDLE_VALUE
        HANDLE hMapFile = OpenFileMappingA(FILE_MAP_WRITE, FALSE, "Global\\SharedMem");
        if (!hMapFile)
        {
            printf("OpenFileMappingA(write) fail! Error: %lu\n", GetLastError());
            return {};
        }

        // i need to send ReadRequest to my mapped section aka to kernel mode
        auto pBuf = static_cast<PKM_READ_REQUEST>(MapViewOfFile(hMapFile, FILE_MAP_WRITE, 0, 0, 4096));
        if (!pBuf)
        {
            printf("MapViewOfFile(write) fail! Error: %lu\n", GetLastError());
            return {};
        }

        // copied data to the mapped section
        memcpy(pBuf, &ReadRequest, sizeof(ReadRequest));

        // now i need to trigger the kernel xD
        std::string szMessage = "read_shared_memory";
        DWORD dwWriteCount = 0;
        if (WriteFile(hDriver, szMessage.c_str(), static_cast<DWORD>(szMessage.size() + 1), &dwWriteCount, NULL) == FALSE)
        {
            printf("WriteFile(read) fail! Error: %lu\n", GetLastError());
            return {};
        }

        // now i call readsharedmemory from my kernel to read the shared memory section and to call my read memory function.
        // then i read the output like this

        HANDLE hReadMap = OpenFileMappingA(FILE_MAP_READ, FALSE, "Global\\SharedMemoryTest");
        if (!hReadMap)
        {
            printf("OpenFileMappingA(read) fail! Error: %lu\n", GetLastError());
            return {};
        }

        // idk how to read an unknown value, so i cast the view to const type* here.
        // trying to read [ ReadOutput from kernel ]
        auto pOut = static_cast<const type*>(MapViewOfFile(hReadMap, FILE_MAP_READ, 0, 0, 4096));
        if (!pOut)
        {
            printf("MapViewOfFile(read) fail! Error: %lu\n", GetLastError());
            return {};
        }

        // now i can read pOut or assign it to response; but i guess i need a mutex because this whole thing is messed up and if
        // i tried to read it would execute so fast and i would either crash or get a random value

        // note: nothing here is unmapped or closed, so every call maps the sections again
        return response;
    }

    And this is what I am doing in my kernel driver:

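    KGUARDED_MUTEX g_IrpReadMutex;
    PVOID SharedSection = NULL;
    HANDLE Sectionhandle;

    VOID ReadSharedMemory()
    {
        // nothing to map if the section was never created or opened
        if (!Sectionhandle)
            return;

        // drop the previous view before mapping again
        if (SharedSection)
        {
            ZwUnmapViewOfSection(NtCurrentProcess(), SharedSection);
            SharedSection = NULL; // let ZwMapViewOfSection choose the address
        }

        SIZE_T ulViewSize = 1024 * 10;
        NTSTATUS ntStatus = ZwMapViewOfSection(Sectionhandle, NtCurrentProcess(), &SharedSection, 0, ulViewSize, NULL, &ulViewSize, ViewShare, 0, PAGE_READWRITE | PAGE_NOCACHE);
        if (!NT_SUCCESS(ntStatus))
        {
            DbgPrint("ZwMapViewOfSection fail! Status: 0x%08X\n", ntStatus);
            ZwClose(Sectionhandle);
            Sectionhandle = NULL;
            SharedSection = NULL;
            return;
        }
        DbgPrint("ZwMapViewOfSection completed!\n");

        // the section holds a binary KM_READ_REQUEST, so print the address rather than treating it as a string
        DbgPrint("Shared memory mapped at: %p\n", SharedSection);
    }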
    
    
    typedef struct KM_READ_REQUEST
    {
    	ULONG ProcessId;
    
    	UINT_PTR Address;
    	UINT_PTR Size;
    	void* Output;
    
    } KM_READ_REQUEST, *PKM_READ_REQUEST;
    
    
    
    NTSTATUS ReadKernelMemory(PEPROCESS Process, PVOID SourceAddress, PVOID TargetAddress, SIZE_T Size)
    {
        SIZE_T Bytes = 0; // receives the number of bytes actually copied
        if (NT_SUCCESS(MmCopyVirtualMemory(Process, SourceAddress, PsGetCurrentProcess(),
            TargetAddress, Size, KernelMode, &Bytes)))
            return STATUS_SUCCESS;
        else
            return STATUS_ACCESS_DENIED;
    }
    
    
    
    NTSTATUS OnIRPWrite(PDEVICE_OBJECT pDriverObject, PIRP pIrp)
    {
        UNREFERENCED_PARAMETER(pDriverObject);

        NTSTATUS Status = STATUS_SUCCESS;
        PIO_STACK_LOCATION pStack = IoGetCurrentIrpStackLocation(pIrp);
        ULONG Length = pStack->Parameters.Write.Length;

        // bounded copy: the user buffer may be longer than 254 bytes or not NUL-terminated
        char szBuffer[255] = { 0 };
        ULONG CopyLen = Length < sizeof(szBuffer) - 1 ? Length : (ULONG)sizeof(szBuffer) - 1;
        if (pIrp->AssociatedIrp.SystemBuffer && CopyLen > 0)
            RtlCopyMemory(szBuffer, pIrp->AssociatedIrp.SystemBuffer, CopyLen);
        DbgPrint("User message received: %s(%u)", szBuffer, (ULONG)strlen(szBuffer));

        // strcmp returns 0 when the strings are equal
        if (strcmp(szBuffer, "read_shared_memory") == 0)
        {
            KeAcquireGuardedMutex(&g_IrpReadMutex);
            ReadSharedMemory(); // reads shared memory the one i have copied before using memcpy.

            if (SharedSection)
            {
                PKM_READ_REQUEST ReadInput = (PKM_READ_REQUEST)SharedSection;
                void* ReadOutput = ReadInput->Output;

                PEPROCESS Process;
                // Get our process
                if (NT_SUCCESS(PsLookupProcessByProcessId((HANDLE)(ULONG_PTR)ReadInput->ProcessId, &Process))) {
                    Status = ReadKernelMemory(Process, (PVOID)ReadInput->Address, ReadOutput, ReadInput->Size);
                    ObDereferenceObject(Process); // release the reference taken by the lookup
                }
                else {
                    Status = STATUS_ACCESS_DENIED;
                }

                //DbgPrintEx(0, 0, "Read Params:  %lu, %#010x \n", ReadInput->ProcessId, ReadInput->Address);

                // clears the request in the section so we can use it again
                RtlZeroMemory(SharedSection, sizeof(KM_READ_REQUEST));

                // copies the ReadOutput value to our mapped section
                // (this is the pointer itself, not the bytes that were read)
                memcpy(SharedSection, &ReadOutput, sizeof(ReadOutput));
            }
            else {
                Status = STATUS_UNSUCCESSFUL;
            }

            KeReleaseGuardedMutex(&g_IrpReadMutex);
        }

        pIrp->IoStatus.Status = Status;
        pIrp->IoStatus.Information = strlen(szBuffer);
        IoCompleteRequest(pIrp, IO_NO_INCREMENT);
        return Status;
    }
    
    NTSTATUS OnMajorFunctionCall(PDEVICE_OBJECT pDriverObject, PIRP pIrp)
    {
        UNREFERENCED_PARAMETER(pDriverObject);

        PIO_STACK_LOCATION pStack = IoGetCurrentIrpStackLocation(pIrp);
        switch (pStack->MajorFunction)
        {
            case IRP_MJ_WRITE:
                // OnIRPWrite completes the IRP, so hand its status straight back
                return OnIRPWrite(pDriverObject, pIrp);

            default:
                pIrp->IoStatus.Status = STATUS_SUCCESS;
                IoCompleteRequest(pIrp, IO_NO_INCREMENT);
        }
        return STATUS_SUCCESS;
    }
    
    
    
    
    // in our driver entry 
    KeInitializeGuardedMutex(&g_IrpReadMutex); 
    
    

    Of course I'm also mapping the memory and so on; this is just what I want to achieve. But I'm failing with three things:

    1 - It's really bad code, and it will BSOD 100% because I don't wait for anything; I just try to read whatever is in the section. (I guess a mutex is needed here.)

    2 - I can't send ReadOutput from my kernel driver to user mode, or I can't read it, because its value will change, so I don't know how to get it.

    3 - It keeps creating a mapped section every time I call the RPM (read memory) function.

    Any suggestions are much appreciated. I want to fix my problems with this driver :)

    Sunday, March 10, 2019 7:24 PM
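
    For problems 1 and 2 above, one common shape (offered here as an assumption, not as anything the posters in this thread wrote) is to keep both the request and the result bytes inside the shared slot and guard them with a state word, so neither side reads the slot until the other has published it. The sketch is portable C++: SharedSlot, submit, service and collect are hypothetical names, and fake_memory stands in for the target process memory that the driver would read.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum SlotState : std::uint32_t { kIdle = 0, kSubmitted = 1, kDone = 2 };

// Hypothetical layout of one request/response slot inside the shared section.
struct SharedSlot {
    std::atomic<std::uint32_t> state{kIdle};
    std::uint32_t process_id = 0;
    std::uint64_t address = 0;
    std::uint32_t size = 0;
    std::uint32_t status = 0;     // 0 = success, nonzero = failure
    unsigned char data[64] = {};  // result bytes travel in the slot, not via a pointer
};

// App side: fill in the request, then publish it with a release store.
inline bool submit(SharedSlot& s, std::uint32_t pid, std::uint64_t addr, std::uint32_t size) {
    if (size > sizeof(s.data)) return false;
    if (s.state.load(std::memory_order_acquire) != kIdle) return false;  // slot busy
    s.process_id = pid;
    s.address = addr;
    s.size = size;
    s.state.store(kSubmitted, std::memory_order_release);
    return true;
}

// Driver side: snapshot the fields once, validate the snapshot, write the result.
inline bool service(SharedSlot& s, const unsigned char* fake_memory, std::size_t fake_size) {
    if (s.state.load(std::memory_order_acquire) != kSubmitted) return false;
    const std::uint64_t addr = s.address;  // read each field exactly once so the
    const std::uint32_t size = s.size;     // app cannot change it after validation
    const bool ok = size <= sizeof(s.data) && addr <= fake_size && size <= fake_size - addr;
    if (ok) std::memcpy(s.data, fake_memory + addr, size);
    s.status = ok ? 0u : 1u;
    s.state.store(kDone, std::memory_order_release);
    return true;
}

// App side: copy the result out once the slot is marked done, then free the slot.
inline bool collect(SharedSlot& s, void* out, std::uint32_t size) {
    assert(out != nullptr);
    if (size > sizeof(s.data)) return false;
    if (s.state.load(std::memory_order_acquire) != kDone) return false;  // not ready yet
    const bool ok = (s.status == 0);
    if (ok) std::memcpy(out, s.data, size);
    s.state.store(kIdle, std::memory_order_release);
    return ok;
}
```

    collect returns false until service has run, which is the "wait" that the code above is missing. How the waiting side sleeps (an event, polling, or a blocking I/O call) is a separate choice that this sketch leaves open.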
  • Why are you copying the data? That makes no sense. Instead, you should create a system thread and attach it to the process; that gives the thread access to the process's address space. Events, mutexes, and anything else that a thread can wait on are called dispatcher objects, and they are part of the scheduler. If your goal is speed, you don't want to trigger the scheduler for every message. If you use ring buffers or queues, you can drastically reduce the number of calls into the scheduler, each of which takes time. Also, if your app calls system services such as SetEvent or WaitForSingleObject for every message, then you are making a user->kernel transition (very expensive) and a scheduler transition (expensive) each time, in which case you might as well just call DeviceIoControl.

     -Brian



    Sunday, March 10, 2019 8:22 PM
    Moderator
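
    To make the point about ring buffers concrete, here is a small single-producer/single-consumer ring sketched in portable C++. It is an illustration under stated assumptions, not code from this thread: Ring, ring_push and ring_drain are hypothetical names, and the need_wake flag marks the only moment the producer would have to signal an event, namely when the ring goes from empty to non-empty. Pushes into a ring that already holds items cost no system call at all, and the consumer drains everything available in one pass.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kRingSize = 8;
static_assert((kRingSize & (kRingSize - 1)) == 0, "ring size must be a power of two");

struct Ring {
    std::atomic<std::uint32_t> head{0};  // next slot to write, owned by the producer
    std::atomic<std::uint32_t> tail{0};  // next slot to read, owned by the consumer
    std::uint64_t items[kRingSize] = {};
};

// Producer: returns false when the ring is full. need_wake is set to true only
// when the ring was empty before this push, i.e. when a sleeping consumer would
// have to be signaled.
inline bool ring_push(Ring& r, std::uint64_t item, bool& need_wake) {
    const std::uint32_t head = r.head.load(std::memory_order_relaxed);
    const std::uint32_t tail = r.tail.load(std::memory_order_acquire);
    if (head - tail == kRingSize) return false;  // full
    r.items[head % kRingSize] = item;
    r.head.store(head + 1, std::memory_order_release);  // publish the item
    need_wake = (head == tail);
    return true;
}

// Consumer: handles every item currently in the ring in one pass and returns
// how many were handled, so one wake-up can cover many messages.
template <typename Fn>
inline std::uint32_t ring_drain(Ring& r, Fn&& handle) {
    std::uint32_t tail = r.tail.load(std::memory_order_relaxed);
    const std::uint32_t head = r.head.load(std::memory_order_acquire);
    assert(head - tail <= kRingSize);
    std::uint32_t handled = 0;
    while (tail != head) {
        handle(r.items[tail % kRingSize]);
        ++tail;
        ++handled;
    }
    r.tail.store(tail, std::memory_order_release);  // free the slots
    return handled;
}
```

    Pushing five items into an empty ring asks for a wake-up once, not five times. The empty-to-non-empty test is deliberately simplified: a real design also has to close the race where the consumer decides to sleep just as the producer pushes, typically with an explicit "consumer is sleeping" flag in the shared header.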
  • Thank you for that answer. I asked a couple of people on different forums and they told me to use memcpy to copy data between the driver and user mode (my bad, I guess). So I guess I should start again, because that code is terrible. But I have a few questions:

    1 - With a system thread, can I send and receive data between my driver and user mode, without even casting stuff from KM to UM?

    2 - Could you show me an example using threads like you just described? (I will be searching for some while I'm waiting for your answer.)

    I really appreciate your help, both with my other problem and now with this one.

    Sunday, March 10, 2019 9:08 PM
  • 1. Yes. Kernel-mode threads have access to the user-mode address space for the current process.

    2. I'm not aware of any public code

    You really should read the Windows Internals books cover to cover before starting on a project like this

     -Brian



    Sunday, March 10, 2019 9:15 PM
    Moderator
  • Thank you, I will read it soon. Cheers.
    Sunday, March 10, 2019 9:38 PM