Monday, September 14, 2009 11:16 AM
Is there a way to find out on which NUMA node a memory block resides? I have an application in which large, externally allocated blocks of memory need to be processed by multiple threads in parallel. I would like something like:
__in LPVOID ptr,
__out PULONG NodeNumber
Wednesday, September 16, 2009 6:06 AM
I might have missed something here, but I don't know of anything. I will definitely ask and see if I have missed something, but there doesn't appear to be a counterpart to VirtualAllocExNuma that lets you query the NUMA node from an address.
Windows Server 2008 R2 did extend the NUMA API set and there is a good overview here: http://msdn.microsoft.com/en-us/library/aa363804(VS.85).aspx
Are you hitting measurable performance issues as a result of this? And, if I may ask, what is the rough HW config you're looking at here?
Rick Molloy Parallel Computing Platform : http://blogs.msdn.com/nativeconcurrency http://parallelroads.com/blog
Monday, November 02, 2009 10:37 AM
Thanks a lot for your input and sorry for not responding sooner. I forgot to check for replies while I was busy with some other things.
To be honest, I am not sure at this point whether I am hitting performance issues, since I have no way of knowing if the data resides on the node processing it or not. I have a camera that spits out frames of data at a high rate (ca. 500 MBytes/s). These frames need to be processed, and I would like to do the processing on the node where the data resides. Since I have no control over the memory allocation for the camera buffers, I would like to check on which node they reside.
Even if there is no API function that does this, is there a way to determine it from the pointer address? Is the first half of the address space allocated to Node 0 and the second half to Node 1, perhaps?
My hardware is the following:
- HP Proliant ML370 G6
- 2x Intel Xeon E5540
- 8GB memory
All comments are highly appreciated,
Tuesday, November 24, 2009 9:36 AM
The only method I have seen to determine this information is in the source code of The Numa Explorer by Joseph M. Newcomer, which can be found here: http://www.flounder.com/numaexplorer.htm. This code illustrates very clearly how difficult it is to get this information.
Perhaps a simple way of getting this information could/should be a new API suggestion/request?
edit: quote from his webpage:
"These values are obtained by using QueryWorkingSetEx but instead of listing every page, it reduces the data to sets of continuous runs of pages which are in the same node. This is shown in the picture...."
- Marked As Answer by mattijsdegroot Friday, December 04, 2009 9:03 AM
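The "continuous runs of pages" reduction described in the quote above can be sketched in portable C. This is only an illustration, not the code from The Numa Explorer: it assumes the node of each page has already been obtained (for example via QueryWorkingSetEx) and stored in an array, and the names NodeRun and collapse_node_runs are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* One run of consecutive pages that all report the same NUMA node. */
typedef struct {
    size_t first_page;  /* index of the first page in the run */
    size_t page_count;  /* number of consecutive pages in the run */
    unsigned node;      /* node number shared by every page in the run */
} NodeRun;

/*
 * Collapse a per-page list of node numbers into runs.
 * page_nodes[i] is assumed to hold the node already queried for page i.
 * Writes at most max_runs entries to runs and returns the total number
 * of runs found (which may exceed max_runs).
 */
size_t collapse_node_runs(const unsigned *page_nodes, size_t page_count,
                          NodeRun *runs, size_t max_runs)
{
    size_t run_count = 0;
    size_t i;

    assert(page_nodes != NULL || page_count == 0);

    for (i = 0; i < page_count; ++i) {
        if (i > 0 && page_nodes[i] == page_nodes[i - 1]) {
            /* Same node as the previous page: extend the current run,
               provided that run was actually stored. */
            if (run_count <= max_runs)
                runs[run_count - 1].page_count++;
        } else {
            /* First page, or the node changed: start a new run. */
            if (run_count < max_runs) {
                runs[run_count].first_page = i;
                runs[run_count].page_count = 1;
                runs[run_count].node = page_nodes[i];
            }
            run_count++;
        }
    }
    return run_count;
}
```

A buffer that sits entirely on one node collapses to a single run, so more than one run is a quick sign that the buffer is split across nodes.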
Tuesday, November 24, 2009 10:30 AM
Thanks for the information. I'll have a look at that code.
Do you agree that it would be reasonable to expect such a function in the API? It would seem to me that having to deal with externally allocated memory is a fairly general problem in NUMA aware applications.
Tuesday, November 24, 2009 3:56 PM
Personally, I can see no reason why such information should be withheld or not presented via the API if it is already available to the operating system itself. Full system topology info is a related case in point: IMO it should have been available since before dual-processor machines first emerged, and Win7 at least addresses that lack now.
There may even be some undocumented API that publishes the "memory@node" location that the other APIs use, buried someplace, that could be exposed as part of the NUMA API?
Wednesday, November 25, 2009 7:19 AM
As you stated above, the information is indeed available. If I interpret the code for The Numa Explorer correctly, it is possible to get the information through the process status API (PSAPI), but that is a very cumbersome process.
Where could I file such an API feature request?
Friday, December 04, 2009 9:00 AM
Indeed, QueryWorkingSetEx is the way to go for now.
I got the following response from Microsoft after sending an API request:
QueryWorkingSetEx can be used for this. There is an example here:
Note that physical pages in a given virtual buffer are not necessarily allocated from the same NUMA node. In most cases, checking only the first page in the buffer should work, but there might be situations where the first page is allocated from node X and the rest of the pages are from node Y (even if all nodes in the system have plenty of available pages). This could happen, for example, if the contents of the first page are initialized by one thread, and the rest of the pages are initialized from a different thread whose ideal processor is part of a different node.
If they don't control how the buffer is allocated and initialized it might make sense to check several pages at random and select the node that appears most often.
The example at the above link is "Allocating Memory from a NUMA Node." See the DumpNumaNodeInfo function for the QueryWorkingSetEx call.
I will pass your API request on to the NUMA product team; they are currently in the planning stage for the next version of Windows. But in the meantime, I hope QueryWorkingSetEx works for your application so you don't have to wait. :-)
- Proposed As Answer by Mohamed Ameen Ibrahim (Microsoft Employee) Friday, August 06, 2010 5:15 PM
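The suggestion in the response above, to check several pages at random and select the node that appears most often, could look roughly like the sketch below. It is deliberately kept portable: the per-page lookup is passed in as a callback (NodeQueryFn, a hypothetical name) that on Windows would wrap QueryWorkingSetEx. The limit of 64 nodes is an assumption based on the 6-bit Node field in the working-set attributes, and always_node_one is only a stand-in for trying the function out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_NODES 64   /* assumed upper bound on node numbers (6-bit field) */

/* Hypothetical callback: returns the node of the page holding addr,
   or -1 if the page is not resident or the query failed. */
typedef int (*NodeQueryFn)(const void *addr);

/*
 * Sample `samples` pages of [buffer, buffer + size) at random and return
 * the node reported most often, or -1 if no sample yielded a valid node.
 * Ties go to the node that reached the winning count first.
 */
int most_common_node(const void *buffer, size_t size, size_t page_size,
                     size_t samples, NodeQueryFn query)
{
    unsigned counts[MAX_NODES] = {0};
    size_t pages;
    size_t i;
    int best = -1;

    assert(buffer != NULL && size > 0 && page_size > 0 && query != NULL);
    pages = (size + page_size - 1) / page_size;

    for (i = 0; i < samples; ++i) {
        /* Pick a page-sized slice of the buffer at random. */
        size_t page = (size_t)rand() % pages;
        int node = query((const char *)buffer + page * page_size);

        if (node < 0 || node >= MAX_NODES)
            continue;  /* skip pages without a usable answer */
        counts[node]++;
        if (best < 0 || counts[node] > counts[best])
            best = node;
    }
    return best;
}

/* Stand-in query for illustration only: pretends every page is on node 1. */
static int always_node_one(const void *addr)
{
    (void)addr;
    return 1;
}
```

With a real callback, a handful of samples per frame buffer is probably enough to decide which node's threads should process it; the sample count is a trade-off between query overhead and confidence.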
Tuesday, March 02, 2010 2:45 PM
After a long delay I revisited this problem and came up with the code listed below to determine the NUMA node directly from a pointer. It was actually much easier than I thought. It seems to work correctly, but beware that the pointer needs to point to initialized memory. If you like this function or have suggestions to improve it, I would love to see a reply.
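#define _WIN32_WINNT 0x0600
#include <windows.h>
#include <psapi.h>   // QueryWorkingSetEx; link with Psapi.lib

// Returns the NUMA node of the page containing Buffer, or -1 on failure.
int GetNumaNodeFromAdress (PVOID Buffer)
{
    PSAPI_WORKING_SET_EX_INFORMATION WsInfo;
    WsInfo.VirtualAddress = Buffer;
    BOOL bResult = QueryWorkingSetEx(GetCurrentProcess(), &WsInfo, sizeof(WsInfo));
    if (!bResult)
        return -1;   // query failed; call GetLastError() for details
    BOOL IsValid = WsInfo.VirtualAttributes.Valid;
    DWORD Node = WsInfo.VirtualAttributes.Node;
    return IsValid ? (int)Node : -1;   // Node is only meaningful for resident (initialized) pages
}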