How can I know whether the 4 threads are running on the four cores with OpenMP?


  • June 11, 2010, 8:24 PM
     
     

    Hi, I have four independent functions, f1, f2, f3, and f4, and I want to run each of them on its own core.

    I tested the following hello.c:
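    #include <stdio.h>  /* printf */
    #include <omp.h>    /* omp_get_thread_num */

    int main(void)
    {
      printf("Hello from serial\n");
      /* Outside a parallel region only the initial thread exists, so this prints 0. */
      printf("Thread number = %d\n", omp_get_thread_num());

    #pragma omp parallel
      {
        /* Executed once by every thread in the team. */
        printf("Hello from parallel, Thread number = %d\n", omp_get_thread_num());
      }
    #pragma omp parallel
      {
        printf("Hello from parallel, Thread number = %d\n", omp_get_thread_num());
      }
      printf("Hello from serial again.\n");
      return 0;
    }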
    The output shows four threads printing "Hello".
    However, how can I know whether the four threads are running on the four cores?

    Thanks a lot

All replies

  • June 11, 2010, 9:00 PM
     
     

    I know there are ways to distribute tasks among the threads,

    but how do I know the threads are on different cores?

    For example, I can use "sections" and "section" to put different tasks on different threads,

    but I can't find a way to distribute the tasks across the cores.

    Does that mean the OS distributes the threads across the cores automatically?

    Thanks a lot.
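A minimal sketch of the "sections" approach, using the poster's f1..f4 as hypothetical placeholders that only record which OpenMP thread ran them. Each section runs exactly once on some thread of the team, but the construct says nothing about cores, which is precisely the gap in the question. The `#ifdef _OPENMP` fallback is an assumption of this sketch so that it also builds (and runs serially) without `-fopenmp`.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback when compiled without OpenMP: everything runs on thread 0. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Which OpenMP thread ran each hypothetical task (-1 = not run yet). */
static int ran_on[4] = { -1, -1, -1, -1 };

/* Hypothetical stand-ins for the poster's four independent functions. */
static void f1(void) { ran_on[0] = omp_get_thread_num(); }
static void f2(void) { ran_on[1] = omp_get_thread_num(); }
static void f3(void) { ran_on[2] = omp_get_thread_num(); }
static void f4(void) { ran_on[3] = omp_get_thread_num(); }

/* Each section is executed exactly once by some thread of a team of
   (at most) four threads; the runtime picks the thread, the OS picks
   the core. */
static void run_sections(void)
{
    #pragma omp parallel sections num_threads(4)
    {
        #pragma omp section
        f1();
        #pragma omp section
        f2();
        #pragma omp section
        f3();
        #pragma omp section
        f4();
    }
}

static void print_sections(void)
{
    for (int i = 0; i < 4; i++)
        printf("f%d ran on thread %d\n", i + 1, ran_on[i]);
}
```

Calling `run_sections()` and then `print_sections()` often shows four different thread numbers, but the runtime is free to let one thread execute several sections, and the core each thread lands on is still the OS scheduler's decision.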

  • June 11, 2010, 9:36 PM
     
     Proposed answer

    The OpenMP runtime creates a series of threads (a team) to execute the work you wish to run in parallel (annotated by #pragma omp ...). The OpenMP runtime is responsible for deciding how the work defined through those pragmas is distributed among the threads of the team. Most of the time, the OpenMP runtime does not set any specific affinity for those threads, and the operating system scheduler decides which logical processors they run on; that is its job. On an N-core system, if N threads are runnable at a given point in time, the OS is likely to distribute them across the N cores.
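To see the team described above, a small sketch can compare the number of logical processors the runtime reports (`omp_get_num_procs()`) with the size of a default team (`omp_get_num_threads()` called inside the region). The default team size is implementation-defined and can be overridden with the `OMP_NUM_THREADS` environment variable; many runtimes default to one thread per logical processor. Neither call reports which core a thread runs on. The non-OpenMP fallback values are an assumption of this sketch.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Logical processors visible to the OpenMP runtime (1 without OpenMP). */
static int logical_procs(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Number of threads in the team created for a default parallel region. */
static int default_team_size(void)
{
#ifdef _OPENMP
    int n = 0;                       /* shared: written by one thread only */
    #pragma omp parallel
    {
        #pragma omp single
        n = omp_get_num_threads();
    }
    return n;
#else
    return 1;
#endif
}

static void print_team_info(void)
{
    printf("%d logical processors, default team of %d threads\n",
           logical_procs(), default_team_size());
}
```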

    You can use tools such as the Parallel Performance Analyzer in Visual Studio 2010 to see how the threads within an application map to logical cores at a specific point in time (Analyze / Performance Wizard / Concurrency). You can also quickly convince yourself that they are running on different processors by adding a call to GetCurrentProcessorNumber(Ex) to your printf above.
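A minimal sketch of that suggestion: the poster's parallel region, extended to print the processor number next to the thread number. On Windows it calls `GetCurrentProcessorNumber()`, which is documented as available from Windows Vista (the `Ex` variant, which also reports the processor group, requires Windows 7). For non-Windows builds the sketch substitutes glibc's Linux-specific `sched_getcpu()`; that substitution and the helper names are assumptions, not part of the reply. The number is only a snapshot, since the scheduler may migrate the thread right after the call.

```c
/* Must precede every system header so glibc declares sched_getcpu(). */
#define _GNU_SOURCE
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback when compiled without OpenMP: only thread 0 exists. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Logical processor the calling thread is running on at this instant. */
static int current_cpu(void)
{
#ifdef _WIN32
    return (int)GetCurrentProcessorNumber();
#else
    return sched_getcpu();   /* returns -1 on failure */
#endif
}

/* Every thread in the team reports its OpenMP number and its processor. */
static void report_threads(void)
{
    #pragma omp parallel
    {
        printf("Hello from parallel, thread %d on processor %d\n",
               omp_get_thread_num(), current_cpu());
    }
}
```

With four runnable threads on a four-core machine, the processor numbers printed by `report_threads()` will usually differ, but nothing guarantees it, and a second run may print a different mapping.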

    Please let me know if this doesn't answer your question.

    • Proposed as answer by Dana Groff, June 14, 2010, 2:56 AM
  • June 12, 2010, 4:32 AM
     
     

    GetCurrentProcessorNumber(Ex) can only be used on Windows 7,
    and it only reports the processor; it cannot set the affinity.
    Although in most cases it is better to let the system select an available processor,
    I want to control it myself.

    Can I use "SetThreadAffinityMask(tHandle, 0x00000001)" for the 1st processor,

    and "SetThreadAffinityMask(tHandle, 0x00000010)" for the 2nd processor?
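One detail worth checking in the masks above: an affinity mask is a bit set in which bit n selects logical processor n, counting from 0. The first processor is therefore `0x00000001` and the second is `0x00000002`, while `0x00000010` has bit 4 set and selects the fifth. A tiny sketch of the arithmetic follows; the helper names are hypothetical, and `unsigned long long` is used because it can hold the 64 bits of a `DWORD_PTR` on 64-bit Windows.

```c
#include <stdio.h>

/* Bit n of an affinity mask selects logical processor n (0-based).
   n must be less than 64. */
static unsigned long long mask_for_processor(unsigned n)
{
    return 1ULL << n;
}

/* Print the single-processor mask for processors 0 .. count-1. */
static void print_masks(unsigned count)
{
    for (unsigned n = 0; n < count; n++)
        printf("processor %u -> mask 0x%08llx\n", n, mask_for_processor(n));
}
```

`print_masks(5)` prints the masks `0x00000001` through `0x00000010`, so the value for the 2nd processor is `mask_for_processor(1)`, i.e. `0x00000002`.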

    Thanks a lot.

  • June 13, 2010, 2:55 PM
     
     Answered

    You can do that. Technically, you should fetch the system affinity mask first and work from what's available, because it's possible that a system does not make all CPU cores/threads available to a process (e.g. under virtualization), although I've never seen that myself. You also need to know the difference between a CPU (socket), a core, and a "hyperthread". For example, you might be running on a two-socket system with Xeon 5500 series processors, so you could have two CPUs with four cores each and two threads per core, i.e. 16 logical processors. The only way to really know is to query the CPU topology. Otherwise you might end up pinning your threads to two hyperthreads of the same core when you really meant to use two separate cores.
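A sketch of the "work from what's available" advice, assuming Windows (`GetProcessAffinityMask`, `SetThreadAffinityMask`) or, as a substitute, Linux (`sched_getaffinity`, `sched_setaffinity`): read the set of processors the process may use, then pin each OpenMP thread to a different member of that set. The helper names are hypothetical. The sketch deliberately ignores topology, so two adjacent processor numbers may still be two hyperthreads of one core, the pitfall described above; choosing one logical processor per physical core would need a topology query such as `GetLogicalProcessorInformation` on Windows. On Windows these masks only cover the current processor group (at most 64 logical processors).

```c
/* Must precede every system header so glibc declares the CPU_* macros. */
#define _GNU_SOURCE
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }
#endif

/* Fill cpus[] with the logical processors this process may use and
   return how many were found (0 on error). Call it before pinning:
   on Linux the mask is read from the calling thread. */
static int allowed_cpus(int *cpus, int max)
{
    int n = 0;
#ifdef _WIN32
    DWORD_PTR proc_mask, sys_mask;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &proc_mask, &sys_mask))
        return 0;
    for (int cpu = 0; cpu < (int)(8 * sizeof proc_mask) && n < max; cpu++)
        if ((proc_mask >> cpu) & 1)
            cpus[n++] = cpu;
#else
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return 0;
    for (int cpu = 0; cpu < CPU_SETSIZE && n < max; cpu++)
        if (CPU_ISSET(cpu, &set))
            cpus[n++] = cpu;
#endif
    return n;
}

/* Restrict the calling thread to one logical processor (0 on success). */
static int pin_current_thread(int cpu)
{
#ifdef _WIN32
    DWORD_PTR bit = (DWORD_PTR)1 << cpu;
    return SetThreadAffinityMask(GetCurrentThread(), bit) ? 0 : -1;
#else
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);   /* 0 = calling thread */
#endif
}

/* Pin each OpenMP thread to a distinct allowed processor, wrapping around
   if there are more threads than processors. Returns the number of threads
   that failed to pin, or -1 if the allowed set could not be read. */
static int pin_team(void)
{
    int cpus[256];
    int n = allowed_cpus(cpus, 256);
    int failures = 0;
    if (n == 0)
        return -1;
    #pragma omp parallel reduction(+:failures)
    {
        int cpu = cpus[omp_get_thread_num() % n];
        if (pin_current_thread(cpu) != 0)
            failures++;
    }
    return failures;
}
```

Note that the pinning persists: in most runtimes the team threads are reused by later parallel regions, so they keep the affinity set here.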

    Given this complexity, why is it worthwhile to set the thread affinity yourself instead of letting the OS manage it?  

    One more thought: future server CPUs may become asymmetric. I've read articles about future CPUs that contain, say, a couple of "fat, fast" cores (lots of cache, higher voltage, speculative and out-of-order execution, high clock speed) that the OS will prefer under single-threaded load, and a slew of "skinny" cores and threads that come into play under parallel load, perhaps dialing back the clock speed of the fat cores when that happens to reduce power. Can you do a better job than the OS in that situation?


    John Lilley CTO DataLever Corporation
    • Marked as answer by Jerry FENG, June 25, 2010, 2:15 PM
  • June 24, 2010, 4:52 PM
     
     

    Jerry,

    Are you satisfied with these answers? If not, could you elaborate on your question so that we can answer it better? If so, please mark the thread as answered.

    Thank you,

    Dana Groff, Parallel Computing PM