HPC job failed

  • October 17, 2011 10:53
     
     

    Hi all,

    I created a cluster job with the job management console on an HPC cluster. The job runs an executable MPI application.

    These are the steps I followed:

    In the job management console, click to add a new job.

    Click Next, and on the task page enter the command: mpiexec.exe myapp.exe

    For the working directory, enter \\headnode\myapp, which is the location of my exe file.

    Then submit the job.

    But the job failed.

    Please help me.

All replies

  • October 17, 2011 22:43
     
     Proposed answer

    Hello,

    There are several reasons an MPI job can fail: for example, a wrong netmask, not enough resources, or the MPI service being down. Before we try to work out the root cause, could you please post the full error message here? Find the ID of the failed job and run the command: task view [jobid].1, or browse the job management UI to see the details of the failed job.
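
As a rough illustration of the command mentioned above, here is a minimal Python sketch that builds the `task view [jobid].1` command line and, if the HPC command-line tools happen to be on the PATH, runs it and captures the output. The function names `task_view_command` and `show_failed_task` are hypothetical helpers invented for this example, and the default task ID of 1 is an assumption taken from the `.1` suffix in the command.

```python
import shutil
import subprocess


def task_view_command(job_id, task_id=1):
    """Build the argument list for `task view <jobid>.<taskid>`.

    Hypothetical helper: it only formats the command named in the post.
    """
    return ["task", "view", f"{int(job_id)}.{int(task_id)}"]


def show_failed_task(job_id, task_id=1):
    """Run `task view` and return its combined output as text.

    Returns None when no `task` executable is found on the PATH,
    for example on a machine without the HPC client tools installed.
    """
    cmd = task_view_command(job_id, task_id)
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr
```

For example, `show_failed_task(42)` would run `task view 42.1`, where 42 stands in for the real ID of the failed job. The captured text is what would be worth pasting into the thread.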

    Thanks,

    James

    • Proposed as answer by Ade Miller, November 1, 2011 16:28
  • October 19, 2011 19:43
     
     

    I agree with James. Some other general troubleshooting tips:

    Use the debugger (http://msdn.microsoft.com/en-us/library/ee945373.aspx)
    Turn on Auditing for Failures in Local Group Policy Manager
    Make sure each node can access the share (\\headnode\myapp) and run the executable from it
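
The share check in the last tip could be sketched roughly as follows. This is only an illustration, not an official tool: `check_app_share` is a hypothetical helper, and it assumes Python is available on the node and that the script runs under the same account as the job. It only tests that the directory is reachable and that the file exists and is readable; it does not launch the application.

```python
import os


def check_app_share(share_dir, exe_name):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    exe_path = os.path.join(share_dir, exe_name)
    if not os.path.isdir(share_dir):
        # The share is missing, or this account cannot reach it.
        problems.append(f"cannot reach directory: {share_dir}")
    elif not os.path.isfile(exe_path):
        problems.append(f"executable not found: {exe_path}")
    elif not os.access(exe_path, os.R_OK):
        problems.append(f"executable not readable: {exe_path}")
    return problems


if __name__ == "__main__":
    # Path and file name taken from the original question.
    for problem in check_app_share(r"\\headnode\myapp", "myapp.exe"):
        print(problem)
```

If this prints nothing on a given node, the share and the file are at least visible from that node. A failure here would suggest a permissions or network problem rather than a problem in the MPI application itself.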


    --Patrick Gallucci