HPC job failed
-
October 17, 2011 10:53
Hi all,
I created a cluster job with the job management console on an HPC cluster. The job runs an executable MPI application.
These are the steps I followed:
In the job management console, click Add New Job.
Click Next, and on the task page, enter this in the command field: mpiexec.exe myapp.exe
In the working directory field, enter \\headnode\myapp, which is the location of my exe file.
Then submit the job.
But the job failed.
Please help me.
All replies
-
October 17, 2011 22:43
Hello,
There are several reasons that could cause an MPI job to fail, for example a wrong netmask, not enough resources, or the MPI service being down. Before we figure out the root cause of the failure, could you please post the full error message here? You can find the ID of the failed job and run the command: task view [jobid].1, or you can browse the job management UI to find the details of the failed job.
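If you want to script that lookup, here is a minimal Python sketch that builds and runs the task view command mentioned above. The function names and the job ID 42 are hypothetical, and it assumes the HPC command-line tools are on the PATH of the machine where you run it.

```python
import subprocess
from typing import List


def task_view_command(job_id: int, task_id: int = 1) -> List[str]:
    """Build the argument list for `task view <jobid>.<taskid>`."""
    if job_id <= 0 or task_id <= 0:
        raise ValueError("job and task IDs must be positive integers")
    return ["task", "view", f"{job_id}.{task_id}"]


def show_task(job_id: int, task_id: int = 1) -> str:
    """Run the command and return whatever it printed.

    Assumption: the `task` command is installed and on the PATH.
    """
    result = subprocess.run(
        task_view_command(job_id, task_id),
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    # 42 is a placeholder; use the ID of your failed job instead.
    print(" ".join(task_view_command(42)))
```

Calling show_task with the real job ID should return the task details, including the error message, which you can then paste into this thread.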
Thanks,
James
- Proposed as answer by Ade Miller, November 1, 2011 16:28
-
October 19, 2011 19:43
I agree with James. Some other general troubleshooting tips:
Use the debugger (http://msdn.microsoft.com/en-us/library/ee945373.aspx)
Turn on Auditing for Failures in Local Group Policy Manager
Make sure you can access the share (\\headnode\myapp) and run the executable from each node.
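For the last tip, a rough way to test share access is a small Python script like the one below, run on each node. It is only a sketch: check_app is a hypothetical helper, it assumes Python is available on the nodes, and it only proves that the account running the script can read the file, which may differ from the account the job runs under.

```python
import os


def check_app(work_dir: str, exe_name: str) -> list:
    """Return a list of problems found; an empty list means the
    directory and the executable are readable from this machine."""
    problems = []
    if not os.path.isdir(work_dir):
        problems.append(f"cannot reach directory: {work_dir}")
        return problems
    exe_path = os.path.join(work_dir, exe_name)
    if not os.path.isfile(exe_path):
        problems.append(f"executable not found: {exe_path}")
        return problems
    try:
        # Reading one byte confirms the current account has read access.
        with open(exe_path, "rb") as handle:
            handle.read(1)
    except OSError as error:
        problems.append(f"cannot read {exe_path}: {error}")
    return problems


if __name__ == "__main__":
    # Share path and file name taken from the original post.
    issues = check_app(r"\\headnode\myapp", "myapp.exe")
    if issues:
        for issue in issues:
            print("PROBLEM:", issue)
    else:
        print("OK: share and executable are reachable")
```

If any node reports a problem, fix the share or NTFS permissions before resubmitting the job.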
--Patrick Gallucci
- Edited by Patrick Gallucci, October 19, 2011 19:55

