HPC job failed
-
Monday, October 17, 2011 10:53
Hi all,
I created a cluster job with the Job Management console on an HPC cluster. The job runs an MPI application executable.
I followed these steps:
In the Job Management console, I clicked Add New Job.
I clicked Next and, on the task page, entered the command: mpiexec.exe myapp.exe
For the working directory, I entered \\headnode\myapp, the location of my exe file.
Then I submitted the job.
But the job failed.
Please help me.
All replies
-
Monday, October 17, 2011 22:43
Hello,
There are several possible reasons for an MPI job to fail, for example a wrong netmask, insufficient resources, or the MPI service being down. Before we work out the root cause, could you please post the full error message here? You can take the failed job's ID and run the command: task view [jobid].1, or you can browse the job management UI to find the details of the failed job.
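If you want to script that lookup, here is a minimal Python sketch. It only assembles the task view command quoted above and, optionally, runs it. The function names are hypothetical, and it assumes the HPC Pack command-line tools are on the PATH of the machine where you run it.

```python
import subprocess


def task_view_command(job_id, task_id=1):
    """Build the `task view <jobid>.<taskid>` command as an argument list."""
    job_id = int(job_id)
    task_id = int(task_id)
    if job_id <= 0 or task_id <= 0:
        raise ValueError("job and task IDs must be positive integers")
    return ["task", "view", f"{job_id}.{task_id}"]


def show_failed_task(job_id, task_id=1):
    """Run the command and return whatever it printed.

    Assumption: the `task` tool from the HPC Pack client utilities is
    installed and reachable on PATH; this is not checked here.
    """
    result = subprocess.run(
        task_view_command(job_id, task_id),
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```

Calling show_failed_task with your failed job's ID returns the text you can paste into this thread.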
Thanks,
James
- Proposed as answer by Ade Miller Tuesday, November 1, 2011 16:28
-
Wednesday, October 19, 2011 19:43
I agree with James. Some other general troubleshooting tips:
Use the debugger (http://msdn.microsoft.com/en-us/library/ee945373.aspx)
Turn on Auditing for Failures in Local Group Policy Manager
Make sure each node can access the share (\\headnode\myapp) and run the executable from it.
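The share check in the last tip can be sketched in Python as below. This is only an illustration: check_share is a hypothetical helper, it has to be run on each compute node under the account the job uses, and it checks visibility and read access only, not whether the program actually starts.

```python
import os


def check_share(work_dir, exe_name):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # The working directory (for example a UNC share) must be reachable.
    if not os.path.isdir(work_dir):
        problems.append(f"cannot access directory: {work_dir}")
        return problems
    # The executable must exist inside it and be readable.
    exe_path = os.path.join(work_dir, exe_name)
    if not os.path.isfile(exe_path):
        problems.append(f"executable not found: {exe_path}")
    elif not os.access(exe_path, os.R_OK):
        problems.append(f"executable not readable: {exe_path}")
    return problems
```

For the job in this thread the call would be check_share(r"\\headnode\myapp", "myapp.exe"). Any message it returns on a node points at a share or permission problem rather than an MPI problem.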
--Patrick Gallucci
- Edited by Patrick Gallucci Wednesday, October 19, 2011 19:55

