Error when rebalancing filesets: Unable to get local path to HpcTemp share

Question
-
Hello,

I'm trying to execute the following query in an attempt to rebalance the filesets:

HpcLinqConfiguration config = new HpcLinqConfiguration(SampleConfiguration.HeadNode);
using (HpcLinqContext context = new HpcLinqContext(config))
{
    int partitions = context.GetNodeCount() * 5;
    context.FromDsc<LineRecord>("FileSet")
        .HashPartition(r => r, partitions)
        .ToDsc("NewFileSet")
        .SubmitAndWait(context);
}

I get the following error:

ProcessPathHelper.BuildJobPath: Unable to get local path to HpcTemp share.
[ExecutionHelper.InitializeForJobExecution] Exception: Path cannot be the empty string or all whitespace.
at System.IO.Directory.CreateDirectory(String path, DirectorySecurity directorySecurity)
at Microsoft.Hpc.Dryad.ExecutionHelper.InitializeForJobExecution(String resources)

It seems that the temp share directory is set correctly:

C:\Windows\system32>dsc node view hpcdevnode1
Node hpcdevnode1:
State = OK
HpcTemp Local Path = C:\DSC\temp
HpcData Local Path = C:\DSC\data
Free Space = 49690973082

C:\Windows\system32>dsc node view hpcdevnode2
Node hpcdevnode2:
State = OK
HpcTemp Local Path = C:\DSC\temp
HpcData Local Path = C:\DSC\data
Free Space = 49690712113

C:\Windows\system32>dsc node view hpcdevnode3
Node hpcdevnode3:
State = OK
HpcTemp Local Path = C:\DSC\temp
HpcData Local Path = C:\DSC\data
Free Space = 49689909297

In fact I can see a folder created in the temp directory for the job as well as dll's, exe's and a DryadLinqProgram_0.xml file created within that folder. How can I troubleshoot this problem further?

Thanks,
Gil
- Edited by gilvalen Tuesday, November 1, 2011 8:02 PM
Tuesday, November 1, 2011 6:13 PM
Answers
-
Hi Gil,
As I imagine you've divined, this error is generated when one of the L2H tasks is started on a node that does not have the HpcTemp share. Two common situations may explain why you're seeing it:
1.) Running on a node that is not in DSC.
We don't filter to only DSC nodes by default when running the L2H query, so if you have compute nodes in your HPC cluster that are not added to DSC, this error could result. To verify this, you can go to the failed job in the HPC Job Manager and view task details, where it will list the failed task(s) and the allocated nodes for each task. If the failed task is allocated to an HPC node not in DSC, that's the reason.
To work around this, you can either add all the HPC nodes to DSC or specify a node group you would like to use on the HpcLinqConfiguration and add all the DSC nodes to that node group.
2.) DSC node doesn't have the shares.
This only happens if the shares were deleted (reimaging the node, manually running net share delete, etc.). You can check this by running 'net share HpcTemp' on each of the nodes and verifying the shares exist. If they do not, please try to identify how they were deleted so that it can be avoided in the future, and then recreate the shares.
If neither of these situations is applicable, it would be good to make sure each node can access its own share and report back.
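As a rough way to script the share check, something like the small console program below could be run from a machine on the cluster. It is only a sketch: it assumes the share is reachable as \\<node>\HpcTemp, it uses the three node names from the 'dsc node view' output in the question, and it only tests reachability from the machine it runs on (it does not replace running 'net share HpcTemp' on the node itself).

```csharp
using System;
using System.IO;

class CheckHpcTempShares
{
    static void Main()
    {
        // Node names taken from the 'dsc node view' output in the question;
        // replace them with the nodes in your own cluster.
        string[] nodes = { "hpcdevnode1", "hpcdevnode2", "hpcdevnode3" };

        foreach (string node in nodes)
        {
            // Assumption: the share is published under the name HpcTemp,
            // so its UNC path is \\<node>\HpcTemp.
            string share = @"\\" + node + @"\HpcTemp";

            // Directory.Exists returns false both when the share is missing
            // and when the caller lacks permission to read it.
            bool reachable = Directory.Exists(share);

            Console.WriteLine("{0}: {1}", share, reachable ? "reachable" : "NOT reachable");
        }
    }
}
```

A "NOT reachable" line can mean either a missing share or a permissions problem, so it is worth following up with 'net share HpcTemp' on that node.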
Thanks,
Jeremy
- Marked as answer by gilvalen Tuesday, November 1, 2011 9:11 PM
Tuesday, November 1, 2011 8:42 PM
All replies
-
Hi Jeremy,
I understand now. Situation #1 applied. I used the cluster manager->node management utility to view the groups and set the HpcLinqConfiguration.NodeGroup property to specify only the DSC nodes:
var config = new HpcLinqConfiguration(headNode) { NodeGroup = "DSCNodes" };
After executing again, the problem disappeared.
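For anyone landing here later, the original query with that change applied might look roughly like the sketch below. It only combines the two snippets from this thread; "DSCNodes" is assumed to be an existing node group that contains exactly the DSC nodes, LineRecord and SampleConfiguration come from the sample code in the question, and the using directives for the HPC LINQ assemblies are omitted because they are not shown in the thread.

```csharp
// Sketch only: requires the HPC LINQ assemblies and an existing
// node group (here assumed to be named "DSCNodes").
var config = new HpcLinqConfiguration(SampleConfiguration.HeadNode)
{
    // Restrict the job to nodes that are registered with DSC,
    // so no task lands on a node without the HpcTemp share.
    NodeGroup = "DSCNodes"
};

using (HpcLinqContext context = new HpcLinqContext(config))
{
    // Same rebalancing query as in the question.
    int partitions = context.GetNodeCount() * 5;
    context.FromDsc<LineRecord>("FileSet")
        .HashPartition(r => r, partitions)
        .ToDsc("NewFileSet")
        .SubmitAndWait(context);
}
```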
Thanks!
Gil
Tuesday, November 1, 2011 9:11 PM