Dryad question: cannot find the network path

  • Monday, May 16, 2011 1:52 AM
     
     

    I installed HPC Pack 2008 R2 SP2 Beta on the head node and the compute node. Both nodes run Windows Server 2008 R2. The DSC information is as follows:

    C:\Users\Administrator>dsc node list
    COMPUTE
    HEAD

    COMPUTE is a compute node. HEAD is the head node.

    C:\Users\Administrator>dsc node view compute
    Node compute:
        State = ReadWrite
        Storage UNC Path = \\COMPUTE\HpcData
        Storage Local Path = c:\Dryad\HpcData
        Allocated Size = 0
        Free Space = 5787881266

    C:\Users\Administrator>dsc node view head
    Node head:
        State = ReadWrite
        Storage UNC Path = \\HEAD\HpcData
        Storage Local Path = c:\Dryad\HpcData
        Allocated Size = 0
        Free Space = 28830396416

    I created a fileset named MyFileSet2 on DSC. The information about MyFileSet2 is as follows:

    C:\Users\Administrator>dsc fileset view MyFileSet2
    FileSet MyFileSet2:
        Sealed = True
        File Count = 1
        Total File Size = 103
        Creation Time = 2011/5/15 23:01:49
        Last Used Time = 2011/5/15 23:01:49
        Lease Time = None
        Replication Factor = 1
        Permissions:
            BUILTIN\Administrators          ReadOrModify
            BUILTIN\Power Users             Read
            COMPUTE\Administrator           ReadOrModify    Owner

    C:\Users\Administrator>dsc fileset view MyFileSet2 /files
    FileSet MyFileSet2:
        Sealed = True
        File Count = 1
        Total File Size = 103
        Creation Time = 2011/5/15 23:01:49
        Last Used Time = 2011/5/15 23:01:49
        Lease Time = None
        Replication Factor = 1
        Permissions:
            BUILTIN\Administrators          ReadOrModify
            BUILTIN\Power Users             Read
            COMPUTE\Administrator           ReadOrModify    Owner
        Files:
            0000000000000002.data

    When I submit a job, the error is "can not find the network path". My program is:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Microsoft.Hpc.Linq;

    namespace MyProject
    {
        class Program
        {
            static void Main(string[] args)
            {
                var config = new HpcLinqConfiguration("head");
                var context = new HpcLinqContext(config);

                var lengths = context.FromDsc<LineRecord>("MyFileSet2")
                                     .Select(r => r.Line.Length);
                Console.WriteLine("The maximum line length is {0}", lengths.Max());
            }
        }
    }

    The details of the error are:

    Unhandled Microsoft.Hpc.Linq.HpcLinqException
      Message=Error submitting job to head. Refer to inner exception for more detail.
      Source=Microsoft.Hpc.Linq
      ErrorCode=50331656
      StackTrace:
           at Microsoft.Hpc.Linq.HpcJobSubmission.SubmitJob()
           at Microsoft.Hpc.Linq.JobExecutor.ExecuteAsync(String dryadProgram)
           at Microsoft.Hpc.Linq.HpcLinqQueryGen.InvokeDryad()
           at Microsoft.Hpc.Linq.HpcLinqQuery`1.ToTable(HpcLinqContext context, String targetUri, Boolean isTempOutput)
           at Microsoft.Hpc.Linq.HpcLinqQuery`1.GetEnumerator()
           at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source)
           at Microsoft.Hpc.Linq.DryadLinqProvider.Execute[TResult](Expression expression)
           at System.Linq.Queryable.Max[TSource](IQueryable`1 source)
           at MyProject.Program.Main(String[] args) in D:\MyProject\MyProject\Program.cs:line 20
           at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
           at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
           at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
           at System.Threading.ThreadHelper.ThreadStart()
      InnerException: System.ApplicationException
           Message=DryadJobSumission.SubmitJob: Error trying to copy files to staging directory. 'can not find the network path.'
           Source=Microsoft.Hpc.Query.JobSubmission
           StackTrace:
                at Microsoft.Hpc.Dryad.DryadJobSubmission.CopyJobFilesToStagingCluster(String DryadJobSpecificStagingDir)
                at Microsoft.Hpc.Dryad.DryadJobSubmission.CopyJobFilesToStaging(String DryadJobSpecificStagingDir)
                at Microsoft.Hpc.Dryad.DryadJobSubmission.SubmitJob()
                at Microsoft.Hpc.Linq.HpcJobSubmission.SubmitJob()
           InnerException: System.IO.IOException
                Message=can not find the network path.
                Source=mscorlib
                StackTrace:
                     at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
                     at System.IO.Directory.InternalCreateDirectory(String fullPath, String path, DirectorySecurity dirSecurity)
                     at System.IO.Directory.CreateDirectory(String path, DirectorySecurity directorySecurity)
                     at Microsoft.Hpc.Dryad.DryadJobSubmission.CopyJobFilesToStagingCluster(String DryadJobSpecificStagingDir)
                InnerException:

    What can I do about it? Looking forward to your help. Thanks.

    xfengm

All Replies

  • Monday, May 16, 2011 3:39 PM
     
     

    I think you have the same issue as hmilyrr.

    You need to make sure that the account being used to run the Dryad job is a member of the HPCUsers group or an administrator on the cluster. For details on how to do this, see http://technet.microsoft.com/en-us/library/gg250674(WS.10).aspx.

    Note: The documentation for the beta was missing this step. I will be getting it added for RTM (my apologies). In the beta it was possible to use DSC without being a member of the HPCUsers group; this has also been fixed for RTM.

    Ade

  • Tuesday, May 17, 2011 7:35 AM
     
     

    Hi,

    First, using AD DS, I created a new account and added it to the HPC administrators. Then I logged in to one of the compute nodes with the new account and submitted a job. I putted the job project at the share directory. The error occurred there, and it is the same as before.

    How can I deal with it?

    Looking forward to your help. Thanks.

    xfengm

  • Tuesday, May 17, 2011 10:01 PM
     
     

    I'm not sure what you are doing here. "I putted the job project at the share directory". Please clarify. It seems like you have some sort of permissions or share issue.

    The Linq query copies data to a share on the head node called \XC. Does this exist on your head node? Can you access it from either your client or your compute node?

    Ade

  • Wednesday, May 18, 2011 1:44 AM
     
     

    First of all, thanks for your advice.

    When configuring the cluster, I ran the command:

    DSC NODE ADD compute /TEMPPATH:c:\Dryad\HpcTemp /DATAPATH:c:\Dryad\HpcData /SERVICE:head

    on the compute node, and this command on the head node:

    DSC NODE ADD head /TEMPPATH:c:\Dryad\HpcTemp /DATAPATH:c:\Dryad\HpcData /SERVICE:head

    Do I need to create the \XC directory on the head node? If so, how?

    On both the head node and the compute node, the user account name is "Administrator". When I checked HPC Cluster Manager (Configuration, Navigation Pane, Deployment To-do List, Add or remove users),

    I can see: username wyf\administrator, role administrator

    Does that mean the user account I used to submit the job is a member of the HPCUsers group?

    xfengm

  • Wednesday, May 18, 2011 1:53 PM
     
     

    I'm not asking you to create the network path; I'm asking whether it already exists:

    \\head\xc\

    If it does, does the account you are running the Dryad job under have permission to read and write this share?
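
    A quick way to answer both questions is a small probe run under the same account that submits the job. The following is only a sketch under stated assumptions: the head node is named head (as in the HpcLinqConfiguration call in the question) and the staging directory is a folder named staging under the XC share. StagingShareCheck, BuildStagingPath, and ProbeWritable are hypothetical names for illustration, not part of HPC Pack.

```csharp
using System;
using System.IO;

static class StagingShareCheck
{
    // Assumption: the staging directory is \\<head node>\XC\staging.
    public static string BuildStagingPath(string headNode)
    {
        return @"\\" + headNode + @"\XC\staging";
    }

    // Tries to create and delete a uniquely named subdirectory.
    // Returns null on success, or a description of the failure.
    public static string ProbeWritable(string directory)
    {
        string probe = Path.Combine(directory, "probe-" + Guid.NewGuid().ToString("N"));
        try
        {
            // Directory.CreateDirectory is the call that fails in the stack trace above.
            Directory.CreateDirectory(probe);
            Directory.Delete(probe);
            return null;
        }
        catch (Exception ex)
        {
            return ex.GetType().Name + ": " + ex.Message;
        }
    }

    static void Main(string[] args)
    {
        // The head node name defaults to "head"; pass another name as the first argument.
        string headNode = args.Length > 0 ? args[0] : "head";
        string path = BuildStagingPath(headNode);
        string error = ProbeWritable(path);
        Console.WriteLine(error == null
            ? "OK: " + path + " is writable"
            : "FAILED: " + path + " -> " + error);
    }
}
```

    If the probe fails with an IOException, the share is probably missing or unreachable; an UnauthorizedAccessException points to permissions instead.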

     


    Ade
  • Monday, May 23, 2011 1:15 AM
     
     
    I'm running into the same issue. I'm running VS from the head node and I'm a domain admin, so it's not a permission issue. I'll try to see what's going on.
  • Tuesday, May 24, 2011 9:52 PM
     
     

    Hi,

    I'm keen to help with resolving this. Please let me know if you find out more.

    Thanks

    Ade

  • Saturday, June 04, 2011 5:59 PM
     
     Proposed Answer

    Is this still an issue? If it is I'd like to know more. If not then I'd be interested to know how you resolved this.

    Thanks,

    Ade

    • Proposed As Answer by Ade Miller Saturday, June 04, 2011 5:59 PM
  • Monday, June 06, 2011 4:58 AM
     
     

    Hi Ade,

    Thanks for following up on this. This is still an issue. Kevin is helping me figure out why the Dryad program is not working correctly:

    https://connect.microsoft.com/HPC/feedback/details/672612/duppic2-will-not-find-the-network-share-dryad-programming#tabs

  • Thursday, June 09, 2011 5:29 AM
     

    Hi,

    Are you using a client machine outside the cluster domain to submit jobs?

    We faced the same issue. When we used the API from a client machine outside the cluster's AD domain, the client was not able to resolve HEADNODE to the correct IP address. For example, in the following code:

    FileInfo info = new FileInfo(FileName);
    DfsFile Df = TestFileSet.AddNewFile((ulong)info.Length);
    File.Copy(FileName, Df.WritePath);
    TestFileSet.Seal();
    

    At run time, Df.WritePath resolved to //HEADNODE/DATAPATH, and the client had no way of resolving that host name to the correct IP address.

    This caused the "network path not found" error.

    The following resolved it:

    Add an entry for the head node to the hosts file on the client machine (C:\Windows\System32\drivers\etc\hosts).

    For example, if your head node's IP address is xx.xx.xx.xx and its aliases are headnode and headnode.domain.com, add the following line:

    xx.xx.xx.xx   headnode   headnode.domain.com

    After doing this, run the following command from command line:

    nbtstat -R
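
    To confirm from the client that the name now resolves, something like the following sketch may help. It assumes the head node is reachable under the name headnode (substitute your own); HeadNodeLookup and Resolve are hypothetical names for illustration. Note that it only checks the system resolver through Dns.GetHostAddresses, so it does not prove that the file share itself is reachable.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class HeadNodeLookup
{
    // Returns the addresses the system resolver reports for hostName,
    // or an empty array if the name cannot be resolved.
    public static string[] Resolve(string hostName)
    {
        try
        {
            return Array.ConvertAll(Dns.GetHostAddresses(hostName), a => a.ToString());
        }
        catch (SocketException)
        {
            return new string[0];
        }
    }

    static void Main(string[] args)
    {
        // Assumption: "headnode" is a placeholder for the real head node name.
        string hostName = args.Length > 0 ? args[0] : "headnode";
        string[] addresses = Resolve(hostName);
        if (addresses.Length == 0)
        {
            Console.WriteLine("Could not resolve " + hostName);
        }
        else
        {
            Console.WriteLine(hostName + " -> " + string.Join(", ", addresses));
        }
    }
}
```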
  • Saturday, June 11, 2011 8:11 PM
     
     

    Thanks for the feedback with this, it definitely sounds like a great explanation to the problem.

    My nodes are able to resolve the head node name, but I went ahead and modified the hosts file anyway, without any luck. I will keep trying.

  • Saturday, June 11, 2011 8:22 PM
     
     

    My exception reads:

    "{"DryadJobSumission.SubmitJob: Error trying to copy files to staging directory. 'The network name cannot be found.\r\n'"}"

    I ran Process Explorer and found the following when DupBin2 is running:

    Who sets \\[HEAD-NODE]\XC\staging as the staging directory for Dryad? 


  • Saturday, June 11, 2011 8:25 PM
     
     Proposed Answer
    Update: I created the XC share and gave write permissions to Everyone, and now it works. It fails somewhere else, but the job is submitted.
    • Proposed As Answer by scorpiotek Saturday, June 11, 2011 8:25 PM
  • Sunday, June 12, 2011 4:52 PM
     
     

    "Who sets \\[HEAD-NODE]\XC\staging as the staging directory for Dryad? "

    This gets set up during installation. I suspect that the issue is related to permissions across the board. The HpcData and HpcTemp shares are also created on the compute nodes, and the client reads from and writes to those as well.

    I've also updated the guidance in the development install doc to suggest using a domain-joined cluster. This is actually the general recommendation for setting up an HPC cluster.

    Thanks,

    Ade

  • Sunday, June 12, 2011 5:03 PM
     
     

    Thanks Ade. Just to be clear, everything I am running right now is joined to a domain. Everything has been installed and configured with a domain admin account.

    I had also created the HpcData and HpcTemp directories on every node using dsc node add...