Get-HpcNodeState unrecognized in HeadNode

  • Friday, February 17, 2012 6:02 PM
     

    I have deployed the Sample HPC Azure project to the Cloud. The deployment has 1 head node, 1 front end node and 98 compute nodes.

    Although everything looks just fine in Azure Management Portal, there are a few problems:

    1. On deployment, 17 nodes are unreachable.
    2. Running Cluster Manager on the head node fails with:
      Failed to communicate with remote SDM store. Connection Failed. 
      No connection could be made because the target machine actively 
      refused it ip.of.my.srvr:9893
      
    3. No HpcNode cmdlets are available on the head node.
    4. A job started on an unreachable node stays queued, reporting that the requested resources are not available.
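
    The error in item 2 reports a refused TCP connection on port 9893. A minimal sketch for checking from PowerShell whether anything accepts connections on that port might look like the following. Test-TcpPort is a hypothetical helper name, and 'localhost' assumes the check is run on the head node itself. It only shows whether the port accepts connections, not why the service behind it might be down.

    ```powershell
    # Hypothetical helper (not part of HPC Pack): returns $true if a TCP
    # connection to the given host and port succeeds within the timeout.
    function Test-TcpPort {
        param(
            [Parameter(Mandatory = $true)][string]$ComputerName,
            [Parameter(Mandatory = $true)][int]$Port,
            [int]$TimeoutMs = 3000
        )
        $client = New-Object System.Net.Sockets.TcpClient
        try {
            # Start the connection attempt and wait up to $TimeoutMs for it.
            $async = $client.BeginConnect($ComputerName, $Port, $null, $null)
            if (-not $async.AsyncWaitHandle.WaitOne($TimeoutMs, $false)) {
                return $false   # no answer within the timeout
            }
            $client.EndConnect($async)   # throws if the connection was refused
            return $true
        }
        catch {
            return $false       # refused, unreachable, or name not resolved
        }
        finally {
            $client.Close()
        }
    }

    # Assumption: run on the head node; 9893 is the port from the error message.
    Test-TcpPort -ComputerName 'localhost' -Port 9893
    ```

    A result of False here would be consistent with the "actively refused" message, but it does not by itself identify the cause.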

    There is almost no way for me to manage the nodes.

    What is it that I'm missing?

All Replies

  • Monday, February 20, 2012 5:48 AM
    Moderator
     
     

    Hi,

    I am trying to involve someone familiar with this topic to look further into this issue. There might be some delay.

    Appreciate your patience.

    Best Regards,

    Ming Xu.



  • Tuesday, February 21, 2012 6:59 AM
     
     

    Did you try the sample at http://msdn.microsoft.com/en-us/library/hh560251(v=vs.85).aspx and install the SDK?

    Did you try reducing the number of instances to 10 and deploying again?

  • Tuesday, February 21, 2012 3:30 PM
     
     

    I did install the sample. When the cluster size is 10, I don't see any unreachable nodes, but when I go to a larger cluster (which I need, and I have 300 available nodes), about 10%-20% of the nodes are unreachable.

    In any case, problems 2 and 3 still exist even on a small HPC cluster.

  • Tuesday, February 21, 2012 3:39 PM
     
     

    Thank you very much. Because of this problem, 10%-20% of my 300 nodes go unused. At first (when I didn't have this many nodes), I assumed this was by design (to reserve a few nodes), but when I looked closer, I found that they were unreachable.

    What I found interesting was the inability to use any HpcNode cmdlets, such as:

    • Restart-HpcNode, Remove-HpcNode
    • Get-HpcNode, Get-HpcNodeStateHistory,...
    • Set-HpcNode, Set-HpcNodeState

    Actually I'll go ahead and list all available *Hpc* cmdlets as a reply to the main question.

  • Tuesday, February 21, 2012 3:55 PM
     
     
    Here are the cmdlets available on the head node:
    Add-HpcPool
    Add-HpcTask
    Copy-HpcJobTemplate
    Export-HpcJob
    Export-HpcJobTemplate
    Export-HpcSoaSessionTrace
    Export-HpcTask
    Get-HpcClusterOverview
    Get-HpcClusterProperty
    Get-HpcJob
    Get-HpcJobCredential
    Get-HpcJobTemplate
    Get-HpcJobTemplateAcl
    Get-HpcPool
    Get-HpcTask
    Import-HpcJobTemplate
    New-HpcJob
    Remove-HpcJobCredential
    Remove-HpcJobTemplate
    Remove-HpcPool
    Remove-HpcSoaCredential
    Remove-HpcSoaSessionTrace
    Set-HpcClusterProperty
    Set-HpcJob
    Set-HpcJobCredential
    Set-HpcJobTemplateAcl
    Set-HpcPool
    Set-HpcSoaCredential
    Set-HpcTask
    Stop-HpcJob
    Stop-HpcTask
    Submit-HpcJob
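
    A list like the one above can be produced with Get-Command, and the same call can be used to report which of the node cmdlets named earlier in this thread are absent from the session. This is only a sketch: the variable names are illustrative, and it checks only what the current PowerShell session can see, not what is installed elsewhere on the machine.

    ```powershell
    # Node-management cmdlets named earlier in this thread.
    $expected = 'Get-HpcNode', 'Set-HpcNode', 'Restart-HpcNode', 'Remove-HpcNode',
                'Get-HpcNodeStateHistory', 'Set-HpcNodeState'

    # Names of every *Hpc* cmdlet visible in the current session.
    $available = @(Get-Command -Name '*Hpc*' -CommandType Cmdlet -ErrorAction SilentlyContinue |
        Select-Object -ExpandProperty Name | Sort-Object)

    # Expected cmdlets that the session does not expose.
    $missing = @($expected | Where-Object { $available -notcontains $_ })

    $available   # everything that is available
    $missing     # everything from $expected that is not
    ```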

  • Friday, March 09, 2012 5:24 AM
     
     

    I see that you raised an SR and confirmed that all nodes are working now, but the second question is still unanswered: why the following HPC cmdlets are not available on the head node.

    • Restart-HpcNode, Remove-HpcNode
    • Get-HpcNode, Get-HpcNodeStateHistory,...
    • Set-HpcNode, Set-HpcNodeState

    What is the detailed error information when you run Restart-HpcNode?

    Do you have the right permissions in the current user context?
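
    One way to collect the details asked for above is to invoke the cmdlet inside try/catch and print the error record together with the current user name. The sketch below uses the read-only Get-HpcNode rather than Restart-HpcNode so that nothing is restarted. Get-CommandErrorDetail is a hypothetical helper name, and it only captures terminating errors such as a command that is not recognized.

    ```powershell
    # Hypothetical helper: runs a command by name and returns the details of
    # any terminating error, or $null if the command ran without one.
    function Get-CommandErrorDetail {
        param([Parameter(Mandatory = $true)][string]$CommandName)
        try {
            & $CommandName -ErrorAction Stop | Out-Null
            return $null
        }
        catch {
            return New-Object PSObject -Property @{
                Command   = $CommandName
                User      = [Environment]::UserName
                ErrorId   = $_.FullyQualifiedErrorId
                Exception = $_.Exception.GetType().FullName
                Message   = $_.Exception.Message
            }
        }
    }

    # Prints nothing if Get-HpcNode runs without a terminating error.
    Get-CommandErrorDetail -CommandName 'Get-HpcNode' | Format-List
    ```

    Posting the ErrorId and Message fields should make it possible to tell an unrecognized cmdlet apart from a permission problem.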

    If you run the command from a remote client, please check the document below:

    http://blogs.technet.com/b/windowshpc/archive/2009/01/07/a-powershell-problem.aspx