Resize Timeout

  • Question

  • Hi,

    I am having trouble resizing the pool of my batch account. I use the "User subscription" allocation mode because I need the nodes to be created in a network that I can manage (to grant firewall access for instance).

    Whenever a resize attempt is made, it fails with a resize timeout. The timeout is set to 10 minutes (the default value), which seems to be enough to get a new node. I suspect something else is preventing this pool from allocating a new node, but I can't find what is failing.

    How can I get more relevant information, that is, the reason why it is not allocating rather than the fact that it did not happen?

    I am trying to get nodes of size standard_d2_v2 with the OS set to MicrosoftWindowsServer WindowsServer 2016-Datacenter (latest).

    I used the "Batch service" allocation mode before and everything worked fine.

    Thanks

    Wednesday, June 14, 2017 2:23 PM

Answers

  • Finally I got the explanation from Microsoft support team:

    Hello Bruno, 
    
    I evaluated the request with the engineering team and found that there was a bug in the Batch service which only happened for User Subscription accounts using their own vNet for Windows IaaS pools.
    
    The bug fix was deployed to UK West by 6/16/2017 1400 UTC. The issue was detected on 6/9/2017 and hotfix deployment was started on 6/10/2017.
    There was no workaround for the issue; the only way to unblock was the hotfix deployment by the Batch service.
    
    However, we have enhanced our monitoring tool to proactively detect these kinds of issues in the future, so that such issues are caught before impacting you.
    I hope I have addressed your query. Please let me know if you have any further queries or concerns; I will be happy to assist.
    
    
    Thanks,
    
    So the issue was a bug and it was resolved by the Microsoft dev team.


    Tuesday, June 20, 2017 8:21 AM

All replies

  • Hi Bruno,

    Can you share the region you're in, your Batch account name, and the name of the pool that is failing to resize? We can then take a look to see if there's some issue on our end.

    You can also file a support request through the Azure Portal: go to Help + support > New support request > Technical issue, pick your subscription, make sure to choose "Batch Service" and the resource that's impacted, and share your impacted pool ID in the request.

    Thanks,

    -Matt

    Wednesday, June 14, 2017 5:06 PM
  • I am in North Europe.

    The pool ID is generated_pool. I also tried with a second pool, which is p1. I tried several VM sizes to be sure it is not a quota issue: standard_d2_v2 and some A sizes.

    The Batch account name is https://o11backtest.northeurope.batch.azure.com

    I tried a support request through the portal as well.

    Wednesday, June 14, 2017 6:45 PM
  • It appears that your pool is requesting LowPriority VMs.  However, accounts created with "User subscription" allocation mode have a LowPriority quota of 0 (zero).

    LowPriority VMs are not currently offered for "User Subscription" accounts but please know we are pushing hard to get this enabled.

    d

    Wednesday, June 14, 2017 11:53 PM
  • Just out of curiosity:  If you are using C# (more or less the same in all languages), the CloudPool.ResizeError.Code should help understand why your resize operation timed out.  The most likely value is: AccountCoreQuotaReached.  The ResizeError.Values collection should include additional explanatory text.

    If you have a chance, let us know what you see in these properties.
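
    For readers who inspect the pool outside C#, here is a minimal Python sketch of how the code, message, and values of a resize error could be turned into readable text. It does not call the Batch service. It assumes the pool description is already available as a dictionary with a `resizeErrors` list whose entries carry `code`, `message`, and `values` (name/value pairs); that shape, the helper name `describe_resize_errors`, and the sample data are assumptions for illustration only.

```python
from typing import Any, Dict, List


def describe_resize_errors(pool: Dict[str, Any]) -> List[str]:
    """Return one readable line per resize error found on a pool description."""
    lines = []
    for error in pool.get("resizeErrors") or []:
        # Each entry in "values" is assumed to be a {"name": ..., "value": ...} pair.
        details = ", ".join(
            f"{item.get('name')}={item.get('value')}"
            for item in error.get("values") or []
        )
        line = f"{error.get('code')}: {error.get('message')}"
        if details:
            line += f" ({details})"
        lines.append(line)
    return lines


# Hypothetical pool description, reusing the error text reported later in this thread.
sample_pool = {
    "id": "generated_pool",
    "resizeErrors": [
        {
            "code": "AllocationTimedout",
            "message": (
                "Desired number of dedicated nodes could not be allocated "
                "as the resize timeout was reached"
            ),
            "values": [],
        }
    ],
}

for line in describe_resize_errors(sample_pool):
    print(line)
```

    When the values collection is empty, as in this sample, the output contains only the code and the message, so it cannot say more than the portal does.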

    d

    Thursday, June 15, 2017 12:23 AM
  • It appears that your pool is requesting LowPriority VMs.  However, accounts created with "User subscription" allocation mode have a LowPriority quota of 0 (zero).

    LowPriority VMs are not currently offered for "User Subscription" accounts but please know we are pushing hard to get this enabled.

    d

    This information is very interesting, as it is not what I am trying to do and not the feedback I get. I tried setting TargetDedicated to 1 (or 2).

    This is what I see from the portal:

    Dedicated Nodes 0 -> 2

    Low priority nodes 0

    Could you confirm that I am requesting low-priority nodes and, if that is the case, where my mistake is? (As I understand it, this is not what is required.)

    Thursday, June 15, 2017 8:47 AM
  • I checked this value in code, but I get the same as in the portal:

    Code: AllocationTimedout, Message: Desired number of dedicated nodes could not be allocated as the resize timeout was reached

    This is really not helping ...

    Thursday, June 15, 2017 4:51 PM
  • I will check the logs to see what the service is seeing. We need to find out more about the difference between what you are seeing and asking for and what the service is trying to achieve.

    Thursday, June 15, 2017 8:04 PM
  • generated_pool has been created 7 times since 14-Jun-2017 (UTC). Each time, the pool had an autoscale formula specified, and some of those set targetLowPri > 0 (I see 1?).

    I see that the pool now has an autoscale formula that explicitly sets $TargetLowPriorityNodes = 0, and it has currentDedicated == 1 (success!).
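
    As an illustration of a formula that keeps the low-priority target at zero, here is a small Python sketch that builds such a formula as a string. The helper name `fixed_size_formula` is hypothetical, and `$TargetDedicatedNodes` is the autoscale variable for dedicated nodes as I understand the Batch autoscale documentation; it is not quoted from this thread. This is not the formula the pool actually used.

```python
def fixed_size_formula(dedicated_nodes: int) -> str:
    """Build an autoscale formula with a fixed dedicated target and no low-priority nodes."""
    if dedicated_nodes < 0:
        raise ValueError("dedicated_nodes must not be negative")
    return (
        f"$TargetDedicatedNodes = {dedicated_nodes};\n"
        # Explicitly request zero low-priority nodes, since User Subscription
        # accounts have a low-priority quota of 0 according to this thread.
        "$TargetLowPriorityNodes = 0;"
    )


print(fixed_size_formula(2))
```

    The resulting string would then be supplied as the pool's autoscale formula through whichever client or portal screen is in use.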

    I'm not sure I have access to the old pool parameters.  Customer data like that are encrypted. 

    It seems you are unblocked.

    d

    Saturday, June 17, 2017 12:49 AM
  • Yes, I am unblocked, but without any information about what was blocking (or what did the trick in the end).

    I suspect that some action was taken by support (I raised a ticket) or a bug was corrected on the platform...

    I hope I will not be blocked again. It is very frustrating, especially when you don't know what the issue was in the end.

    Anyway, thank you for your time.

    Sunday, June 18, 2017 3:47 PM
  • It appears that your pool is requesting LowPriority VMs.  However, accounts created with "User subscription" allocation mode have a LowPriority quota of 0 (zero).

    LowPriority VMs are not currently offered for "User Subscription" accounts but please know we are pushing hard to get this enabled.

    d

    Hello,

    Any ETA on when this will be enabled? Also, will it be possible to use GPUs for processing video? I've tried to launch NV-series VMs, but it looks like there was no GPU attached to them...

    Thank you.

    Regards,

    Pedro

    Saturday, June 24, 2017 12:10 AM
  • We are also facing a similar issue.

    We are getting resize timeouts for a pool created in Batch service.

    The region selected for the Batch service is Southeast Asia.

    Monday, March 19, 2018 6:39 AM