Azure worker role restarts

  • Question

  • I've been working on an Azure project recently. After deploying our product to Azure, the worker role restarted again. I checked the Windows event log and found the following error: File Server Resource Manager was unable to access the following file or volume: 'E:'.  This file or volume might be locked by another application right now, or you might need to give Local System access to it.

    Could this error be what is causing the restarts? Any help would be appreciated, thanks.

    March 24, 2016 3:54

All replies

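  • I deployed a worker role to a Windows Azure cloud service, but it restarts frequently. The worker role is not running any tasks, and on every restart a RestartManager entry is recorded in the Windows log (screenshot: https://social.msdn.microsoft.com/Forums/getfile/827673). It restarts roughly every ten-odd minutes.

    Does anyone know how to fix this?

    March 17, 2016 3:52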
  • Hi,

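    Did you check the Windows log by logging in remotely to the worker role instance? A worker role usually restarts for one of two reasons:

    • An unexpected exception is thrown
    • The worker role's Run() method finishes and returns

    For your case, I suggest adding more tracing to see whether one of these is the cause. If you make any progress, you are welcome to continue the discussion on our forum.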
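
    The extra tracing suggested above could look roughly like the following. This is a minimal sketch that uses only .NET base class library calls; the `CrashTracing` class and its method names are hypothetical and not part of the Azure SDK. The idea is to register process-wide handlers as early as possible (for example from OnStart) and to wrap the body of Run() so that both an escaping exception and a normal return leave a trace.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Azure SDK.
public static class CrashTracing
{
    // Call once, as early as possible (for example from OnStart).
    public static void Register()
    {
        // Raised for exceptions that no catch block handled, on any thread,
        // shortly before the process is torn down.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Trace.TraceError(Describe("Unhandled exception", e.ExceptionObject));

        // Raised when a faulted Task is garbage-collected without anyone
        // having observed its exception.
        TaskScheduler.UnobservedTaskException += (sender, e) =>
        {
            Trace.TraceError(Describe("Unobserved task exception", e.Exception));
            e.SetObserved();
        };
    }

    // Formats one log line; kept separate so it can be checked in isolation.
    public static string Describe(string kind, object exception)
    {
        return kind + ": " + (exception ?? "(no exception object)");
    }

    // Wraps the body of Run() so that a normal return is logged as well.
    public static void RunLogged(Action runBody)
    {
        try
        {
            runBody();
            Trace.TraceWarning("Run() returned; the role instance will be recycled.");
        }
        catch (Exception ex)
        {
            Trace.TraceError(Describe("Run() threw", ex));
            throw;
        }
    }
}
```

    If neither handler fires and Run() never returns, the process is probably being terminated in a way managed code cannot observe, which narrows the search to the runtime or native code.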

    Best Regards,

    Jambor


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.



    March 17, 2016 5:23
    Moderator
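  • Hi Jambor,

    Thanks for your reply. Yes, I checked the Windows log by logging in remotely to the worker role instance. First, the Windows log shows no unhandled exceptions. Second, the worker role's Run() method never runs to completion, so what I'm seeing is not a normal exit.

    Thanks, Toby

    March 17, 2016 6:47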
  • Hi,

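    The information you've provided isn't enough to find the real cause. If your program contains no sensitive information, you could upload it and share the link in a later reply, and I'll do my best to help find the cause. If you make any progress, you are welcome to share what you learn on our forum.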

    Best Regards,

    Jambor 


    March 17, 2016 9:12
    Moderator
  • Hi toby Peng,

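    Has your problem been solved? If anything is still unclear, we can keep working through it together; if you have found the cause, you are also welcome to share it on our forum.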

    Best Regards,

    Jambor


    March 18, 2016 1:48
    Moderator
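  • Hi Jambor,

    The problem is not solved yet. Our flow is: a scheduler sends messages to Azure Event Hub; the worker role receives these messages and creates threads based on them; each thread calls an external network API to fetch data, bulk-inserts the returned data into the database, and then ends. While running, the worker role sometimes cycles and sometimes dies outright. When it cycles, the worker role's OnStop method is not called. I looked at the WaHostBootstrapper log: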

    [00003104:00003784, 2016/03/18, 06:35:01.556, INFO ] Client reported status 0.
    [00003104:00003784, 2016/03/18, 06:35:01.556, INFO ] Getting status from client WindowsAzureDiagnosticsAgent.exe (3184).
    [00003104:00003784, 2016/03/18, 06:35:01.556, INFO ] Client reported status 0.
    [00003104:00003784, 2016/03/18, 06:35:01.556, INFO ] Getting status from client WaWorkerHost.exe (2632).
    [00003104:00003784, 2016/03/18, 06:35:01.556, ERROR] Failed to connect to client WaWorkerHost.exe (2632).
    [00003104:00003784, 2016/03/18, 06:35:01.556, ERROR] <- CRuntimeClient::OnRoleStatusCallback(0x00000054E9972C40) =0x800706ba
    [00003104:00003784, 2016/03/18, 06:35:01.556, INFO ] Client process 2632 is the role host.
    [00003104:00003784, 2016/03/18, 06:35:01.556, WARN ] Failed to contact the role host process. Treat role as unhealthy.
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Getting status from client RemoteAccessAgent.exe (764).
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Client reported status 0.
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Getting status from client Microsoft.VisualStudio.WindowsAzure.RemoteDebugger.Connector.exe (3320).
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Client reported status 0.
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Getting status from client WindowsAzureDiagnosticsAgent.exe (3184).
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Client reported status 0.
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Getting status from client WaWorkerHost.exe (2632).
    [00003104:00003784, 2016/03/18, 06:35:02.070, ERROR] Failed to connect to client WaWorkerHost.exe (2632).

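    At the highlighted point in the log above, the worker role died and never restarted. When I checked at that moment, the VM's CPU usage was 60%, memory usage was 20%, and there were 60 threads.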
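
    The value 0x800706ba in the log above is an HRESULT, and splitting it into its fields shows what the bootstrapper is reporting. A minimal sketch follows; the `HResult` class is a hypothetical helper written for illustration, and only the bit layout (failure flag in bit 31, facility in bits 16-26, code in bits 0-15) comes from the HRESULT format itself.

```csharp
using System;

// Hypothetical helper for reading HRESULT values such as the one in the log.
public static class HResult
{
    // Bit 31 is set for failures.
    public static bool IsFailure(uint hr) { return (hr & 0x80000000u) != 0; }

    // Bits 16-26 identify the facility (the subsystem that raised the error).
    public static int Facility(uint hr) { return (int)((hr >> 16) & 0x7FF); }

    // Bits 0-15 carry the facility-specific error code.
    public static int Code(uint hr) { return (int)(hr & 0xFFFF); }

    public static string Describe(uint hr)
    {
        return string.Format("0x{0:X8}: failure={1}, facility={2}, code={3}",
            hr, IsFailure(hr), Facility(hr), Code(hr));
    }
}
```

    `HResult.Describe(0x800706BA)` yields facility 7 (Win32) and code 1722, which is RPC_S_SERVER_UNAVAILABLE ("The RPC server is unavailable"). That is consistent with the bootstrapper failing to reach a WaWorkerHost.exe process that is already gone, so this log line is more likely a symptom of the host exiting than its cause.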

    Thanks, Toby

    March 18, 2016 7:47
  • Hi Jambor,

    Here is some of the code and the log:

    // Worker role code

    public override void Run()
    {
        try
        {
            this.RunAsync(this.cancellationTokenSource.Token).Wait();
        }
        finally
        {
            this.runCompleteEvent.Set();
        }
    }

    public override bool OnStart()
    {
        return base.OnStart();
    }

    public override void OnStop()
    {
        this.cancellationTokenSource.Cancel();
        this.runCompleteEvent.WaitOne();
        base.OnStop();
    }

    private async Task RunAsync(CancellationToken cancellationToken)
    {
        // The Event Hub EventProcessor is registered here
        while (!cancellationToken.IsCancellationRequested)
        {
            Trace.TraceInformation("Working");
            await Task.Delay(1000);
        }
    }

    EventProcessor

    async Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        // Messages sent by the Scheduler arrive here from Azure Event Hub
        // Create threads based on these messages
        var t = new Thread(new ParameterizedThreadStart(DoWork));
        t.Start();
        // In the thread, call the third-party API to fetch data, process it, bulk-insert it into the DB, then the thread ends.
    }

    After running for a while, WaWorkerHost starts cycling. When it cycles, the worker role's OnStop method is not executed, and sometimes WaWorkerHost simply dies. According to our own log files there are a little over 60 threads, the VM's CPU usage is 60%, and memory usage is 20%. The WaHostBootstrapper log shows:

    [00003104:00003784, 2016/03/18, 06:35:01.556, INFO ] Client reported status 0.
    [00003104:00003784, 2016/03/18, 06:35:01.556, INFO ] Getting status from client WindowsAzureDiagnosticsAgent.exe (3184).
    [00003104:00003784, 2016/03/18, 06:35:01.556, INFO ] Client reported status 0.
    [00003104:00003784, 2016/03/18, 06:35:01.556, INFO ] Getting status from client WaWorkerHost.exe (2632).
    [00003104:00003784, 2016/03/18, 06:35:01.556, ERROR] Failed to connect to client WaWorkerHost.exe (2632).
    [00003104:00003784, 2016/03/18, 06:35:01.556, ERROR] <- CRuntimeClient::OnRoleStatusCallback(0x00000054E9972C40) =0x800706ba
    [00003104:00003784, 2016/03/18, 06:35:01.556, INFO ] Client process 2632 is the role host.
    [00003104:00003784, 2016/03/18, 06:35:01.556, WARN ] Failed to contact the role host process. Treat role as unhealthy.
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Getting status from client RemoteAccessAgent.exe (764).
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Client reported status 0.
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Getting status from client Microsoft.VisualStudio.WindowsAzure.RemoteDebugger.Connector.exe (3320).
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Client reported status 0.
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Getting status from client WindowsAzureDiagnosticsAgent.exe (3184).
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Client reported status 0.
    [00003104:00003784, 2016/03/18, 06:35:02.070, INFO ] Getting status from client WaWorkerHost.exe (2632).
    [00003104:00003784, 2016/03/18, 06:35:02.070, ERROR] Failed to connect to client WaWorkerHost.exe (2632).

    Thanks, Toby

    March 18, 2016 8:17
  • Hi,

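    From your description, the virtual machine restarts once the worker role has opened about 60 threads. What happens if you choose a larger VM size, and what do the logs record then? Would scaling out the worker role's capacity for processing Azure Event Hub messages solve the problem? Because this depends on your actual application, it is hard to reproduce, so please rule out whether the restarts are caused by excessive load on the worker role.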
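
    One way to rule out overload is to cap how many work items run at once instead of starting a raw thread per message. The following is a sketch under stated assumptions: the thread never shows DoWork, so `handler` stands in for it, the `BoundedWorker` class is hypothetical, and the limit passed as `maxParallel` is an arbitrary value to tune. It also catches exceptions inside each work item, because an exception that escapes a raw Thread terminates the whole process.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs one work item per message, never more than
// maxParallel at a time, and collects failures instead of letting them escape.
public static class BoundedWorker
{
    public static async Task<IList<Exception>> ProcessAsync<T>(
        IEnumerable<T> messages, Func<T, Task> handler, int maxParallel)
    {
        var gate = new SemaphoreSlim(maxParallel);
        var errors = new List<Exception>();
        var running = new List<Task>();

        foreach (var message in messages)
        {
            await gate.WaitAsync();   // wait until a slot is free
            var current = message;
            running.Add(Task.Run(async () =>
            {
                try
                {
                    await handler(current);
                }
                catch (Exception ex)
                {
                    // Record the failure so one bad message cannot
                    // bring down the role host.
                    lock (errors) { errors.Add(ex); }
                }
                finally
                {
                    gate.Release();   // free the slot for the next message
                }
            }));
        }

        await Task.WhenAll(running);
        return errors;
    }
}
```

    Inside ProcessEventsAsync this could be awaited as `await BoundedWorker.ProcessAsync(messages, m => ..., 8)`, which also keeps the processor from returning before the batch is actually finished. If the restarts continue with a low limit, load is unlikely to be the cause.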

    Best Regards,

    Jambor


    March 22, 2016 1:34
    Moderator
  • Hi,

    >>File Server Resource Manager was unable to access the following file or volume: 'E:'.

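    When a project is deployed to an Azure role, or when an Azure virtual machine is created, the system has two disks, C and D. C is the system disk and is also persisted in Azure Storage; D is only a temporary disk, and we do not recommend storing persistent data on it. Unless additional disks are attached, there is no E drive. I suggest you keep this in mind.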

    Best Regards,

    Jambor


    March 24, 2016 5:45
    Moderator
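  • Hi Jambor,

    We publish directly from VS2015, and three drives appear automatically on the worker role VM. C: is the worker role's drive, D: is the system drive, and E: holds our application package. We have not attached any disks.

    Thanks, Toby

    March 24, 2016 8:08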
  • Hi,

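    Have you found the reason your worker role restarts? If not, I will escalate this thread to the appropriate experts. Please reply and describe where things currently stand.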

    Best Regards,

    Jambor


    March 28, 2016 8:22
    Moderator
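  • Hi Jambor,

    Sorry for the late reply; I was tied up for the past few days. The current situation is that the worker role still restarts after hitting the following two errors, both of which are recorded in the Windows system log.

    Error 1: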

    Application: WaWorkerHost.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an internal error in the .NET Runtime at IP 00007FFFCFC7996B (00007FFFCFBD0000) with exit code 80131506.

    Error 2:

    File Server Resource Manager was unable to access the following file or volume: 'F:'.  This file or volume might be locked by another application right now, or you might need to give Local System access to it.

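    The F drive here is one that Azure gave the VM automatically after deployment; it contains nothing but the application's package files.

    When these two errors occur, whatever the program was executing is interrupted. At that point, the log output from our application shows only about 70 threads.

    Thanks, Toby

    March 29, 2016 6:56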