What's going wrong with my worker role?

    Question

  • I've got a worker role running which essentially drags data from a web service, processes it and sticks extracts in blobs and a SQL Azure database.  It's reasonably high throughput, but low CPU, so I've got it running on an extra small.  In 3 weeks it's generated 9 GB of SQL tables and 6 GB of blobs.

    From manual inspection of the Performance counters, the CPU runs around 50% and there is always available memory.

    My problem is that it keeps crashing or hanging and I've no idea what or why.

    Added to that, I can't seem to get trace logs off the machine.  In this post (http://social.msdn.microsoft.com/Forums/en-US/windowsazuretroubleshooting/thread/3bf58157-84c3-45df-bacd-f97da940de9e) I describe how I set up my diagnostics and thought I'd found what I was doing wrong (I was looking in the wrong storage account).  However, I'm still not getting trace output.

    I get performance counter and event log tables, but no tracing.

    Bizarrely, the account in the config in Azure does not match the account I specify in the config in VS2010.

    I've set the system up to log selected events and exceptions in SQL Azure.  I've also set it up to email me on salient events (e.g. start, stop and exceptions).

    I sometimes get a stop email and sometimes do not.

    The failures are erratic.  It can run for 4 or 5 days with no problems and then restart 4 times in a day.

    Any suggestions or pointers would be much appreciated.

    Iain


    Iain Downs

    Tuesday, June 19, 2012 8:09 AM

Answers

All replies

  • I've just tried to log on with remote desktop (again) and this seems to have triggered an exception.  The Azure Portal shows 'Stabilizing role ... unhandled exception'.

    How do I find what the exception is?

    (This has now happened twice in a row - I either can't get through on remote desktop or the service fails shortly after connecting)

    I do have an unhandled exception handler in my code which should log the error and email me, but it never gets called (well it never logs anything or sends me an email!). 
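    (For reference, a minimal sketch of that kind of last-chance handler, assuming it is wired up in the role's OnStart; the class name and handler body are illustrative, not the original code.)

    using System;
    using System.Diagnostics;
    using Microsoft.WindowsAzure.ServiceRuntime;
    
    public class WorkerRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // Last-chance handler: log and flush before the role process is torn down.
            AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            {
                // ExceptionObject may not be an Exception for non-CLS exceptions.
                var ex = e.ExceptionObject as Exception;
                Trace.TraceError("Unhandled exception: {0}",
                    ex != null ? ex.ToString() : e.ExceptionObject);
                Trace.Flush(); // give the listeners a chance to persist the entry
            };
            return base.OnStart();
        }
    }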

    The event log has a few application exceptions in it, though far fewer than the number of crashes.  The last one claims it can't find or open a file (or can't enumerate a DFS share).

    Iain


    Iain Downs

    Tuesday, June 19, 2012 8:38 AM
  • Hi,

    Try using a TextWriterTraceListener (if TextWriterTraceListener still doesn't work, manually write some code to write trace information to a file). Then RDP to the VM to check the log file and get information to troubleshoot. After the main problem is resolved we can dig further into why the tracing doesn't work.
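    A minimal sketch of that suggestion, assuming a <LocalStorage> resource named "LogStorage" has been declared in ServiceDefinition.csdef (the resource and file names are illustrative):

    using System.Diagnostics;
    using System.IO;
    using Microsoft.WindowsAzure.ServiceRuntime;
    
    public static class FileTracing
    {
        public static void Enable()
        {
            // Resolve a writable folder inside the role instance rather than a
            // hard-coded C:\ path; "LogStorage" is the assumed local storage resource.
            string root = RoleEnvironment.GetLocalResource("LogStorage").RootPath;
            string logFile = Path.Combine(root, "worker.log");
    
            Trace.Listeners.Add(new TextWriterTraceListener(logFile, "FileLog"));
            Trace.AutoFlush = true; // flush each entry so a crash doesn't lose it
    
            Trace.TraceInformation("File tracing enabled at {0}", logFile);
        }
    }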


    Please mark the replies as answers if they help or unmark if not. If you have any feedback about my replies, please contact msdnmg@microsoft.com Microsoft One Code Framework

    Wednesday, June 20, 2012 7:34 AM
    Moderator
  • Good advice.  Presumably I need to use local storage (or whatever it's called) rather than writing to c:\LogFile.txt?

    I've added some more DB based tracing and also changed the instance type from small to extra small, so I'll see if that runs for a while and follow your advice if it doesn't "just work".

    Oh one other thing on the Diagnostics.  I'm enabling Crash Dumps (small not full), but I never see any in the CrashDump directory in the VM.  I'm not sure if this is a diagnostics setting or if the damn thing isn't actually crashing....


    Iain Downs

    Wednesday, June 20, 2012 8:00 AM
  • There are a few different trace listeners out there which allow direct logging to a table or queue on Azure; these might be helpful in your case and would not require collecting logs from the running instances.

    http://blog.smarx.com/posts/lightweight-tracing-to-windows-azure-tables

    http://cloudlogging.codeplex.com/

    Wednesday, June 20, 2012 8:27 PM
  • Thanks for the smarx reference.  I've implemented this and I'm now getting trace messages in an Azure Table.

    I'll keep you posted on what (if anything!) I find out.


    Iain Downs

    Thursday, June 21, 2012 9:05 AM
  • OK.  The instance has restarted after 3 hours of running.  At least this time I got a shutdown message!

    The traces I'm getting indicate an OutOfMemoryException.  In fact I get 3, probably due to there being several threads running.  The first two are from the application, being raised when the SQL connection is opened.  The last one appears to be from the thread pool which manages connections, but has no direct application root.

    I seem to have lost whatever performance monitor diagnostics I once had, so I can't easily see where and when this happens.

    How can I find and download the performance counter tables from the instance? (They are in C:\Resources\Directory\...\Monitor\Tables from what I can see.)
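    (For what it's worth, once the scheduled transfer is working the samples land in the WADPerformanceCountersTable in the diagnostics storage account, and can be read back with the 1.x storage client; a sketch, with an illustrative entity class and time window:)

    using System;
    using System.Linq;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;
    
    // Illustrative entity matching the columns Azure Diagnostics writes.
    public class PerfCounterRow : TableServiceEntity
    {
        public string Role { get; set; }
        public string RoleInstance { get; set; }
        public string CounterName { get; set; }
        public double CounterValue { get; set; }
        public long EventTickCount { get; set; }
    }
    
    public static class PerfCounterReader
    {
        public static void Dump(string diagnosticsConnectionString)
        {
            var account = CloudStorageAccount.Parse(diagnosticsConnectionString);
            var context = account.CreateCloudTableClient().GetDataServiceContext();
    
            // PartitionKey is "0" plus the event tick count padded to 19 digits,
            // so a string comparison selects a time window (here, the last hour).
            string fromKey = string.Format("0{0:D19}", DateTime.UtcNow.AddHours(-1).Ticks);
            var rows = context.CreateQuery<PerfCounterRow>("WADPerformanceCountersTable")
                              .Where(r => r.PartitionKey.CompareTo(fromKey) >= 0)
                              .AsTableServiceQuery();
    
            foreach (var r in rows)
                Console.WriteLine("{0} {1} = {2}", r.RoleInstance, r.CounterName, r.CounterValue);
        }
    }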

    I've set some old-fashioned performance counters running and will check from time to time, but from watching Task Manager in RDP it's not obvious that it's leaking memory at any kind of rate.

    Is there any other reason why a SQL Azure connection should throw an out-of-memory exception?

    Iain


    Iain Downs

    Thursday, June 21, 2012 2:11 PM
  • Iain - this may or may not be helpful - but you know that SQL Azure throws transient errors whenever it feels like it, right?  You have to wrap every single SQL call with a retry policy delegate.  You'll need this DLL:

    Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling

    In my view, this should be done automatically under the hood for developers, but according to the Microsoft ADO.NET team, this bug/feature is not slated to be implemented for another year or more.



    Thursday, June 21, 2012 5:11 PM
  • Thanks, Matt.  I believe I have this in place.  I have a line in each of my EntityFramework access blocks

    Microsoft.Practices.TransientFaultHandling.RetryPolicy policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(MaxRetries, TimeSpan.FromMilliseconds(DelayMS));

    Which does this.

    I'd better go now - there's a thunderstorm over Harrogate and I'd better power off my machines :)

    Iain


    Iain Downs

    Thursday, June 21, 2012 5:21 PM
  • Cool - good to see some folks from the UK here.  You've also added all of the delegates using that retry policy, right (retryPolicy.ExecuteAction)?  Like this:

            public static bool TestDatabaseConnectivity(out string errorMessage)
            {
                errorMessage = "";
                // incrementalRetryStrategy is assumed to be defined elsewhere, for example:
                // Incremental incrementalRetryStrategy =
                //     new Incremental(3, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2));
                RetryPolicy<SqlAzureTransientErrorDetectionStrategy> retryPolicy =
                    new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(incrementalRetryStrategy);
                try
                {
                    using (MyDB db = new MyDB())
                    {
                        User u2 = retryPolicy.ExecuteAction<User>(() => db.Users.FirstOrDefault(u => u.Id == 1));
                        return u2 != null;
                    }
                }
                catch (Exception exc)
                {
                    errorMessage = exc.ToString();
                    return false;
                }
            }

    Thursday, June 21, 2012 5:26 PM
  • Ah.

    I'm using EntityFramework and found my usage somewhere on the interweb.

    It may be that I've misunderstood the usage of the RetryPolicy.  I thought that you simply created the retry policy within the scope of a Context and dotnet magic made it work (rather like TransactionScope).

    If I now understand things a little better I've got to wrap all my calls into the context in a RetryPolicy.ExecuteAction<thing>.

    If I am therefore not protected against bad Azure connections, I suppose a flurry of errors could lock up the connection pool and make it claim to be out of memory.

    Does that sound right?  Certainly, monitoring the role from Remote Desktop, there is no real evidence of significant memory leaks...

    Iain


    Iain Downs

    Thursday, June 21, 2012 6:55 PM
  • Iain,

    You are not protected.  You need to wrap everything with that ExecuteAction delegate.  It's a huge pain if you've already written your data layer, which is what happened to me :(

    I think I know what your issue might be.  When you deploy via Visual Studio, do you have IntelliTrace and Profiling disabled?  If not, they should be.  I was experiencing OutOfMemoryExceptions from having those enabled.

    Matt

    Thursday, June 21, 2012 9:16 PM
  • Oh, Joy.

    I take it I have to wrap every atomic operation in one of these.  It's not enough to wrap calls to methods?  I thought so.

    I AM using the debug version, but I do not have IntelliTrace or Profiling enabled.

    Also, I've been watching the VM box via remote desktop since yesterday.  In the last 15 hours or so the memory has not really risen at all (it's wandered between 600 and 620 MB).  So it looks as if there is some critical event which triggers the out-of-memory condition rather than normal operation.

    I'll fix my code and see if that helps...

    Iain


    Iain Downs

    Friday, June 22, 2012 6:59 AM
  • You've got it Iain - every atomic operation has to be wrapped.  I even spoke with "the Scotts" about this one; they both agreed the current solution is not ideal.
    Friday, June 22, 2012 2:59 PM
  • Right.  I've done that and it's not obviously failed.

    Let's see how it works out.


    Iain Downs

    Friday, June 22, 2012 4:29 PM
  • If you haven't done so already, I'd suggest logging everything and then checking all logs.  Make sure the connection string is correct in your cloud configuration file.  This works with a web role.  Here's a quick code snippet for doing that:

            public override bool OnStart()
            {
                ThreadPool.QueueUserWorkItem(new WaitCallback(StartDiagnostics), new object());
                return base.OnStart();
            }
            public void StartDiagnostics(object o)
            {
                try
                {
                    int transferPeriodMinutes = 2;
                    var config = DiagnosticMonitor.GetDefaultInitialConfiguration();
                    config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(transferPeriodMinutes);
                    config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
                    config.Logs.BufferQuotaInMB = 10;
                    config.DiagnosticInfrastructureLogs.ScheduledTransferPeriod = TimeSpan.FromMinutes(transferPeriodMinutes);
                    config.DiagnosticInfrastructureLogs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
                    config.DiagnosticInfrastructureLogs.BufferQuotaInMB = 10;
                    config.WindowsEventLog.DataSources.Add("Application!*");
                    config.WindowsEventLog.DataSources.Add("System!*");
                    config.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(2);
                    config.WindowsEventLog.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
                    config.WindowsEventLog.BufferQuotaInMB = 10;
                    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\ASP.NET Applications(__Total__)\Requests/Sec",
                        SampleRate = TimeSpan.FromSeconds(30),
                    });
                    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\ASP.NET v4.0.30319\Requests Queued",
                        SampleRate = TimeSpan.FromSeconds(30),
                    });
                    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\ASP.NET v4.0.30319\Requests Rejected",
                        SampleRate = TimeSpan.FromSeconds(30),
                    });
                    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\ASP.NET v4.0.30319\Request Execution Time",
                        SampleRate = TimeSpan.FromSeconds(30),
                    });
                    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\ASP.NET v4.0.30319\Request Wait Time",
                        SampleRate = TimeSpan.FromSeconds(30),
                    });
                    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\ASP.NET\Application Restarts",
                        SampleRate = TimeSpan.FromSeconds(30),
                    });
                    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\.NET CLR Exceptions(_Global_)\# Exceps Thrown / sec",
                        SampleRate = TimeSpan.FromSeconds(30),
                    });
                    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\Processor(_Total)\% Processor Time",
                        SampleRate = TimeSpan.FromSeconds(30),
                    });
                    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\Memory\Available Bytes",
                        SampleRate = TimeSpan.FromSeconds(30),
                    });
                    config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(transferPeriodMinutes);
                    config.PerformanceCounters.BufferQuotaInMB = 10;
                    Microsoft.WindowsAzure.Diagnostics.CrashDumps.EnableCollection(true);
                    DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);
                }
                catch (Exception exc)
                {
                    // log this exception if you want
                }
            }


    Friday, June 22, 2012 4:34 PM
  • That's very close to what I have.  The only thing missing (apart from the detailed counters) is the BufferQuota, but I don't think that's critical, is it?

    Of course, I don't get any logs, so I must be doing something wrong.

    One thing that troubles me is that the storage account in the Management Portal is NOT the one I set in the development environment (I have two accounts set up - I'm not sure exactly why).

    I've tried changing the diagnostics connection string to the one I want using 'Configure' in the management portal.  However, despite the amendment (edit rather than upload), it stubbornly refuses to change.


    Iain Downs

    Friday, June 22, 2012 4:52 PM
  • I'm not sure if the buffer quota is critical or not - the defaults may suffice.  I added them to be sure it wouldn't cause a memory issue.

    If you aren't getting any logs, I'd start solving that first if I were you.  The diagnostics/logging in Azure is tricky and can trip you up.

    Make sure you set this in your ServiceConfiguration.Cloud.cscfg file if you use my code:

    <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="DefaultEndpointsProtocol=https;AccountName=youraccountnamehere;AccountKey=youraccountkeyhere" />


    I hope this note was helpful - if so, can you click "Mark as Answer" or "Vote as Helpful?" Thanks! Matt.

    Friday, June 22, 2012 4:55 PM
  • Yes I have that and have had all along.

    The thread started out trying to find why my diagnostic logs weren't getting uploaded to Azure and whilst I've progressed with the underlying issue (crashes), the logging remains defiant.


    Iain Downs

    Friday, June 22, 2012 5:03 PM
  • OK.  After adding a lot more diagnostics and trace I can see what's happening.

    Following Matt's advice above I've corrected my implementation of the RetryPolicy in Azure.

    What I seem to be seeing is that from time to time the instance just stops being able to talk to SQL Azure.  This despite the RetryPolicy.  I get timeouts.  When I do, as often as not I get timeouts on several of the threads which are active.   Each thread btw has a separate EntityFramework context.

    Mostly the timeouts happen on a simple single record retrieval or save.  I've seen both connection timeouts (by far the most common) and Command Timeouts.

    I've got the MaxRetries for the RetryPolicy set to 10 and the delay to 100 ms.  Is this sensible, or should I increase one of these?
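    (For comparison, the fault-handling block also ships an Incremental retry strategy, so each successive attempt waits a little longer; a sketch with illustrative values rather than a recommendation:)

    using System;
    using Microsoft.Practices.TransientFaultHandling;
    using Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling;
    
    public static class RetryPolicies
    {
        public static RetryPolicy CreateIncremental()
        {
            // 10 retries, waiting 100 ms before the first retry and adding 500 ms each time.
            var strategy = new Incremental(10,
                TimeSpan.FromMilliseconds(100), TimeSpan.FromMilliseconds(500));
            return new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(strategy);
        }
    }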

    Is there any way I can log the retries in the Policy to see if it is actually retrying?

    Any advice most gratefully received!

    Iain


    Iain Downs

    Friday, June 29, 2012 7:34 PM
  • The RetryPolicy object is not thread-safe, just so you know.  That's the first thought I have reading this.

    Can you post an entire code snippet that shows a single record retrieval that you are using?  I think that you probably have something wrong with your implementation; SQL Azure pretty much works like a charm for me.

    Also - I assume your worker role is being hosted in the same data center as your SQL Azure db, yes?

    Matt

    Friday, June 29, 2012 7:44 PM
  • Shouldn't you be down the pub, Matt?  :)

    I know the RetryPolicy object is not thread-safe, which is why I mentioned that I have separate contexts (and policies) per thread.

    Here is some code (a bit abbreviated).

    The top part is running in a loop which is controlled by a ManualResetEvent.  When there's an 'item' to be processed, its ID is put in a local variable (_request) and the ResetEvent signalled.  The snippet below runs, retrieves the record and processes it.

    The most common error happens in the line Item item = policy.Execute...

    using (MyEntities se = new MyEntities())
    {
      se.CommandTimeout = 120;
      Microsoft.Practices.TransientFaultHandling.RetryPolicy policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(MaxRetries, TimeSpan.FromMilliseconds(DelayMS));
      Search s = RetrieveAndProcessItem(se, policy);
    	// do something with search
    }
    
    
    Search RetrieveAndProcessItem(MyEntities context, Microsoft.Practices.TransientFaultHandling.RetryPolicy policy)
    {
      Item item = policy.ExecuteAction<Item>(()=> (from xx in context.Items where xx.ID == _Request.ID select xx).FirstOrDefault());
      return MakeASearch(item);
    }
    

    (The names of the objects have been changed, but the code is essentially the same)

    I'd be very happy to be put right.

    Iain


    Iain Downs

    Friday, June 29, 2012 8:51 PM
  • Iain, I'm actually in Cambridge, MA, USA, not Cambridge UK :)  Although I did live in London for four years, so I consider myself partially English at heart! 

    A few thoughts on this:

    1.  I'm not sure you can pass a context as a parameter like you've done in your RetrieveAndProcessItem.  It's probably possible, but I've never done it myself.  I just don't know what the implications are of doing that.

    1b.  Try refactoring your code to just get the POCO objects you need from your context, then let the code leave your "using" block (thus disposing of the context), and then pass those POCO objects to another method of your choosing.  I'm a little concerned that passing your context around like you've done has unforeseen consequences.  Maybe someone else could enlighten us here on that.

    2.  This probably has no consequences, but I think the way you've written your query could be made more readable:

    change:

    Item item = policy.ExecuteAction<Item>(()=> (from xx in context.Items where xx.ID == _Request.ID select xx).FirstOrDefault());
      return MakeASearch(item);

    to:

      Item item = policy.ExecuteAction<Item>(() => context.Items.FirstOrDefault(x => x.ID == _Request.ID));



    In conclusion, try refactoring it all like this:

    string somethingINeed = string.Empty;
    
    using (MyEntities se = new MyEntities())
    {
        se.CommandTimeout = 120; // this is not needed, 120 is the default I think.
        Microsoft.Practices.TransientFaultHandling.RetryPolicy policy =
            new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(MaxRetries, TimeSpan.FromMilliseconds(DelayMS));
        Item item = policy.ExecuteAction<Item>(() => se.Items.FirstOrDefault(x => x.ID == someId));
    
        if (item != null) somethingINeed = item.Name; // get the POCO objects you need
    } // let the context clean itself up
    
    DoSomethingElse(somethingINeed); // pass your POCO objects to another method




    Saturday, June 30, 2012 6:06 PM
  • Hey, Cambridge MA is nice too - I was in Boston for a conference a few years ago...

    I've written a few non-Azure Entity Framework systems and pass contexts around freely.  I see no reason why a context (or a RetryPolicy) should be different from any other object.

    The thing is that the evidence is that every so often the connection from the VM to SQL Azure fails and the RetryPolicy isn't enough.

    I've found out how to get notified when the RetryPolicy object retries, so I'll log that and see if it gives me any insight.
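    (A sketch of that kind of hook: the retry policy exposes a Retrying event, so each transient failure can be traced instead of happening silently. The factory wrapper and message format are illustrative.)

    using System;
    using System.Diagnostics;
    using Microsoft.Practices.TransientFaultHandling;
    using Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling;
    
    public static class RetryPolicyFactory
    {
        public static RetryPolicy Create(int maxRetries, int delayMs)
        {
            var policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(
                maxRetries, TimeSpan.FromMilliseconds(delayMs));
    
            // Fires on every retry with the attempt number, the delay about to be
            // applied and the exception that triggered the retry.
            policy.Retrying += (sender, args) =>
                Trace.TraceWarning("SQL retry {0} after {1} ms: {2}",
                    args.CurrentRetryCount,
                    args.Delay.TotalMilliseconds,
                    args.LastException.Message);
    
            return policy;
        }
    }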

    Thanks!

    Iain


    Iain Downs

    Sunday, July 01, 2012 12:01 PM
  • OK.  It's been a few weeks and I'm still tearing my hair out.  Every day or three the service just stops. 

    As a reminder, this is a fairly straightforward role which grabs data from a web service and pushes results out to a SQL Azure database.  To date, I've been unable to get trace logging working the Azure way, and so I've been using the smarx.TableTraceListener, which logs in real time to an Azure log table.

    What I think is happening is that every so often the network just fails.  Exceptions are thrown, and I handle them and trace the error.  I think in some of these cases the smarx code tries to write to the table, fails (because of the lack of network) and throws an exception inside the tracing code.

    I do have an unhandled exception handler, which may or may not be being called.  If it is, then the trace statements in it may be throwing things.
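    (A minimal sketch of one way to stop that cascade: route error logging through a wrapper that swallows failures inside the listeners themselves. The helper name is illustrative.)

    using System;
    using System.Diagnostics;
    
    public static class SafeTrace
    {
        // A trace write that cannot itself become a new unhandled exception,
        // e.g. if a table-backed listener fails while the network is down.
        public static void Error(string format, params object[] args)
        {
            try
            {
                Trace.TraceError(format, args);
            }
            catch (Exception)
            {
                // Swallow deliberately: there is nowhere safe left to report this.
            }
        }
    }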

    I've taken the smarx listener out for the time being, but I still seem to be struggling with the Azure logs.

    Prior to this latest update I looked at the Azure trace log in VS2010 and found it had recent traces in it.

    I've deleted all the WAD* logs in the container and restarted the role.  I've got the WADPerformanceCounters table recreated and with content in it, but the event log table and the trace table are not created.  My OnStart calls the code below to configure the logs.

            void ConfigureDiagnostics()
            {
                Trace.TraceInformation("Starting diagnostics");
                var config = DiagnosticMonitor.GetDefaultInitialConfiguration();
                config.WindowsEventLog.DataSources.Add("Application!*");
                config.WindowsEventLog.ScheduledTransferLogLevelFilter = LogLevel.Information;
                config.WindowsEventLog.ScheduledTransferPeriod = System.TimeSpan.FromMinutes(1);
    
                config.PerformanceCounters.DataSources.Add(
                    new PerformanceCounterConfiguration()
                {
                    CounterSpecifier = @"\Processor(_Total)\% Processor Time",
                    SampleRate = TimeSpan.FromSeconds(5)
                });
                config.PerformanceCounters.DataSources.Add(
                    new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\Memory\# Total committed Bytes",
                        SampleRate = TimeSpan.FromSeconds(5)
                    });
                config.PerformanceCounters.DataSources.Add(
                    new PerformanceCounterConfiguration()
                    {
                        CounterSpecifier = @"\Memory\Available Mbytes",
                        SampleRate = TimeSpan.FromSeconds(5)
                    });
                config.PerformanceCounters.ScheduledTransferPeriod = System.TimeSpan.FromMinutes(1);
    
                config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
                config.Logs.ScheduledTransferPeriod = System.TimeSpan.FromMinutes(1);
    
                Microsoft.WindowsAzure.Diagnostics.CrashDumps.EnableCollection(false);
    
                DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);
                Log("Diagnostics", "Should be started");
                Trace.TraceInformation("Diagnostics should be started");
            }

    As far as I can see it should be transferring trace and event log stuff every minute. But it's not. It IS transferring the performance logs.

    My app.config contains

      <system.diagnostics>
        <trace>
          <listeners>
            <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" name="AzureDiagnostics">
              <filter type="" />
            </add>
           </listeners>
        </trace>
      </system.diagnostics>

    The Connection string must surely be set correctly or the perf counters would not be saved.

    Can anyone suggest what's going wrong?

    Thanks

    Iain


    Iain Downs

    Tuesday, July 17, 2012 8:29 AM
  • Can you post your service configuration file for the Cloud (with sensitive info removed)?
    Tuesday, July 17, 2012 2:25 PM
  • Also, Iain - you should upgrade to SDK 1.7.  Google around and you should be able to find the link.  Then manually look at all of your project references and confirm they point to the SDK 1.7 folder.  Then also change your TraceListener reference to version 1.7:

    <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=1.7.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" name="AzureDiagnostics">

    I had a problem with diagnostics until I fixed the above.  See my thread here: http://social.msdn.microsoft.com/Forums/en-US/windowsazuretroubleshooting/thread/29125fcd-f2fa-42e7-abf9-c1707dbd6239


    Tuesday, July 17, 2012 3:07 PM
  • <ServiceConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" serviceName="" osFamily="1" osVersion="*" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
      <Role name="GetSentiment">
        <ConfigurationSettings>
          <Setting name="MailRecipients" value="iain@idcl.co.uk; larry@somewhere.com" />
          <Setting name="MaxSecondsToNextPoll" value="43200" />
          <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="DefaultEndpointsProtocol=https;AccountName=brandsentiment;AccountKey=secret" />
          <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.AccountEncryptedPassword" value="secret" />
          <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.AccountExpiration" value="2013-08-28T23:59:59.0000000+01:00" />
          <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.AccountUsername" value="Iain" />
          <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.Enabled" value="true" />
          <Setting name="Microsoft.WindowsAzure.Plugins.RemoteForwarder.Enabled" value="true" />
          <Setting name="MinSecondsToNextPoll" value="150" />
          <Setting name="SentimentDB" value="metadata=res://*/;provider=System.Data.SqlClient;provider connection string=';data source=secret.database.windows.net;initial catalog=Sentiment;SecretLogon;multipleactiveresultsets=True';" />
          <Setting name="SentimentStore" value="DefaultEndpointsProtocol=https;AccountName=brandsentiment;AccountKey=secret" />
          <Setting name="Threads" value="10" />
        </ConfigurationSettings>
        <Instances count="1" />
        <Certificates>
          <Certificate name="Microsoft.WindowsAzure.Plugins.RemoteAccess.PasswordEncryption" thumbprint="Secret" thumbprintAlgorithm="sha1" />
        </Certificates>
      </Role>
    </ServiceConfiguration>

    Thanks Matt. Perf counters still being updated, but nothing else.

    Iain


    Iain Downs

    Tuesday, July 17, 2012 3:17 PM
  • Looks fine.  I'd double-check the references and ensure they are the latest version for SDK 1.7, and update the TraceListener DLL reference as I mentioned.  If all that still fails, log a question with Azure support.  They are quite good and skilled at finding the answer.
    Tuesday, July 17, 2012 3:43 PM
  • Also check, when you right-click your cloud project and select "Publish" - check that in the "Advanced Settings" tab, the same storage account you are using for diagnostics is selected.  This is critical, and you'll see strange behavior if you have the incorrect storage account selected there.
    Wednesday, July 18, 2012 3:58 PM
  • Yee haa!

    I'm not sure I'm out of the woods yet, but at least I've got logging.

    All I needed to do was to upgrade to 1.7.  It wasn't quite as plain sailing as I'd hoped (documented at http://social.msdn.microsoft.com/Forums/en/windowsazuretroubleshooting/thread/16ab3646-3587-451f-b66a-5bf738ed4f28), but I now have Azure diagnostics running (too much actually, I need to turn the event log level down).

    Hopefully, the underlying problems are either bypassed, fixed or handled, but I will keep you posted.

    Matt - really, really appreciate your help on this.

    Iain


    Iain Downs

    Thursday, July 19, 2012 8:35 AM
  • Great!  Glad you got it sorted.

    Cheers,

    Matt


    Thursday, July 19, 2012 3:38 PM
  • Hi Iain,

    I am also seeing similar behaviour with my web role; it is randomly failing to connect to SQL Azure (timing out after 30 seconds). This only seems to happen when I swap the VIP of my deployment.

    Did upgrading to Azure SDK 1.7 actually resolve your original issue?

    Charles

    Thursday, August 23, 2012 6:39 AM
  • The problem with logging was resolved by upgrading to V1.7.

    In terms of connection to SQL Azure, I suggest two steps.  One is to implement the Retry Policy as described earlier in this thread.

    The second is to handle and retry SQL operations anyway.  I'm still seeing exceptions raised in my role despite the retry policy.  Only very occasionally - probably 1 in 100,000 updates, but they do happen.

    If you are running a user-facing site, then I would probably just have an error-handling page which suggests they retry.  If you have an automatic process driving it, then you probably need some kind of watchdog.
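    (A sketch of that second step: a last-resort wrapper around the already-policy-wrapped operation that catches, logs, backs off and tries the whole thing again. The attempt count and delay are illustrative.)

    using System;
    using System.Diagnostics;
    using System.Threading;
    
    public static class OuterRetry
    {
        public static T Execute<T>(Func<T> operation, int attempts)
        {
            for (int i = 1; ; i++)
            {
                try
                {
                    return operation();
                }
                catch (Exception ex)
                {
                    if (i >= attempts) throw; // out of attempts, let it surface
                    Trace.TraceWarning("Attempt {0} failed, retrying: {1}", i, ex.Message);
                    Thread.Sleep(TimeSpan.FromSeconds(5 * i)); // simple back-off
                }
            }
        }
    }

    Usage would be something like OuterRetry.Execute(() => policy.ExecuteAction(() => ...), 3) around each operation that must not be allowed to fail silently.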

    Iain


    Iain Downs

    Sunday, September 02, 2012 9:28 AM