StorageClientException aborts large query processing, table storage

  • Question

  • Hi,

    My issue is similar to the "StorageClientException azure table query" topic, but it relates to Azure SDK 1.7 and the consequences are different.

    I have a data processing service running on an on-premises server that streams millions of entities from table storage (the selection is done by partition/row key, using CloudTableQuery&lt;T&gt;).
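    For context, here is a minimal sketch of this kind of streaming query with the SDK 1.x StorageClient API (the entity type, table name, and key filter are hypothetical placeholders, not the ones from my service):

    ```csharp
    // Sketch of the streaming query described above, using the SDK 1.x
    // StorageClient API. MyEntity, "MyTable" and the key filter are
    // hypothetical placeholders.
    using System.Linq;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    class MyEntity : TableServiceEntity
    {
        // payload properties elided
    }

    class StreamingExample
    {
        static void Run(CloudStorageAccount account)
        {
            var ctx = account.CreateCloudTableClient().GetDataServiceContext();

            // AsTableServiceQuery wraps the LINQ query in a CloudTableQuery<T>,
            // which transparently follows continuation tokens and streams the
            // results segment by segment.
            CloudTableQuery<MyEntity> query =
                (from e in ctx.CreateQuery<MyEntity>("MyTable")
                 where e.PartitionKey == "P1"
                 select e).AsTableServiceQuery();

            foreach (var entity in query)
            {
                // process entity; a timeout on any segment surfaces here
                // as a StorageClientException
            }
        }
    }
    ```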

    What I'm observing is that after successfully processing several million rows, the service may suddenly get the following exception (stack trace from my bug repro project) without any chance to retry the failed request and continue processing:

    Microsoft.WindowsAzure.StorageClient.StorageClientException was unhandled
      HelpLink=http://go.microsoft.com/fwlink/?LinkID=182765
      Message=The operation has exceeded the default maximum time allowed for Windows Azure Table service operations.
      Source=Microsoft.WindowsAzure.StorageClient
      StackTrace:
           at Microsoft.WindowsAzure.StorageClient.Tasks.Task`1.get_Result()
           at Microsoft.WindowsAzure.StorageClient.Tasks.Task`1.ExecuteAndWait()
           at Microsoft.WindowsAzure.StorageClient.TaskImplHelper.ExecuteImplWithRetry[T](Func`2 impl, RetryPolicy policy)
           at Microsoft.WindowsAzure.StorageClient.CommonUtils.<LazyEnumerateSegmented>d__0`1.MoveNext()
           at StorageClientExceptionReproduction.Program.Main(String[] args) in D:\Projects\Azure repro\StorageClientExceptionReproduction\Program.cs:line 37
           at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
           at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
           at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
           at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
           at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
           at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
           at System.Threading.ThreadHelper.ThreadStart()
      InnerException: 
    

    There can be spikes of heavy network and CPU activity on that server, so as long as a RetryPolicy is set I assume it should be fine to have one or two requests fail due to a timeout. But what I see is that this exception causes the whole segmented processing to fail.
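    For reference, this is roughly how retry policies are attached in SDK 1.x; my expectation was that a timed-out segment request would be retried under these policies (the retry counts and back-off values below are examples, not the ones from my service; `account` is a CloudStorageAccount as in the repro project):

    ```csharp
    // Sketch of SDK 1.x retry policy configuration (example values only).
    var tableClient = account.CreateCloudTableClient();

    // Client-wide policy: retry each request up to 3 times, 5 s apart.
    tableClient.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(5));

    var ctx = tableClient.GetDataServiceContext();
    CloudTableQuery<MyEntity> query =
        (from e in ctx.CreateQuery<MyEntity>("MyTable")
         where e.PartitionKey == "P1"
         select e).AsTableServiceQuery();

    // The segmented query also carries its own policy:
    query.RetryPolicy = RetryPolicies.RetryExponential(
        RetryPolicies.DefaultClientRetryCount,
        RetryPolicies.DefaultClientBackoff);
    ```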

    I've published a reproduction of this bug on GitHub. It shows (with the help of Fiddler) that when the data channel is not stable, we get an exception that breaks the whole processing without any attempt to retry the failed request.
    I use Fiddler to pause the response for two minutes (simulating a slow response), which causes the StorageClientException to be thrown. Notice that even one slow response causes the (potentially large) chain of segmented results to fail, without any chance to resume processing from the point of failure.

    Thanks,
    Sergei

    Tuesday, July 31, 2012 1:18 PM

All replies

  • Created a pull request that fixes this issue (https://github.com/WindowsAzure/azure-sdk-for-net/pull/105).

    Basically, the only thing that needs to change is to throw a StorageServerException instead of a StorageClientException, so that the timeout is treated as a retryable failure.

    • Proposed as answer by Maud L Thursday, August 2, 2012 3:31 AM
    • Marked as answer by Sergei Almazov Thursday, August 2, 2012 2:15 PM
    Wednesday, August 1, 2012 1:53 PM
  • My idea was to keep the last entity's PartitionKey and RowKey in the foreach loop, and use them to resume the query manually.

    Of course your fix is much better.
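    A rough sketch of that manual-resume idea, assuming a scan over a single partition (the entity type, table name, and key values are hypothetical placeholders):

    ```csharp
    // Keep the last RowKey seen and, after a failure, restart the query
    // just past that point. Assumes a scan over one partition ("P1").
    string lastRowKey = null;
    bool completed = false;
    while (!completed)
    {
        try
        {
            var ctx = account.CreateCloudTableClient().GetDataServiceContext();
            IQueryable<MyEntity> q = ctx.CreateQuery<MyEntity>("MyTable")
                .Where(e => e.PartitionKey == "P1");
            if (lastRowKey != null)
            {
                // Resume past the last successfully processed entity; the
                // table service LINQ provider supports CompareTo on keys.
                string resumeFrom = lastRowKey;
                q = q.Where(e => e.RowKey.CompareTo(resumeFrom) > 0);
            }

            foreach (var entity in q.AsTableServiceQuery())
            {
                // process entity ...
                lastRowKey = entity.RowKey;
            }
            completed = true;
        }
        catch (StorageClientException)
        {
            // Transient failure: loop again and resume from lastRowKey.
        }
    }
    ```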

    Thursday, August 2, 2012 3:33 AM