Why does StreamInsight 2.1 not delete checkpoint files when they are no longer needed? Microsoft.ComplexEventProcessing.ManagementException: There is not enough space on the disk.

    Question

  • Greetings,

    Why does StreamInsight 2.1 not delete checkpoint files when they are no longer needed?

    It is mentioned in the below MSDN page that:

    "StreamInsight deletes checkpoint files when they are no longer needed."

    http://msdn.microsoft.com/en-us/library/hh290476.aspx

    However, when I ran 46 checkpointable processes and performed checkpointing every five minutes, 100 GB of disk space was consumed in 18 hours and the exception below was thrown. In addition, not a single checkpoint file was deleted. The code used to perform checkpointing follows the exception details.

    Unhandled Exception: System.AggregateException: One or more errors occurred. ---> Microsoft.ComplexEventProcessing.ManagementException: There is not enough space on the disk. ---> System.IO.IOException: There is not enough space on the disk.
       at Microsoft.ComplexEventProcessing.Diagnostics.Exceptions.Throw(Exception exception)
       at Microsoft.ComplexEventProcessing.Engine.AsyncResult`1.EndInvoke()
       at Microsoft.ComplexEventProcessing.EmbeddedServerProxy.EndCheckpoint(IAsyncResult asyncResult)
       --- End of inner exception stack trace ---
       at Microsoft.ComplexEventProcessing.Diagnostics.Exceptions.Throw(Exception exception)
       at Microsoft.ComplexEventProcessing.EmbeddedServerProxy.EndCheckpoint(IAsyncResult asyncResult)
       at Microsoft.ComplexEventProcessing.CepCheckpointableProcess.<>c__DisplayClass7.<CheckpointAsync>b__3(IAsyncResult iar)
       --- End of inner exception stack trace ---
       at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
       at System.Threading.Tasks.Task.Wait()
       at Si.BaseRunner.<>c__DisplayClass5.<CheckpointLoop>b__3(Int64 i)
       at System.Reactive.Linq.Observable.<>c__DisplayClass4be.<>c__DisplayClass4c0.<Timer>b__4bd(Action`1 self)
       at System.Reactive.Concurrency.Scheduler.<>c__DisplayClass29`1.<>c__DisplayClass2b.<InvokeRec3>b__28(IScheduler scheduler1, TState state3)
       at System.Reactive.Concurrency.ThreadPoolScheduler.<>c__DisplayClass8`1.<Schedule>b__6(Object _)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.TimerQueueTimer.CallCallback()
       at System.Threading.TimerQueueTimer.Fire()
       at System.Threading.TimerQueue.FireNextTimers()

    foreach (var app in server.Applications){
         foreach (var pr in app.Value.CheckpointableProcesses){
               CheckpointLoopCaller(BaseRunner.Server, pr.Value, TimeSpan.FromMinutes(5));
         }
    }
    
    /////////////////////////////////////////////////////////
    //idisposable variables for the asyncmethodcaller
    private static readonly Dictionary<string, IDisposable> _asynCheckpointLoopDictionary = new Dictionary<string, IDisposable>();
    
    public static void CheckpointLoopCaller(Server server, CepCheckpointableProcess process, TimeSpan delay){
        if (!(_asynCheckpointLoopDictionary.ContainsKey(process.Name.AbsolutePath))){
              _asynCheckpointLoopDictionary.Add(process.Name.AbsolutePath, CheckpointLoop(server, process, delay));
        }
    }
    
    //////////////////////////////////////////////////////////
    public static IDisposable CheckpointLoop(Server server, CepCheckpointableProcess process, TimeSpan delay){
        Uri queryUri = server.Enumerate(new Uri(process.Name + "/Query")).First();
    
        // Don't start checkpointing until the query is actually running.
        string queryState;
        do{
           // sleep for a second
           Thread.Sleep(TimeSpan.FromSeconds(1));
           DiagnosticView dv = server.GetDiagnosticView(queryUri);
           queryState = (string)dv[DiagnosticViewProperty.QueryState];
        } while ("Running" != queryState);
    
        return Observable.Interval(delay, Scheduler.ThreadPool).Subscribe(
               i =>{
                   var task = process.CheckpointAsync();
                   task.Wait();
                   if (task.IsFaulted){
                       Console.WriteLine("FAULTED @ ** Checkpointed! ({0}, Process Name= {1})", i, process.Name);
                       return;
                   }
                   Console.WriteLine(DateTime.Now.ToString() + "\t** Checkpointed! ({0}, Process Name= {1})", i, process.Name);
         }, (Exception e) =>
                        Console.WriteLine("Error encountered during checkpointing: {0}, Process Name= {1}", e.Message, process.Name));
    }
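A note on why the process terminates instead of printing the "FAULTED" message: `Task.Wait()` rethrows a task's failure as an `AggregateException`, so the `task.IsFaulted` check after it is never reached, and the stack trace above shows that exception escaping from `CheckpointLoop` through `Task.Wait()` on a timer thread. The following is a minimal sketch of the pattern using only the .NET task library. `FailingCheckpointAsync` and `TryCheckpoint` are hypothetical stand-ins for illustration, not StreamInsight APIs.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class WaitDemo
{
    // Hypothetical stand-in for a checkpoint call that fails with a full disk.
    public static Task FailingCheckpointAsync()
    {
        var tcs = new TaskCompletionSource<bool>();
        tcs.SetException(new IOException("There is not enough space on the disk."));
        return tcs.Task;
    }

    // Returns the failure message, or null if the checkpoint succeeded.
    public static string TryCheckpoint(Func<Task> checkpoint)
    {
        try
        {
            // Wait() throws AggregateException when the task has faulted,
            // so any code after it only runs on success.
            checkpoint().Wait();
            return null;
        }
        catch (AggregateException e)
        {
            // Flatten() unwraps nested AggregateExceptions to the root cause.
            return e.Flatten().InnerException.Message;
        }
    }

    public static void Main()
    {
        string error = TryCheckpoint(FailingCheckpointAsync);
        Console.WriteLine(error ?? "Checkpointed");
    }
}
```

Wrapping the wait in a try/catch inside the `Subscribe` callback would keep a failed checkpoint from taking down the host process; it does not address why the checkpoint files accumulate.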


    MD


    • Edited by Muhammad G Saturday, June 14, 2014 2:15 PM formatting
    Saturday, June 14, 2014 2:14 PM

All replies

  • Another exception, from the Event Viewer log, where the source is .NET Runtime:

    Application: Server.exe

    Framework Version: v4.0.30319

    Description: The process was terminated due to an unhandled exception.

    Exception Info: System.AggregateException

    Stack:

       at System.Threading.Tasks.Task.Wait(Int32, System.Threading.CancellationToken)

       at System.Threading.Tasks.Task.Wait()

       at Si.BaseRunner+<>c__DisplayClass5.<CheckpointLoop>b__3(Int64)

       at System.Reactive.Linq.Observable+<>c__DisplayClass4be+<>c__DisplayClass4c0.<Timer>b__4bd(System.Action`1<System.DateTimeOffset>)

       at System.Reactive.Concurrency.Scheduler+<>c__DisplayClass29`1+<>c__DisplayClass2b[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].<InvokeRec3>b__28(System.Reactive.Concurrency.IScheduler, System.__Canon)

       at System.Reactive.Concurrency.ThreadPoolScheduler+<>c__DisplayClass8`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].<Schedule>b__6(System.Object)

       at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

       at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

       at System.Threading.TimerQueueTimer.CallCallback()

       at System.Threading.TimerQueueTimer.Fire()

       at System.Threading.TimerQueue.FireNextTimers()

    Also, when I try to run Server.exe again, the following exception occurs:

    Application: Server.exe

    Framework Version: v4.0.30319

    Description: The application requested process termination through System.Environment.FailFast(string message).

    Message: Cannot access a disposed object.

    Object name: 'PageReferenceSerializerReader'.

    Stack:

       at System.Environment.FailFast(System.String)

       at Microsoft.ComplexEventProcessing.Engine.ExecutionOperatorCheckpointReader.DispatchEvents(Int32)

       at Microsoft.ComplexEventProcessing.Engine.CheckpointReaderInputStrategy.DispatchEvents(Microsoft.ComplexEventProcessing.Engine.SchedulingPolicy)

       at Microsoft.ComplexEventProcessing.StreamOS.Task.Run()

       at Microsoft.ComplexEventProcessing.StreamOS.Scheduler.Main()

       at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

       at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

       at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)

       at System.Threading.ThreadHelper.ThreadStart()

    Cheers,

    Muhammad


    MD

    Sunday, June 15, 2014 5:26 AM
  • What is the code for the queries that are being checkpointed? Do you have anything in there that alters the event duration to TimeSpan.MaxValue?

    DevBiker (aka J Sawyer)
    Microsoft MVP - Sql Server (StreamInsight)


    Ruminations of J.net


    If I answered your question, please mark as answer.
    If my post was helpful, please mark as helpful.

    Monday, June 16, 2014 11:31 PM
  • Thanks a lot Devbiker

    No, I don't have anything in there that alters the event duration to TimeSpan.MaxValue.

    The main query I have is:

    var stream = from e in input
                 group e by e.id into gGroup
                 from w in gGroup.Scan<PointEvent<InputPayLoad<string>>, InputPayLoad<string>, OutputPayLoad>(() => new MyOperator())
                 select w;

    The cleansing operator looks like this:

        [DataContract]
        public sealed class MyOperator : CepPointStreamOperator<InputPayLoad<string>, OutputPayLoad>
        {
            [DataMember]
            string _previousValue;

            [DataMember]
            string _previousGoodValue;

            [DataMember]
            DateTime _previousGoodValueTimeStamp;

            [DataMember]
            MyClass mc;

    ....

    MyClass looks like this:

    [Serializable]
    public class MyClass{
       [DataMemberAttribute]
       private double w;

    All variables are primitive data types…

    Thanks, Muhammad


    MD


    • Edited by Muhammad G Wednesday, June 18, 2014 10:05 AM additional info
    Wednesday, June 18, 2014 10:00 AM
  • If that's the main query, are there any other queries? Have you done a trace to see if anything has an extended duration? My suspicion is that there is something, somewhere that has an extended timespan ... so StreamInsight keeps it around when checkpointing. Recording a trace will help you see where the durations may be getting extended/expanded.

    DevBiker (aka J Sawyer)
    Microsoft MVP - Sql Server (StreamInsight)



    Thursday, June 19, 2014 1:02 PM
  • Thanks a lot,

    "are there any other queries?" The other queries are simple projection queries; they don't have extended durations.

    "Recording a trace will help you see where the durations may be getting extended/expanded." This is very challenging, since I have many sites sending data from about 70,000 sensors, with reading intervals as short as seconds.

    When I receive a stale event (one with a timestamp problem, usually a timestamp in the future), I set the CTI time to 2005, but only if no good events have been received yet in that process. Do you think this could be the cause?

    Cheers, Muhammad


    MD

    Monday, June 23, 2014 6:04 AM
  • "When I receive a stale event (one with a timestamp problem, usually a timestamp in the future), I set the CTI time to 2005, but only if no good events have been received yet in that process. Do you think this could be the cause?"


    I don't understand what you mean by this.

    You should be able to take a limited subset of sample input, either recorded or simulated, and run it through your queries to get a smaller recording. Or you can use the command-line trace.cmd to start and stop the recording. Yes, it'll be large, but you shouldn't need more than a few minutes of data.


    DevBiker (aka J Sawyer)
    Microsoft MVP - Sql Server (StreamInsight)



    Monday, June 23, 2014 12:02 PM
  • Thank you very much devbiker

    I have monitored the 52 processes, which have received a total of 567,053,937 incoming events so far. I notice that some of them have a large delay compared with the others, such as processes Q, R and S in the table below. All of these processes use the same code, the same adapter, the same embedded server, etc.

    Time | process.Value.Name | queryState | totalIncomingEventCount | totalProducedEventCount | totalOutgoingEventCount | lastIncomingEventTimestamp | lastProducedEventTimestamp | lastOutgoingEventTimestamp | totalOutgoingEventLatency | totalProducedEventLatency | lastProducedCtiTimestamp | streamEventCount | StreamMemoryIncludingEvent | OperatorIndexEventCount | OperatorEventMemory | OperatorIndexMemory | OperatorTotalScheduledCount | OperatorTotalCpuUsage | Delay
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_N_Good | Running | 3443219 | 3584129 | 3584129 | 7/16/2014 7:58 | 7/16/2014 7:58 | 7/16/2014 7:58 | 54739596932 | 449622163.7 | 7/16/2014 7:24 | 0 | 0 | 5645 | 966975 | 948360 | 16899218 | 1017306 | 0:33
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_N_Stale | Running | 3442645 | 216713 | 216713 | 7/16/2014 7:58 | 7/16/2014 7:58 | 7/16/2014 7:58 | 1164925857 | 1906186.48 | 7/16/2014 7:24 | 0 | 0 | 0 | 0 | 0 | 3651549 | 21824 | 0:33
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_O_Good | Running | 44187481 | 44192192 | 44192192 | 7/16/2014 7:55 | 7/16/2014 7:55 | 7/16/2014 8:00 | 1.20517E+13 | 1.68952E+12 | 7/16/2014 7:34 | 0 | 0 | 134020 | 23799954 | 22364640 | 125437322 | 13103009 | 0:21
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_O_Stale | Running | 44210020 | 138817 | 138817 | 7/16/2014 7:58 | 7/16/2014 7:58 | 7/16/2014 7:58 | 212025.008 | 115462.8887 | 7/16/2014 7:37 | 0 | 0 | 0 | 0 | 0 | 46779664 | 259011 | 0:21
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_Q_Good | Running | 30775253 | 29381506 | 29365244 | 7/16/2014 8:00 | 7/16/2014 7:54 | 7/16/2014 8:08 | 1.22E+15 | 1.20E+15 | 7/16/2014 7:30 | 16262 | 2761366 | 5 | 767 | 840 | 60824089 | 8669137 | 0:29
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_Q_Stale | Running | 30106421 | 411768 | 232759 | 7/16/2014 8:00 | 7/16/2014 7:56 | 7/16/2014 8:08 | 16457536955 | 152851985.5 | 7/15/2014 13:12 | 779012 | 142718752 | 3 | 435 | 312 | 30981238 | 189044 | 18:47
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_R_Good | Running | 22546711 | 20775096 | 20775096 | 7/16/2014 8:00 | 7/16/2014 7:57 | 7/16/2014 8:06 | 7.98973E+13 | 6.30721E+13 | 7/14/2014 7:36 | 0 | 0 | 1361 | 213605 | 228648 | 56465004 | 6315240 | 0:23
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_R_Stale | Running | 22534806 | 1774293 | 1489315 | 7/16/2014 8:00 | 7/16/2014 7:57 | 7/16/2014 8:08 | 34034022670 | 437296005 | 7/14/2014 1:55 | 565205 | 100707558 | 0 | 0 | 0 | 23075656 | 186949 | 6:04
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_S_Good | Running | 27543094 | 26890393 | 26771552 | 7/16/2014 8:00 | 7/16/2014 7:54 | 7/16/2014 8:08 | 8.28611E+14 | 8.06164E+14 | 7/13/2014 9:48 | 126920 | 21565308 | 1163 | 171099 | 147384 | 64853946 | 8394278 | 22:12
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_S_Stale | Running | 27012642 | 205698 | 205698 | 7/16/2014 8:00 | 7/16/2014 7:58 | 7/16/2014 8:07 | 13325402171 | 17345828.31 | 7/15/2014 13:07 | 0 | 0 | 233662 | 33556018 | 16823856 | 27496505 | 185594 | 18:52
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_U_Good | Running | 24098023 | 22627191 | 22627191 | 7/16/2014 8:00 | 7/16/2014 7:47 | 7/16/2014 7:47 | 2.93633E+14 | 2.74563E+14 | 7/16/2014 7:24 | 0 | 0 | 24 | 3768 | 1824 | 56545691 | 6936718 | 0:35
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_U_Stale | Running | 23970742 | 143532 | 143532 | 7/16/2014 8:00 | 7/16/2014 7:47 | 7/16/2014 7:47 | 9497920.633 | 285697.189 | 7/16/2014 7:24 | 0 | 0 | 1481455 | 223909935 | 106664760 | 24404659 | 221887 | 0:35
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_V_Good | Running | 23639196 | 23411867 | 23411867 | 7/16/2014 8:00 | 7/16/2014 7:58 | 7/16/2014 8:06 | 2.33258E+12 | 5200534351 | 7/16/2014 7:40 | 0 | 0 | 2206 | 348406 | 370608 | 67590599 | 7582929 | 0:20
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_V_Stale | Running | 23751423 | 271567 | 271567 | 7/16/2014 8:00 | 7/16/2014 8:00 | 7/16/2014 8:00 | 1.04755E+11 | 57283919.44 | 7/16/2014 7:40 | 0 | 0 | 242192 | 36686738 | 17433120 | 25802166 | 154088 | 0:20
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_W_Good | Running | 2094952 | 2228815 | 2228815 | 7/16/2014 7:57 | 7/16/2014 7:57 | 7/16/2014 7:58 | 17728133154 | 196825006.1 | 7/16/2014 7:36 | 0 | 0 | 5720 | 926836 | 933792 | 9470453 | 801039 | 0:21
    7/16/2014 8:08 | cep:/Server/Application/Si.Cleansing/Entity/Process_W_Stale | Running | 2094655 | 141254 | 141254 | 7/16/2014 7:57 | 7/16/2014 7:57 | 7/16/2014 7:57 | 3499237.091 | 950576.7429 | 7/16/2014 7:36 | 0 | 0 | 0 | 0 | 0 | 2181736 | 13282 | 0:21
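A caution when reading the Delay column: its values appear consistent (to within a minute) with lastIncomingEventTimestamp minus lastProducedCtiTimestamp, displayed as hours and minutes only. If that is how it was computed, whole days are hidden. Process_R_Good shows 0:23 although its last CTI is dated 7/14/2014, two days before its last incoming event. The sketch below recomputes the lag for that row with days included; the class and method names are hypothetical, and the inputs are the minute-precision values from the table.

```csharp
using System;

public static class CtiLag
{
    // Lag between the newest incoming event and the last CTI the query produced.
    public static TimeSpan Lag(DateTime lastIncoming, DateTime lastProducedCti)
    {
        return lastIncoming - lastProducedCti;
    }

    public static void Main()
    {
        // Values copied from the Process_R_Good row of the table.
        TimeSpan lag = Lag(new DateTime(2014, 7, 16, 8, 0, 0),
                           new DateTime(2014, 7, 14, 7, 36, 0));

        // Print the day component explicitly so it cannot be dropped.
        Console.WriteLine("{0} days, {1} hours, {2} minutes",
                          lag.Days, lag.Hours, lag.Minutes);
    }
}
```

Read this way, Process_R_Good and Process_S_Good lag by roughly two and three days respectively, which is larger than the Delay column alone suggests.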

    I tried to minimize the locks; only one lock is left, the one you suggested in this post:

    http://www.devbiker.net/post/StreamInsight-Output-Adapter-Lifetime.aspx

    Do you think this has anything to do with my situation?

    Cheers, Muhammad


    MD

    Wednesday, July 16, 2014 8:08 AM
  • It may. Keep in mind that a process is the unit of scheduling for your queries. The more processes you have, the more challenges you'll have with scheduling the queries for execution.

    First, which edition of StreamInsight are you running? You will need Premium.

    Second, how many cores do you have? Full cores, not counting hyper-threading.

    Now ... why so many processes? Have you considered consolidating them at all? I notice, for example, that you have Process_U_Good and Process_U_Stale ... are these two different sinks for the same source? If so, you should definitely consolidate those into a single process (Process_U) by using Bind().With(). That will allow you to have multiple sinks in the same process. Also, if you have two sinks that use the same source but are in different processes, you'll wind up creating two instances of your source ... and this can certainly cause you to get tripped up over locks.


    DevBiker (aka J Sawyer)
    Microsoft MVP - Sql Server (StreamInsight)



    Sunday, July 20, 2014 7:57 PM