Sequencing events for output adapter

    Question

  • I have the following scenario that I want to model with the stream-based query semantics of StreamInsight. Given an input "Point" event, I need to:

    1. Run N aggregation queries on it in parallel. This would typically generate M streams on the output.
    2. Send the resulting M streams to the SQL output adapter. The catch is that these M streams have an inherent ordering: the SQL output adapter must write S_1 through S_k first and, only if that succeeds, send S_(k+1) through S_M to the database.

    I can do such ordering in the output adapter, but I am wondering if there is a better approach in which I can sequence these M output events within the StreamInsight query engine so that I don't have to deal with cross event correlation, buffering etc. on the adapter front.

    Another line of thinking is whether there is a way to release the set of M events from the query engine to the output adapter as a batch once certain conditions are met. For example, once the output of all M aggregation queries has been generated, could the events be released in a "burst" rather than as a stream? Can this be solved with "Edge" events?

    Thanks...

    Monday, October 03, 2011 7:55 PM

Answers

  • Trying this again, as the forum software ate the first try.  For running parallel queries against the same data feed I'd recommend using Dynamic Query Composition (blog post with some detail here).

    For the second I see two aspects in your description:

    • Event correlation and buffering.  I'd highly recommend handling these in a query; one approach to buffering for a time period and flushing is to join the results stream with a 'trigger' stream.  Would you be able to share a discrete example?

     

    • For bursting to SQL, does the burst need to be committed in a single transaction or BULK INSERT operation?  This would need to be handled by the adapter, but depending on the query semantics would need to be treated slightly differently.
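
    As a rough illustration of the single-transaction option, the sketch below writes one burst atomically, dimension rows first and statistics second. It uses Python's built-in sqlite3 module purely as a stand-in for SQL Server; the table names, column names, and the open_store and write_burst helpers are hypothetical and do not come from the thread or from the StreamInsight adapter API.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a foreign key from stats to dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE dimension (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE group_stats ("
        " window_start INTEGER NOT NULL,"
        " name TEXT NOT NULL REFERENCES dimension(name),"
        " event_count INTEGER NOT NULL,"
        " mean REAL NOT NULL)"
    )
    return conn


def write_burst(conn, dimensions, stats):
    """Write one burst atomically: dimension rows first, then statistics rows."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.executemany(
            "INSERT OR IGNORE INTO dimension (name) VALUES (?)",
            [(name,) for name in dimensions],
        )
        conn.executemany(
            "INSERT INTO group_stats (window_start, name, event_count, mean)"
            " VALUES (?, ?, ?, ?)",
            stats,
        )
```

    If a statistics row refers to a dimension that was never inserted, the whole burst is rolled back, including the dimension rows written earlier in the same call.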
    Monday, October 03, 2011 10:56 PM
  • I can't think of a good way to do this without joining the streams. You would then do the details in the output adapter.

    Also, depending on how your tumbling or hopping windows are defined, you may not even be able to tell which events are related. That depends on the window sizes and on how the events align with the windows.

    Now, with that said, your aggregate stream should have the same CTIs as your event stream. While you can't guarantee the order in which the output adapters will be called, you can batch them up until you get the CTI and use that as your association.
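
    One way to read the CTI suggestion above is sketched below in plain Python. This is not StreamInsight adapter code: the CtiBatcher class, its method names, and the idea of a single buffer shared by all streams are assumptions made for illustration. Events are held until every stream has reported a CTI, and everything older than the smallest CTI is then released with higher-priority streams first.

```python
class CtiBatcher:
    """Buffers events from several streams and releases them in priority order.

    `priority` maps a stream name to its rank; lower ranks are released first.
    """

    def __init__(self, priority):
        self.priority = dict(priority)
        self.pending = []  # (timestamp, stream, payload)
        self.latest_cti = {name: None for name in self.priority}

    def on_event(self, stream, timestamp, payload):
        """Hold an event until the CTIs show that its batch is complete."""
        self.pending.append((timestamp, stream, payload))

    def on_cti(self, stream, timestamp):
        """Record a CTI; return the batch that is now complete (may be empty)."""
        self.latest_cti[stream] = timestamp
        if any(t is None for t in self.latest_cti.values()):
            return []  # some stream has not reported a CTI yet
        watermark = min(self.latest_cti.values())
        ready = [e for e in self.pending if e[0] < watermark]
        self.pending = [e for e in self.pending if e[0] >= watermark]
        # Sort by stream rank first, then by time within each stream.
        ready.sort(key=lambda e: (self.priority[e[1]], e[0]))
        return ready
```

    With priorities such as {"dimensions": 0, "stats": 1}, a released batch always lists the dimension events before the statistics events, whatever order the two streams delivered them in.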


    DevBiker (aka J Sawyer)
    My Blog

    If I answered your question, please mark as answer.
    If my post was helpful, please mark as helpful.
    Friday, October 07, 2011 9:41 PM

All replies

  • Thanks for responding.

    The discrete example is as follows. For an input stream S, the aggregation needs to find the unique values of certain dimensions and also aggregate the input events to compute group statistics such as Count and Mean. The SQL output adapter must insert the unique dimensions into a SQL table and, once that is done, update the group statistics. The dependency between the two steps is that the unique dimensions must be inserted into the SQL table first, and only then can the group statistics associated with those dimensions be inserted. If this order is not respected, the SQL stored procedure for the latter fails, because the dimensions are not yet available in the SQL table.
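
    To make the two outputs concrete, here is a small Python sketch of the aggregation described above. It is not StreamInsight query code; the (timestamp, dimension, value) event shape, the tumbling-window grouping, and the aggregate_windows name are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_windows(events, window_seconds):
    """Group (timestamp, dimension, value) tuples into tumbling windows.

    Returns {window_start: {dimension: (count, mean)}}. The keys of each
    inner dict are that window's unique dimension values.
    """
    sums = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for timestamp, dimension, value in events:
        window_start = timestamp - (timestamp % window_seconds)
        cell = sums[window_start][dimension]
        cell[0] += 1      # running count for this group
        cell[1] += value  # running total, turned into a mean below
    return {
        start: {dim: (count, total / count) for dim, (count, total) in groups.items()}
        for start, groups in sums.items()
    }
```

    For each window, the dimension keys correspond to the rows that must reach the database first, and the (count, mean) pairs correspond to the statistics that depend on them.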

    Considering the above scenario, I was thinking of the two alternatives I captured in the problem description. Hope it gives more context into the problem.

    Monday, October 03, 2011 11:34 PM
  • So ... let me see if I understand this.

    You have an aggregate query and a raw source (detail) query. You want the results of both to be stored in SQL. You want the detail to be tied to the aggregate. Because of this, you need to make sure that the aggregate for a time window is inserted before the details so that there is no FK constraint violation. Is this correct?


    DevBiker (aka J Sawyer)
    Tuesday, October 04, 2011 10:49 PM
  • Hi DevBiker,

    You summarized it correctly. This is exactly the dependency I want to handle within the StreamInsight query engine rather than in the output adapter, to avoid handling out-of-order streams.

    Wednesday, October 05, 2011 8:51 PM