Friday, April 09, 2010 9:21 AM
Have you considered the failover scenario?
That is, all the events pending at the server level (waiting on a CTI, for example) will be lost in the event of an unplanned service failover.
I am thinking of a way to dehydrate the pending events (in the BizTalk/WF manner) so that they can be recovered when the service restarts.
Is there currently any way to plug into the CEP server for this purpose?
Tuesday, April 13, 2010 9:59 PM
There are two methods by which you can implement failover with StreamInsight v1:
Option 1: Build an active-active server configuration, where you have two instances of StreamInsight working on the same set of events, multicast from an input source. The multicaster sits outside the system, feeding events into identical input adapters installed on the two servers, with the adapter instances invoked by the same queries installed on both servers. On the output side, you could have a de-multiplexer that receives output events from the output adapters and eliminates duplicates.
Obviously, this requires some design deliberation to ensure that the output events have some sort of key structure to help you with the de-duplication. However, on the input side, the beauty of StreamInsight is that it performs its temporal operations based on APPLICATION timestamps, so the system is forgiving of events that arrive slightly out of sync between the two systems. You can control how long a query (or, more specifically, the aggregation operations in a query) has to "wait" for out-of-order events by deciding how frequently you issue the CTI events.
With such an active-active configuration, you also have to design your overall system so that at least one server is running at all times (surviving site failures, node failures, and network failures).
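The output-side de-duplication mentioned above can be sketched as follows. This is a minimal illustration assuming each output event carries a key and an application timestamp that together identify the logical result; the field names are hypothetical and not part of the StreamInsight API.

```python
# Sketch of the output-side de-multiplexer for an active-active pair.
# Assumes each output event carries (key, app_timestamp) fields that
# uniquely identify the logical result; names are illustrative.

class OutputDeduplicator:
    """Forward each logical output event exactly once, whichever
    server emits it first; drop the duplicate from the other server."""

    def __init__(self):
        # In production this set would need an age-out policy
        # (e.g. keyed by timestamp) so it does not grow unboundedly.
        self._seen = set()

    def on_event(self, event):
        identity = (event["key"], event["app_timestamp"])
        if identity in self._seen:
            return None          # duplicate from the second server
        self._seen.add(identity)
        return event             # first arrival: forward downstream
```

A downstream consumer would simply discard `None` results; the key structure the post mentions is exactly what makes the `identity` tuple possible.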
Option 2: Back up, or in other words support, your primary server with an online log. In this scenario, you'd multicast your input events from the source into your query AND into an online log (implemented preferably as a ring-buffered log with LRU age-out). If the system goes down, the online log keeps recording the events. When the system comes back up, you provide the events from this online log as a second input (in other words, you write your StreamInsight query to accept TWO inputs: one from your main input stream, and another from this online log), and the query will effectively allow the events from the online log to "catch up". I have tried to show this in the diagram below:
Input Stream --+--> | I/P Adapter |---+--> Query --> | O/P Adapter | --> Output Stream
               |                      |
               +--> Online Log -----> | I/P Adapter |
The advantages to this approach are:
1. You have just one server.
2. The ability of the query to "catch up" (based on how you specify the CTIs) lets you start the server immediately and accept NEW events while the older events in the log are caught up. If the downtime is minimal, only the events that were collected during the downtime need to be caught up.
Now, we will grant that this method is practical when you are dealing with reasonable window sizes (a few minutes or hours, up to a few days) and event rates on the order of thousands per second. We will also agree that this is not true failover: depending on how long the system takes to recover, events in the log that substantially lag the current event windows (after recovery) may not be considered in the windowed computation. But you may be happy to hear that some of our production customers have tried this approach and it has worked for them, i.e., it has met their failover requirements.
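The online log in Option 2 can be sketched as a bounded buffer. This is a minimal illustration, assuming events carry an application timestamp, using a fixed-capacity deque to stand in for the "ring-buffered log with LRU age-out"; the StreamInsight adapter wiring is omitted.

```python
# Sketch of the ring-buffered online log described in Option 2:
# a bounded deque keeps the most recent events, oldest aging out
# first; on recovery, replay() yields what was recorded during
# the outage so the query's second input can "catch up".
from collections import deque

class OnlineLog:
    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)  # oldest events age out

    def record(self, event):
        self._buffer.append(event)             # tee of the input stream

    def replay(self, since_timestamp):
        # On restart, feed these into the query's second input so the
        # engine can catch up on events missed during the downtime.
        return [e for e in self._buffer
                if e["app_timestamp"] > since_timestamp]
```

The capacity would be sized from the event rate and the longest outage you intend to survive, which is exactly the "reasonable window sizes" caveat above.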
Option 3: True failover, as in failover clustering, with support for automatic transfer of query state from a primary to a standby. We are well aware that the V1 release of our product does not support this; it is definitely on our radar.
If the above solutions don't work for you, and you are a customer dealing with very high event rates and many-nines (three-nines, four-nines, five-nines) availability requirements, we'd be interested in talking to you.
Thank you for your interest in StreamInsight.
Thursday, April 15, 2010 8:46 AM
That is a good summary of the options, but it only addresses part of the issues with each solution.
Let's look at a case where I run a query over 24 hours' worth of data, with a CTI every hour.
Option 1 works great as long as both servers are working. Once server 1 goes down and then comes back up, I need to "feed" that server at least 24 hours' worth of data before it can make any real decisions. The trick is knowing when server 1 is "primed" again so that I can rely on its output. During that timeframe, I have no failover server: server 2 is the only correct server, and server 1 is still ramping up.
A similar problem arises with option 2. It's not enough to cache all the inputs the server missed while it was down and pass them in again, because the query may need older data to do its processing. As an input adapter developer, you have to know how much historical data your server needs, pull that out of some data store (say, the past 24 hours), and pass those events in as well. When you do that, you break the encapsulation between your query and your adapter, letting the understanding of the query "leak" into the adapter.
Now, if your events are fast, your CTIs are frequent, and your query horizons are short (seconds, or minutes at most), you may be able to get away with these schemes, but the real solution (your option 3) is really needed.
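The "primed" question above reduces to a simple check. This is a sketch under the assumption that a recovering server's windowed output is trustworthy only once it has ingested a full query window of application time; the 24-hour window matches the example in this post.

```python
# Sketch of a "primed" check for a recovering server: with a
# 24-hour window, its windowed results are complete only after
# it has ingested a full window's worth of application time.

WINDOW_SECONDS = 24 * 60 * 60  # query window size (illustrative)

def is_primed(first_ingested_ts, current_ts, window=WINDOW_SECONDS):
    """True once the recovering server has seen at least one full
    window of application time since it began re-ingesting events."""
    return current_ts - first_ingested_ts >= window
```

The de-multiplexer on the output side would consult such a check before trusting events from the recovering server again.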
Thursday, April 15, 2010 9:35 PM
Thank you both for your contributions.
I think we have opened a pretty hot subject for real enterprise implementations.
Meanwhile, I may add my two supplementary cents, as solutions for simpler cases:
Option 4. If you have BizTalk in the configuration (which is our case), you can include the CEP query processing in an orchestration (with messages as CEP events) and coordinate the CEP input and output so that you don't remove a message from the MessageBox until you have received the corresponding CEP output. This way, you benefit from the BizTalk MessageBox's persistence and availability.
(This is actually the idea we are considering as the solution for our case.)
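The coordination in Option 4 could be sketched outside BizTalk as a pending store keyed by message id. This is an illustrative sketch, not BizTalk's actual MessageBox API; the correlation-by-id scheme is an assumption of the sketch.

```python
# Sketch of the Option 4 idea: hold each input message in a durable
# pending store until the matching CEP output arrives, then delete
# it; after a failover, everything still pending is re-enqueued.

class PendingStore:
    def __init__(self):
        self._pending = {}                    # message_id -> message

    def enqueue(self, message_id, message):
        self._pending[message_id] = message   # persist before processing

    def acknowledge(self, message_id):
        self._pending.pop(message_id, None)   # matching CEP output seen

    def recover(self):
        # After a restart, replay everything not yet acknowledged.
        return list(self._pending.values())
```

In the BizTalk case the MessageBox itself plays the role of `_pending`, with the orchestration driving the acknowledge step.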
Option 5. If possible (it's up to you, Ram, to tell us), we could connect to the CEP instance to obtain the event stream (something like a diagnostics log; would it be the input adapter itself?) and persist it to storage (disk, database, etc.). Then a separate process would remove events from storage based on the corresponding CEP output.
In case of a failure, after relaunching the CEP server, all we need to do is re-enqueue the events still in storage.
Tell me how all this sounds to you.
- Marked As Answer by Marius Zaharia Wednesday, May 05, 2010 7:51 PM
Wednesday, April 28, 2010 5:41 AM
Sorry for the delayed response. First, Noam's post raises good points. Applications that run at high throughput and cannot afford to miss a single event will require option 3, or mitigation steps at the adapter level. As Noam rightly points out, the proposals work for situations where the constraints are less strict. The proposals mentioned in the earlier post are by no means a substitute for option 3; they are workarounds that are pragmatic for many common scenarios (web analytics, meter readings, etc.). This is good feedback.
Option 4 is really a function of your application and the event rates. If you have the logic on the output side to detect the "corresponding CEP output", then by all means, your suggestion is worth investigating.
I am not able to understand Option 5. Note that StreamInsight provides you with rich diagnostics at the various "portals" of a query - at the input adapter, at the point of enqueue into the server, at the point of dequeue from the server, and at the output adapter. But these convey only the statistics of the events processed - these are not a mechanism for storing the actual events themselves. I think what you have in mind is something like the Option 2 described in my post above.
- Proposed As Answer by bhushanvinay Tuesday, August 17, 2010 9:16 AM
Monday, May 17, 2010 11:24 AM
Those are good solutions (I'm not sure I fully understand option 5), but they share one issue: they require the output adapter to be able to match outputs with inputs. This means you are "leaking" your business logic from the query into your adapters. It is almost as if you are reverse engineering your own query. It also means you will need a special output query for each SIS query. At that point, I am not sure I see the value of the query any more.
I think we need to let the SIS guys solve this problem (as I know they are).
Tuesday, August 17, 2010 9:17 AM
I was thinking about the same failover issue and looked up what people online have to say. I may be completely wrong here; this is just from my experience building workflows and ETL tools.
Input Stream -> TimeWindowMaker -> all the data is dumped to a fast cache [any caching engine] -> input adapter reads off the cache + query runs on it (re-persisting the adapter status, including any information needed in case of failover, back to the cache) -> output adapters
This enables an active-passive failover system, just by adding a level of indirection.
Once the second box is online, it can catch up and process the data from where the old input adapter's last persisted status left off.
The TimeWindowMaker is responsible for chunking the data.
The fast cache engine provides statefulness for the stream and the process.
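The cache-based active/passive idea above could be sketched like this. The cache API here is an illustrative stand-in for any fast caching engine, and the batch size is arbitrary; the point is only that the adapter's read position is re-persisted so a standby can resume from it.

```python
# Sketch of the cache-backed active/passive scheme: the input adapter
# checkpoints its read position into the cache after each batch, so
# a standby box can resume from the last committed offset.

class CheckpointedReader:
    BATCH_SIZE = 10  # arbitrary chunk size for this sketch

    def __init__(self, cache):
        self._cache = cache
        self._offset = cache.get("offset", 0)   # resume point

    def read_batch(self, stream):
        batch = stream[self._offset:self._offset + self.BATCH_SIZE]
        self._offset += len(batch)
        self._cache["offset"] = self._offset    # re-persist adapter status
        return batch
```

On failover, the new box constructs its reader over the same cache and continues from the old adapter's last persisted offset, which is the "level of indirection" the post describes.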