StreamInsight - Query around set-up and hardware (beginner)

    Question

  • Hi.

    I'm looking into using StreamInsight on a project at work. I have read a lot of articles, but I've not found any details on the hardware requirements. I have seen the minimum set-up documents, but I'm really after real-life examples.

    For example, suppose you were processing events over an hour, with between 10,000 and 20,000 events in that time, and keeping them in memory for processing. What does this take? What if it were over a day?

    Are there any considerations for when the server fails? Can you recover any of the events (except by re-processing them), or mirror the server?

    Sorry if these aren't technically correct, I'm just getting my head around the concepts.

    Thanks in advance

    Ian

    Friday, September 20, 2013 3:03 PM

Answers

  • Actually, I don't have much to add to what PowerTX125 said ... he's absolutely correct. But 10K-20K events/hour is something that's pretty easy for StreamInsight to handle. How much CPU you need depends on the complexity of the analytics and how many output sinks you have. How much memory you need depends on some of the same factors, as well as your payload size. That said, StreamInsight is pretty efficient when it comes to memory usage.
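
To put the payload-size point in rough numbers, here is a back-of-the-envelope sketch in Python (not StreamInsight code). The 200-byte payload and the 64-byte per-event overhead are assumptions picked for illustration, not documented StreamInsight figures, and the function name is hypothetical.

```python
def estimate_window_memory_mb(events_per_hour, window_hours, payload_bytes,
                              overhead_bytes=64):
    """Rough upper bound on memory for events held for a whole window.

    overhead_bytes is an ASSUMED per-event bookkeeping cost (timestamps,
    references); it is not a documented StreamInsight number.
    """
    events_in_window = events_per_hour * window_hours
    total_bytes = events_in_window * (payload_bytes + overhead_bytes)
    return total_bytes / (1024 * 1024)


# Worst case from the question: 20,000 events/hour, hypothetical 200-byte payload.
one_hour = estimate_window_memory_mb(20_000, 1, 200)
one_day = estimate_window_memory_mb(20_000, 24, 200)
print(f"1 hour: {one_hour:.1f} MB, 1 day: {one_day:.1f} MB")
```

Under these assumptions, an hour's worth of events is on the order of 5 MB and a day's worth on the order of 120 MB, so at this event rate the query logic and payload size matter far more than the raw event count.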

    As for raw throughput ... a little update to TXPower's numbers ... I've gotten 150K events/second throughput on a quad-core i7 laptop with a passthrough query writing to a local Hadoop instance. There were some basic analytics on this to calculate the events/second and write them to the console, but that was minimal. CPU utilization hovered around 75-80% and there was no significant output queue - no more than 1-2 seconds' worth of events at any one time.

    In a real-world scenario, with some complex analytics, we've seen over 100K events/second on a dual quad-core Xeon with 32 GB of RAM. And that was at an average of about 35% CPU.


    DevBiker (aka J Sawyer)
    Microsoft MVP - Sql Server (StreamInsight)


    Ruminations of J.net


    If I answered your question, please mark as answer.
    If my post was helpful, please mark as helpful.

    Sunday, September 22, 2013 1:38 PM
    Moderator
  • I do my StreamInsight development on my laptop (Intel Core i7 M620 @ 2.67 GHz, 64-bit Windows 7, 8 GB of RAM). We've been able to handle way over 20,000 events an hour in a replay/pass-through scenario. DevBiker can speak to that better. Your memory consumption depends on your query logic. If you have a bunch of events that hang around all day long, memory usage will grow accordingly. Typically you want to keep your events around only as long as you need them.

    StreamInsight provides some resiliency through checkpointing. You'll need the Premium edition to use it. More information is available here: StreamInsight Resiliency. There are no clustering or mirroring mechanisms, but there is nothing stopping you from running another instance on another server, other than licensing.

    Saturday, September 21, 2013 3:25 PM

All replies

  • Thanks for the replies. That's good to know. I'll attempt some testing next week.

    Ian

    Friday, September 27, 2013 8:50 AM
  • Some things to look for in your testing, btw:

    Output queue. If this starts growing, then you'll want to take a look at what your output adapter/sink is doing.

    CPU utilization and memory usage, of course. CPU is your biggest performance constraint, followed by memory. But odds are extremely good that you'll saturate your network before either of those limits you.

    Input queue. This you need to keep a really good eye on. If you get to about 200K events in the input queue, StreamInsight will stop taking input. With the adapter model, you'd get notified of this. With the Rx/source model, you don't.
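
One way to act on the queue advice is to sample the queue depth periodically and flag trouble early. The following is a minimal Python sketch, not StreamInsight code: how the samples are obtained is left out, the `check_queue` name and the 50% warning fraction are assumptions, and the 200K limit is just the approximate figure quoted above.

```python
INPUT_QUEUE_LIMIT = 200_000  # approximate figure quoted in the post above


def check_queue(samples, limit=INPUT_QUEUE_LIMIT, warn_fraction=0.5):
    """Classify a series of queue-depth samples (oldest first).

    Returns "critical" if the latest sample has reached warn_fraction of
    the limit, "growing" if every sample is larger than the one before
    it, and "ok" otherwise. warn_fraction is an arbitrary assumption.
    """
    if not samples:
        return "ok"
    if samples[-1] >= limit * warn_fraction:
        return "critical"
    growing = len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )
    return "growing" if growing else "ok"


print(check_queue([100, 120, 90]))        # fluctuating
print(check_queue([1000, 5000, 20000]))   # steadily rising
print(check_queue([50000, 120000]))       # past half of the limit
```

The same check works for the output queue if you pass a limit that suits your sink. With the Rx/source model it is worth running something like this yourself, since you are not notified when input stops.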

    Average event latency. This can be hard to measure, though. The Diagnostic Views have an overall/total number, which you can divide by the number of events to get an idea. But some of that will also depend on how you are handling CTIs.

    Finally, I'll add a 1-second tumbling window over my primary event stream that just does count() - this will give you events/sec. Then I'll add another 1-minute tumbling window over the count window with just an average() - average events/sec.
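
The two stacked tumbling windows described above can be illustrated outside StreamInsight. This is a plain Python sketch of the idea, not StreamInsight LINQ: the function names and sample timestamps are made up, and it averages only over seconds that contained at least one event.

```python
import math
from collections import Counter, defaultdict


def events_per_second(timestamps):
    """Count events in 1-second tumbling windows.

    timestamps: event times in seconds (floats, any order).
    Returns {window_start_second: event_count}; seconds with no events
    produce no entry.
    """
    return dict(Counter(math.floor(t) for t in timestamps))


def average_events_per_second(counts_by_second):
    """Average the per-second counts inside 1-minute tumbling windows.

    Returns {minute_index: average_count}, averaging only over the
    seconds that actually produced a count.
    """
    per_minute = defaultdict(list)
    for second, count in counts_by_second.items():
        per_minute[second // 60].append(count)
    return {minute: sum(c) / len(c) for minute, c in per_minute.items()}


# Made-up timestamps spanning two minutes.
timestamps = [0.1, 0.5, 1.2, 61.0, 61.3, 61.9, 62.4]
counts = events_per_second(timestamps)
averages = average_events_per_second(counts)
print(counts)
print(averages)
```

Each event lands in exactly one 1-second bucket, and each bucket's count lands in exactly one 1-minute bucket, which is what makes both windows "tumbling" rather than overlapping.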


    DevBiker (aka J Sawyer)
    Microsoft MVP - Sql Server (StreamInsight)



    Friday, September 27, 2013 8:49 PM
    Moderator