Performance Benchmarks

    Question

  • Are there any performance benchmarks for StreamInsight? Is there a list of its limitations?
    Thursday, April 01, 2010 11:08 PM

All replies

  • Hi Don,

    We have a set of internal benchmarks that we use to tune and drive performance improvements for StreamInsight. Some of them we built based on how the product was engineered. Others we modeled very closely on real scenarios from customers we worked with.

    What types of limitations do you have in mind?

     


    MS StreamInsight Team Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights.
    Tuesday, April 06, 2010 5:24 PM
  • Can you publish the performance benchmarks that you have?

    Can you publish any design limitations that StreamInsight has?

    Tuesday, April 06, 2010 8:46 PM
  • Hi Don,

    Most of those benchmarks are real customer workloads, which we can't disclose because we need to protect our customers' IP. With new customer deployments coming in the near future, we should have more data to share with you.

    Actually, there is a public release regarding one of the scenarios here: http://www.microsoft.com/presspass/exec/veghte/2009/05-11TechEd.mspx. I'll copy/paste the part relevant to StreamInsight:

    "I'll use an example we've got at Microsoft. Within the MSN adCenter – within the MSN adCenter we have the opportunity to process about a half a billion events, Web events a day. Half a billion events a day. It's a constant stream. You either consume it or you don't.

    And our ability to consume that and pattern match in terms of how people are searching, what ad inventory is optimized, what people are interested in, and is the difference between – is literally – literally a double-digit swing in user interaction and a double-digit swing as a result in ad revenue.

    And when you think about that huge half a billion unit set of data, the faster and more agile I can react to it, the better off we are."

    Regarding design: StreamInsight was designed to process events at high data rates with very low latency and rich query semantics.


    MS StreamInsight Team Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, April 07, 2010 5:10 PM
  • Don,

    Let us know if there is any further information we can provide for you.

    Anton


    MS StreamInsight Team Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, April 21, 2010 6:51 PM
  • How about providing some performance benchmarks for StreamInsight on some example hardware?

    • How much input bandwidth can it consume?
    • How much output bandwidth can it generate?
    • How fast is it at performing specific input calculations?
    • What are StreamInsight's design limitations?
    • Any other metrics that can be used as design criteria for using or implementing StreamInsight.
    Wednesday, April 21, 2010 7:34 PM
  • Don,
    These are great questions. However, the answers depend to a large degree on the data and the query complexity. A few dimensions have a significant impact on the throughput and latency of the system. For example, a change in any one of the following dimensions can cause substantial differences in the results:

    • Degree of disorder in the data (if any)
    • Event overlap
    • Frequency of CTIs
    • Windows: type of window, size of window, hop-offsets, number of events in each window
    • Query complexity: number of operators and complexity of operators (user-defined operators are more expensive than other window operators, which in turn are more expensive than simple operators like filter/project)
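
    To make the window dimension concrete, here is a minimal Python sketch. It is not StreamInsight code and it does not measure the product; it is a toy hopping-window count, and the names hopping_window_starts, count_per_window and measure are made up for illustration. It assumes integer timestamps and windows aligned to multiples of the hop size. The point is that each event lands in roughly size/hop windows, so the work per event grows with the window size and shrinks with the hop size:

```python
import time
from collections import Counter


def hopping_window_starts(ts, size, hop):
    """Start times of all hopping windows [start, start + size) that contain ts.

    Assumption for this sketch: windows start at multiples of `hop`.
    """
    start = (ts // hop) * hop  # latest window start that is <= ts
    starts = []
    while start > ts - size:   # window still covers ts
        starts.append(start)
        start -= hop
    return starts


def count_per_window(timestamps, size, hop):
    """Count events per window; one increment per (event, window) pair."""
    counts = Counter()
    for ts in timestamps:
        for start in hopping_window_starts(ts, size, hop):
            counts[start] += 1
    return counts


def measure(timestamps, size, hop):
    """Time the toy aggregate and report the amount of work it did."""
    t0 = time.perf_counter()
    counts = count_per_window(timestamps, size, hop)
    elapsed = time.perf_counter() - t0
    return {
        "size": size,
        "hop": hop,
        "assignments": sum(counts.values()),  # total (event, window) pairs
        "events_per_sec": len(timestamps) / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    events = list(range(10_000))  # synthetic, perfectly ordered timestamps
    for size in (1, 10, 100):
        print(measure(events, size, hop=1))
```

    The timings depend entirely on the machine and say nothing about StreamInsight itself. Only the assignment counts are deterministic, and they are what illustrate why window size and hop offset change the cost of a query.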

    Because so many factors can influence performance, our current customers are doing their own capacity planning and investigations based on their particular scenarios.

    We will provide numbers (hopefully in the near future) once we have customer showcase implementations on top of StreamInsight for which we can share the data. We already have one such case in the MSN adCenter example that I shared above.

    We are considering participating in public benchmarking efforts. Do you have any particular benchmark in mind for which you think it would be helpful if we published results?

    Anton


    MS StreamInsight Team Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, April 21, 2010 9:46 PM
  • Some simple performance benchmarks, anything that will give us an order-of-magnitude estimate, would be helpful.

    For the dimensions you mentioned that would affect the results, you could provide a plot that shows performance versus the value of each dimension.

    We are not looking for perfect answers, and we know that the ultimate answer is always "it depends....", but some performance numbers from you would be a big help.

    You say that StreamInsight is fast and can handle many inputs rapidly. The question is: can you prove your statements quantitatively?

     

     

    Thursday, April 22, 2010 3:34 AM
  • Hi Don,

    If you believe that some "ballpark" numbers would be good and valuable reference points, we will work on providing them.

    I'd like to make sure that the scenarios we pick are indeed useful. Do you have any scenarios in mind that you think would be good reference points? Also, do you see some of the dimensions I listed above as more important than others?

    Anton


    MS StreamInsight Team Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, May 05, 2010 11:02 PM
  • The goal of the benchmarking should be to let someone calculate a rough estimate of a target system's performance before putting that system together.

    You may need to run the benchmarks on some "typical" platform that is well known, so that the results can be easily replicated and extrapolated.

    It would be important to know:

    • any design or practical limitations
    • input bandwidth versus CPU usage
    • output bandwidth versus CPU usage
    • data processing functions: function execution rate versus CPU usage

    Then:

    • how do the dimensions that were mentioned affect the basic benchmarks?
    • what is the effect of using multiple cores?
    • is there any significant performance improvement from using an embedded OS that omits some specific OS functions?

    Basically, if the experts were to "size" or specify a target system for a customer, what procedures and calculations would they perform to do that properly?
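
    As a rough illustration of the kind of calculation meant here, the sketch below (Python) turns a daily event volume into an average rate and then into a core count. Only the half-a-billion-events-a-day figure comes from the adCenter quote earlier in this thread. The per-core throughput, the peak factor and the headroom are hypothetical placeholders that would have to come from real measurements, and the function names are made up for this example. It also assumes throughput scales linearly with cores, which may not hold in practice:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(events_per_day):
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


def cores_needed(events_per_day, per_core_events_per_sec,
                 peak_factor=1.0, headroom=0.0):
    """Rough core count for a workload.

    per_core_events_per_sec: measured throughput of ONE core for the
        actual query (hypothetical input, not a published number).
    peak_factor: ratio of peak rate to average rate (assumption).
    headroom: fraction of each core kept idle as safety margin (assumption).
    Assumes linear scaling across cores.
    """
    peak_rate = average_rate(events_per_day) * peak_factor
    usable_per_core = per_core_events_per_sec * (1.0 - headroom)
    return math.ceil(peak_rate / usable_per_core)


if __name__ == "__main__":
    daily = 500_000_000  # "half a billion events a day" from the quote above
    print(f"average rate: {average_rate(daily):.0f} events/sec")
    # 10,000 events/sec per core is a made-up placeholder, not a benchmark result.
    print("cores:", cores_needed(daily, 10_000, peak_factor=3.0, headroom=0.5))
```

    Half a billion events a day averages out to about 5,787 events per second. What is missing to finish a calculation like this is a trustworthy per-core throughput number for a given query shape.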

    Wednesday, May 05, 2010 11:47 PM
  • Thanks a lot for the feedback.

    We'll work on getting reference performance numbers, and we'll publish them here and on our blog.

    We also have case studies coming in the near future about real implementations using StreamInsight. They will include a detailed description of each solution as well as the performance we're getting out of StreamInsight for it. Should be helpful!

    Anton


    MS StreamInsight Team Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights.
    Friday, May 07, 2010 12:15 AM
  • What about a list of design limitations?

    Are there any limitations of any sort?

    Friday, May 14, 2010 7:10 PM
  • Hi Don,

    Can you help us understand what kind of design limitations you have in mind? Do you have an example?

    If you're thinking of limitations in terms of hardware resources, we're the same as any managed app: we can use up to 64 CPUs and as much memory as you have on your box. In V1 we have no functionality to handle priorities or limits on CPU/memory resources.


    MS StreamInsight Team Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights.
    Monday, May 17, 2010 11:16 PM