Data storing functionality in StreamInsight

  • Question

  • Hi, I am new to StreamInsight and want to understand its basic architecture. I could not find good documentation for this on the internet. From the product team's example, I understand that StreamInsight operates on streams of events in memory, and that queries execute in memory to keep things real time. My question is: if a stream of data is coming into the StreamInsight engine, how does it store historical data to show the insight? Does it keep the whole stream in memory? And how does it clear that memory?
    It would be helpful if anyone could help me find these answers. Is there any good documentation on this?
    Monday, November 17, 2014 8:58 AM

All replies

  • Probably the best resource for this is a couple of entries on Mark Simms' blog, starting with this one: http://blogs.msdn.com/b/appfabriccat/archive/2010/10/19/streaminsight-and-reference-data-lists-databases-etc.aspx.

    Essentially, you'll have the durable store as an input source and enqueue the stored data as temporal events. It usually works best to enqueue them as intervals. StreamInsight doesn't store the historical data.
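
To make the interval idea concrete, here is a minimal Python sketch. It is not StreamInsight code; `IntervalEvent`, `rows_to_intervals`, the `horizon` parameter, and the sample toll rates are all hypothetical names and data made up for illustration. It assumes each stored row carries a "valid from" time and stays valid until the next row replaces it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class IntervalEvent:
    start: datetime  # inclusive
    end: datetime    # exclusive
    payload: dict


def rows_to_intervals(rows: List[Tuple[datetime, dict]],
                      horizon: datetime) -> List[IntervalEvent]:
    """Turn (valid_from, payload) rows read from a durable store into
    interval events. Each row is valid until the next row starts; the
    last row is valid until `horizon` (an assumed cut-off)."""
    ordered = sorted(rows, key=lambda r: r[0])
    events = []
    for i, (valid_from, payload) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else horizon
        events.append(IntervalEvent(valid_from, end, payload))
    return events


# Hypothetical reference data: the toll rate for booth 1 changes mid-month.
rates = [
    (datetime(2014, 11, 1), {"booth": 1, "toll": 4.00}),
    (datetime(2014, 11, 15), {"booth": 1, "toll": 4.50}),
]
events = rows_to_intervals(rates, horizon=datetime(2015, 1, 1))
```

Each interval then overlaps in time with whatever live events it should be joined against, which is why intervals tend to fit stored data better than point events.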

    Now, for events in the stream, they are cleaned out of memory when they expire. Expiration is based on the event end time. Keep in mind that, inside the engine, you don't have different event shapes ... everything has a start and end time, even if that end time is simply 1 tick past the start time.
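
As a rough illustration of expiration, the Python sketch below keeps events in a buffer and drops them once application time moves past their end time. This is a simplified model, not the engine's actual bookkeeping (in StreamInsight, time advances through CTI events, and operators such as windows can hold events longer). `Event`, `EventBuffer`, `point_event`, and `TICK` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

TICK = 1  # assumed smallest time unit for this sketch


@dataclass
class Event:
    start: int  # inclusive
    end: int    # exclusive
    payload: str


def point_event(t: int, payload: str) -> Event:
    """A 'point' event is just an interval that ends one tick after it starts."""
    return Event(t, t + TICK, payload)


class EventBuffer:
    def __init__(self) -> None:
        self.events: List[Event] = []

    def enqueue(self, event: Event) -> None:
        self.events.append(event)

    def advance_time(self, now: int) -> int:
        """Drop every event whose end time is at or before `now`.
        Returns how many events were released."""
        before = len(self.events)
        self.events = [e for e in self.events if e.end > now]
        return before - len(self.events)


buf = EventBuffer()
buf.enqueue(point_event(10, "car A"))    # lives in [10, 11)
buf.enqueue(Event(10, 70, "rate 4.50"))  # lives in [10, 70)
dropped = buf.advance_time(11)           # the point event has expired
remaining = [e.payload for e in buf.events]
```

The takeaway is that memory use follows event lifetimes: short-lived point events leave quickly, while long intervals stay until time passes their end.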


    DevBiker (aka J Sawyer)
    Microsoft MVP - Sql Server (StreamInsight)


    Ruminations of J.net


    If I answered your question, please mark as answer.
    If my post was helpful, please mark as helpful.

    Sunday, November 23, 2014 8:18 PM
    Moderator
  • Thanks for your response!

    Let's take the toll booth example, where one car arrives every minute. My query calculates the toll collected at the booth for the whole day. Suppose the toll inspector wants to see that report at the end of the day. A request will then come to the StreamInsight server and the query will run. Will it sum all the data at once, or will it evaluate incrementally and keep the query output in memory each time? Please explain.

    Wednesday, November 26, 2014 5:38 AM
  • First, don't think of StreamInsight like you think of a database. It's not. And, likewise, don't think of a StreamInsight query as anything like a database query. It's not. Queries are not request/response like in an RDBMS. Instead, queries are continually evaluated and answers are continually produced as long as there is data in the stream; the answer is pushed, not pulled.

    This lends itself to answering different kinds of questions than you would typically ask of an RDBMS. Using the toll example, I would say that the query you describe isn't really an appropriate one for StreamInsight; it belongs in a traditional RDBMS. StreamInsight may feed that RDBMS ... for example, if you wanted the total tolls collected every minute or every 5 minutes at each booth, and you wanted that continuously updated, that would be StreamInsight ... and it could then store the results in an RDBMS to support the query that you mention. More interesting, however, is to use StreamInsight to determine average speeds between toll booths, whether a toll booth reader is down/out of commission, or even whether someone managed to skip/evade a toll booth. These are queries that require an understanding of events as they happen in time and the ability to correlate different events based on that timeline. Those things are very difficult to do with an RDBMS but relatively easy to do in StreamInsight.
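
To show what "continuously evaluated and pushed" can look like for the per-minute toll totals, here is a small Python simulation. It is not the StreamInsight API; `TumblingTollTotals`, `on_event`, `flush`, and the sink callback are hypothetical, and the sketch assumes events arrive in timestamp order. Only the running totals for the current window are held in memory. When a window closes, each result is pushed to the sink, which in a real setup might insert a row into the RDBMS.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional

WINDOW = 60  # assumed window length in seconds


class TumblingTollTotals:
    def __init__(self, sink: Callable[[int, int, float], None]) -> None:
        self.sink = sink                      # receives (window_start, booth, total)
        self.window_start: Optional[int] = None
        self.totals: Dict[int, float] = defaultdict(float)

    def on_event(self, timestamp: int, booth: int, toll: float) -> None:
        start = timestamp - timestamp % WINDOW
        if self.window_start is None:
            self.window_start = start
        elif start > self.window_start:
            self.flush()                      # previous window is complete
            self.window_start = start
        self.totals[booth] += toll            # incremental update, no raw events kept

    def flush(self) -> None:
        """Push the totals of the current window to the sink, then forget them."""
        for booth, total in sorted(self.totals.items()):
            self.sink(self.window_start, booth, total)
        self.totals.clear()


results = []
query = TumblingTollTotals(sink=lambda w, b, t: results.append((w, b, t)))
for ts, booth, toll in [(5, 1, 4.0), (30, 2, 4.0), (59, 1, 4.5), (61, 1, 4.0)]:
    query.on_event(ts, booth, toll)
```

In this sketch the first window's totals reach the sink as soon as the event at second 61 arrives, without anyone asking for them. The end-of-day report is then an ordinary sum over the stored per-minute rows in the database.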


    DevBiker (aka J Sawyer)
    Microsoft MVP - Sql Server (StreamInsight)



    Thursday, November 27, 2014 4:31 AM
    Moderator