none
Get Max Hour

    Question

  • I have a historical query that will keep two days worth of transactional data in it as well as a last minute query that will keep the last minutes worth:

    var LastMin = from l in myQsubject.ToPointStreamable(e => PointEvent.CreateInsert<QInfo>(e.EventDTTM, e),
                                                                                AdvanceTimeSettings.StrictlyIncreasingStartTime)
                                      group l by new {l.Q_ID, l.Hour} into Mint
                                      from m in Mint.TumblingWindow(TimeSpan.FromMinutes(1), new DateTime(2013, 1, 1, 0,0,0, DateTimeKind.Utc))
                                      select new QCounts { Q_ID = Mint.Key.Q_ID, Cnt = m.Count(), Hour = Mint.Key.Hour, Avg = 0 };
    
    
    var HistAvg = from q in LastMin.AlterEventDuration(e => TimeSpan.FromDays(2))
                                      group q by new { q.Q_ID, q.Hour } into perQ
                                      from pq in perQ.TumblingWindow(TimeSpan.FromMinutes(1), new DateTime(2013, 1, 1, 0,0,0, DateTimeKind.Utc)) 
                                      select new QCounts { Q_ID = perQ.Key.Q_ID, Cnt = 0, Hour = perQ.Key.Hour, Avg = pq.Avg(e => e.Cnt)};

    I then get the transactions that previously had a count, but do not now and merge this with the last minutes worth to account for all queues:

    var NoCounts = HistAvg.LeftAntiJoin(LastMin, (e1, e2) => e1.Q_ID == e2.Q_ID && e1.Hour == e2.Hour);

    var AllTransactions = LastMin.Union(NoCounts);

    Then I compare all of the current counts to the historical counts and return anything that is averaging less than half what it previously was and the average is over 10:
    var QCounts = from h in HistAvg
                  join l in AllTransactions 
                  on new {h.Q_ID, h.Hour} equals new {l.Q_ID, l.Hour}
                  where h.Avg > 10 && l.Cnt < (h.Avg/2) 
                  select new OutPutThis { Q_ID = l.Q_ID, QName = "", Job_ID = 0, JobName = "", CurCnt = l.Cnt, Avrg = h.Avg };

    This all works terrific, until the hour changes, then I start getting false positives because the last minute no longer has a count for that hour.  What I need is a way to get the hour from the lastmin query and filter the out only those records from the NoCounts query, but I cannot figure out how to do this.  In normal linq land I would just get the max value of hour from lastmin and then use that to filter on, but that doesn't work here.


    • Edited by ktwade Wednesday, April 24, 2013 2:24 PM
    Wednesday, April 24, 2013 2:22 PM

Answers

  • We got this figured out last week. I'm not sure if it is the "correct" or best way to do it, but it works.

    We expanded our event types class of QCounts so that it contains a count property that is just an int.  

    Then, in our event creators class, I created a global list of queues (integers) that I append to if it is a queue that I have not come across yet, thus giving me a history.  I then compare that to what is currently coming in, and if the current list doesn't contain all of the ids that are in the history list, I insert a new QCounts object into the data stream, for each queue that is missing, that has a count value of 0 and the queue id of the missing queue.  

    Then on our LastMin stream, instead of doing an m.Count() for the Cnt value I do an m.Sum(e=> e.Count), which is our flag that we added.  This gives me an accurate total as well as the ability to detect items that are no longer there, which is what we were trying to do: detect a "0" count event.

    This is probably confusing and doesn't make sense to anyone else but me and Andrew, but it totally works.

    • Marked as answer by ktwade Thursday, May 23, 2013 9:20 PM
    Thursday, May 23, 2013 9:20 PM

All replies

  • Hi ktwade,

    Thank you for your question. 

    I am trying to involve someone more familiar with this topic for a further look at this issue. Sometime delay might be expected from the job transferring. Your patience is greatly appreciated. 

    Thank you for your understanding and support.


    Allen Li
    TechNet Community Support

    Friday, April 26, 2013 2:46 AM
    Moderator

  • The following may be useful-

    http://msdn.microsoft.com/en-us/library/ee362394.aspx

    http://msdn.microsoft.com/en-us/library/ff518536.aspx



    If none helps, please open a case with PSS and we will be happy to assist you further.



    Thanks,

    -Sandeep.
    Wednesday, May 08, 2013 2:13 PM
  • Yeah, none of that is helpful at all.

    I just have a query that gets that last minutes worth of data and one of the values it contains is the hour in which the data was gathered.  I just need to be able to get that value out and use it as a filter in another query.  The problem is, I just need one of the results, as they are all the same hour, instead of the same result repeating for each hit in the query. As I mentioned above, this would easily be accomplished in LINQ with one of several methods (First, FirstOrDefault, Take(1), ect.) however, I cannot use those here and wanted to know if there was an equivalent way of producing this.

    Kaleb

    Monday, May 13, 2013 2:14 PM
  • You can use a Take() with StreamInsight. Alternatively, it's relatively simple to do a time-sensitive UDA that returns the first (or last) event from a window.

    All of that said, I'm more inclined to think that we need to look at your query strategy. If you can provide some details on what you are trying to get out of the query (for example, what question are you trying to answer?), we can perhaps work through an alternative. I don't understand, for example, why you are grouping by timestamp rather than using a hopping or tumbling window (which inherently does that).


    DevBiker (aka J Sawyer)
    Microsoft MVP - Sql Server (StreamInsight)


    Ruminations of J.net


    If I answered your question, please mark as answer.
    If my post was helpful, please mark as helpful.

    Monday, May 13, 2013 2:33 PM
    Moderator
  • What we are trying to achieve is to get the "hour" value from the LastMin query so we can use that as a filter from the HistAvg query. So when it's the 10 o'clock our (am) we can look into the HistAvg query and see what we typically average at 10 o'clock each day (over the past x days) and then compare that to what our current count is and fire events if it's below our threshold. We also need to report out the queues that previously had data this hour but no longer do. This would indicate that it has stopped processing data, which is where the AnitJoin came into play. Does this make sense?

    We did not choose to try this one way versus another, we were just trying what made sense to us, but I think we might be coming at it the wrong way. We are completely open to suggestions of doing it a different way, as we are not married to one way or the other :)!

    Kaleb

    Monday, May 13, 2013 6:20 PM
  • It sounds to me like you are approaching it as you would a traditional query, not a temporal query.

    With StreamInsight, queries are temporal in nature; that is, time is a dimension for all operations, not just an attribute (as in traditional queries). This is the most difficult concept to truly grasp, understand and leverage in StreamInsight. It requires a shift in how you think about data.

    Your "HistAvg" query sounds like it's pulling stored/slowly changing "reference data", correct? Have you thought about just using the timestamps for the events and then enqueuing the reference data with the appropriate timestamps for the time period that they reference?

    Would you be able to do a mock-up in LinqPad of your data? If so, and you post it, we can then work together and tweak it.


    DevBiker (aka J Sawyer)
    Microsoft MVP - Sql Server (StreamInsight)


    Ruminations of J.net


    If I answered your question, please mark as answer.
    If my post was helpful, please mark as helpful.

    Monday, May 13, 2013 6:43 PM
    Moderator
  • I agree that I think I don't completely have my head wrapped around the time concept in SI, but I think I'm getting there.

    I guess you could say that HistAvg is a reference stream of sorts, as it is an extension of LastMin that we reference LastMin back to for comparison purposes. I think that we are retrieving most of the data that we want correctly, it's just the "0" values that are giving us a hard time.  Those would be the entries for a specific hour that had a previous value and now no longer does.  So if queue 123 had 14 messages at 10:02, but then at 10:03 it had 0 messages we would want that to be able to fire an event on that, but we have not be able to get that portion quite right. Every thing we have tried so far has caused us to get duplicate hits, because we are trying to narrow down something to filter on, whether it be time or the queue id.

    Here is a shot of what our data looks like in the event flow debugger, this was taken from the initial cleanse that takes place after the stream:

    

    So each hit for the respective queue ID is a transaction, so we group by q_ID to get a total of transactions for that minute. Here queue 18 had 4 transactions for 7:30 and queue 408 had 5 transactions for 7:30.  What we are having a problem with would be, on the next cycle (7:31) if queue 18 had 0 transactions, we want that reported but we can't get this correct.

    Does this make sense? Sorry, I installed LinqPad, but I'm not exactly sure what I'm doing with it or what you wanted me to provide with it, so I tried this way.  If you need me to elaborate further on anything, please let me know.  I'll post our current queries, which give us undesired results, below.

    var LastMin = from l in myQsubject.ToPointStreamable(e => PointEvent.CreateInsert<QInfo>(e.EventDTTM, e), AdvanceTimeSettings.StrictlyIncreasingStartTime) group l by new {l.Q_ID, l.Hour} into Mint from m in Mint.TumblingWindow(TimeSpan.FromMinutes(1), new DateTime(2013, 1, 1, 0,0,0, DateTimeKind.Utc)) select new QCounts { Q_ID = Mint.Key.Q_ID, Cnt = m.Count(), Hour = Mint.Key.Hour, Avg = 0 }; var HistAvg = from q in LastMin.AlterEventDuration(e => TimeSpan.FromDays(7)) group q by new { q.Q_ID, q.Hour } into perQ from pq in perQ.TumblingWindow(TimeSpan.FromMinutes(1), new DateTime(2013, 1, 1, 0,0,0, DateTimeKind.Utc)) select new QCounts { Q_ID = perQ.Key.Q_ID, Cnt = 0, Hour = perQ.Key.Hour, Avg = pq.Avg(e => e.Cnt)}; //Here is where we are trying to get the 0's out var LHourQs = from h in LastMin.AlterEventDuration(e => TimeSpan.FromDays(3)) group h by new { h.Q_ID } into hr from o in hr select new MyQs { Q_ID = hr.Key.Q_ID }; var HistThisHour = from t in HistAvg join h in LHourQs on t.Q_ID equals h.Q_ID select t; var AntiMin = HistThisHour.LeftAntiJoin(LastMin, (e1, e2) => e1.Q_ID == e2.Q_ID); var AllTransactions = LastMin.Union(AntiMin);

    //this is where we stop that and the query below is correct var QCounts = from h in HistAvg join l in AllTransactions on new {h.Q_ID, h.Hour} equals new {l.Q_ID, l.Hour} where h.Avg > 10 && l.Cnt < (h.Avg / 2) select new OutPutThis { Q_ID = l.Q_ID, QName = "", Job_ID = 0, JobName = "", CurCnt = l.Cnt, Avrg = h.Avg };


    Tuesday, May 14, 2013 12:44 PM
  • We got this figured out last week. I'm not sure if it is the "correct" or best way to do it, but it works.

    We expanded our event types class of QCounts so that it contains a count property that is just an int.  

    Then, in our event creators class, I created a global list of queues (integers) that I append to if it is a queue that I have not come across yet, thus giving me a history.  I then compare that to what is currently coming in, and if the current list doesn't contain all of the ids that are in the history list, I insert a new QCounts object into the data stream, for each queue that is missing, that has a count value of 0 and the queue id of the missing queue.  

    Then on our LastMin stream, instead of doing an m.Count() for the Cnt value I do an m.Sum(e=> e.Count), which is our flag that we added.  This gives me an accurate total as well as the ability to detect items that are no longer there, which is what we were trying to do: detect a "0" count event.

    This is probably confusing and doesn't make sense to anyone else but me and Andrew, but it totally works.

    • Marked as answer by ktwade Thursday, May 23, 2013 9:20 PM
    Thursday, May 23, 2013 9:20 PM