Get a single result out of a stream
-
Friday, March 08, 2013 1:20 PM
Hello,
I have several adapters constructed and now will develop some qeries.
In one of my samples I have two streams: weather data, which incomes periodically and another stream where I am just counting the rising events.
Next I create a window over my counts (dont now which one yet).
Now I want to combine the count of the second stream with the current weather data (at start or endtime of the window, doesnt matter) of the first stream.That means that I just want to get a single record out of the first stream, while all other events will not be used.
Well, I could imagine that I simply can use a standard sql for this instead of misuse the streaming technology.
But as I described at the beginning, I have several Adapters created and I try to get along with those I have ( its I think a simple thing, so a little misuse is not that bad).I tried making a ver small but slow window, but on windows only groupfunction are available and attributes like the temperature are not groupable.
Are there any ways? Thannks.
- Edited by Kaspatoo Friday, March 08, 2013 1:22 PM
All Replies
-
Tuesday, March 12, 2013 3:11 PM
I'm not sure that I understand fully what you are trying to do.
First, you are correct, you can only apply groupings to windows. Now, that said, you don't have to group your windows - you could, for example, create a window that just does a count of all events within that window. Would that be similar to what you are looking for? Also, note that if you don't have any matching events in the window, you won't get any output events.
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful. -
Tuesday, March 12, 2013 3:26 PM
hi
i want to match the inoutstreams by time.
I am still doing a Count on the one Stream (e.g. Window size 1minute)
The other Stresm shall deliver the current weather data every minute the count-window is started or ended, so i need just a single result out of this second stream.
As result I want to have weather data + the count of other events. I want one resultset every minute.
-
Tuesday, March 12, 2013 8:31 PM
-
Wednesday, March 13, 2013 9:07 AM
countng-stream:
var expr_twitter_count = from e in input_twitter.TumblingWindow(TimeSpan.FromSeconds(5)) select new {Count = e.Count()};weather-stream:
var expr_wetter_hopping_starselect = from e in input_wetter.HoppingWindow<Wetterstatus>(TimeSpan.FromMilliseconds(1), TimeSpan.FromMinutes(1)) select e;unfunctional-join:
var query_wetter = from e1 in expr_wetter_hopping_starselect join e2 in expr_twitter_count on e1.CreatedAt <= e2.CreatedAt WHERE e2.ROWCOUNT = 0;(join by time and ensure only one resultset of the weather stream per every counting-window)
usually joining streams will join all elements of stream a with all elements of stream b where some conditions are true
i want join all elements of stream a with just the latest event of strem b (where latest is the condition)
is that clear now?
-
Wednesday, March 13, 2013 8:06 PM
OK ... First, all of your joins are in time. It's an extra "hidden" dimension in every query. I think that you aren't quite grokking that and that's where you are going wrong. Your statement "usually joining streams will join all elements of stream a with all elements of stream b where some conditions are true" isn't quite accurate. Joining streams will going elements of StreamA with elements of StreamB where some matching conditions are true and the events' temporal properties overlap. Their start and end times must overlap. For points, the start times have to be identical. And the resulting event's temporal properties will be the overlap of the two joined events.
Your twitter count stream will give you only a single output event every five seconds, assuming that you have any events in those 5 seconds. No events will give you no result, not even a 0. I'm not sure what you are trying to accomplish in the HoppingWindow for expr_wetter_hopper_starselect. To get the "latest" event in any stream, you'll want to use the ToSignal() pattern ... Alter/Clip. The condition for your clip, to get the very last event of any kind, would simply return true. If your AlterEventDuration specifies TimeSpan.MaxValue, you'll always have the last value received, regardless of when it came in.
Does that make sense?
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful. -
Friday, March 15, 2013 3:05 PM
hey
Thx for your response.
You probably pointed it out, but I am not sure.Well due to clipping the point events they were logically transformed to (kind of) a intervall event, arnt they?
Same at creating a counting window so I have to intervall streams I want to join and they were automatically joined by time (if no further conditions are given).Will the join create some carthesian product? Every overlapping "intervall" events of the streams will be joined each other?
So the following visual example (aigns visualize length of the clipped/ aggregated "intervall" events)...
_____________________________________________
##e1_1#### ##e_1_2###### ###e1_3#####
------------------------------------------------------------------------
******e2_1************ *********e2_2********
_____________________________________________... will result in the following output (joined) events:
e3_1(e1_1 + e2_1)
e3_2(e1_2 + e2_1)
e3_3(e1_2 + e2_2)
e3_4(e1_3 + e2_2)
Am I right?And if I just want all e2-events (so only two output events) I need to use a left-join? If yes, how I can specify which corresponding e1-events to add?
In this sample to just get the following outputs:e3_1(e1_2 + e2_1) //e1_2 because it matches the endpoint of e2, if startpoint were used it had been e1_1
e3_2(e1_3 + e2_1) //e1_3 because it matches the endpoint of e2, if startpoint were used it had been e1_2
Thanks for further help.- Edited by Kaspatoo Friday, March 15, 2013 3:07 PM
-
Monday, March 18, 2013 10:04 AM
Well, just continuing my last visualization and not considering to-Signal-Pattern, yet I now have implemented a little join:
//produces a count every 5 seconds when ran separately var expr_twitter_count = from e in input_twitter.TumblingWindow(TimeSpan.FromSeconds(5)) select new {Count = e.Count()}; //produces weather data every minute (due to a sleep within the input adapter) when ran separetely var expr_wetter_starselect2 = from e in input_wetter select e; //prodduces just some CTI events (very irregular) var expr_wetter_twitter = from e1 in expr_wetter_starselect2 from e2 in expr_twitter_count select new Wetterstatus { ID = e1.ID, CreatedAt = e1.CreatedAt, City = "", Lon = e1.Lon, Lat = e1.Lat, Temperature = e1.Temperature, Pressure = e1.Pressure, Humidity = e1.Humidity, Wind = e1.Wind, Rain = e1.Rain, Clouds = e1.Clouds, Weather = e1.Weather, TweetCount = e2.Count };Cant get into the join to debug.... any ideas whats my fault is?
Thx much
-
Monday, March 18, 2013 4:57 PM
Very good idea where the problem is - you aren't taking into account that joins happen in time. For events to join, they have to have overlapping start/end times. All queries in StreamInsight happen in time and that's very different from traditional RDBMS query logic. It's also the thing that you will need to get your head around ... and it'll take some time to fully "get" the paradigm shift.
Have you looked at the events in the event flow debugger? That will show you the temporal headers and I'd put money that they don't overlap. The "irregular" CTIs is also expected ... the join won't occur until both streams "move past" a particular point in time, as defined by the stream's CTIs. When both streams move past a point in time, you'll see the join happen and the CTIs get produced.
Finally ... you said that you have a "sleep" in your input adapter. Are you putting the I/A's thread to sleep using Thread.Sleep? Or something else?
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful. -
Tuesday, March 19, 2013 8:38 AM
Hi,
youre not directly talking about my visualization-post. Am I just right there?
Well, I also thought of missing overlapping in Time but also could not imagine that.
Because I have a tumbling window I think that the counting stream is filling the whole length of real time, there shouldnt be any gap in time, because its a tumbling window without waiting time but just length. If the one window will stop the next is immediately started, isnt it???
If Im not wrong here every weather event can be joined in time to any counting window.The sleep is directly within the weathers input adapters loop where it is firing the queries.
So the Twitters Inputadapters loop and any oter Component is still running.
Else the query is fired like every millisecond while the data changes like once an hour.
But I didnt want to increase the complexity of the streams until my problem is not solved yet (so its no final implementation).
Is there any problem?I will have a look at the flow point debugger, didnt get this in focus so far.
Thanks again.
- Edited by Kaspatoo Tuesday, March 19, 2013 8:40 AM
-
Tuesday, March 19, 2013 2:42 PMFor a window, the default behavior is PointAlignToWindowEnd ... so you have a Point event with a start time at the end of the window and a length of 1 tick. If you want it to behave otherwise, you can use ClipToWindowEnd ... this will give you an interval event that spans the window. Or ... you can create the stream and then AlterEventLifetime - IIRC, this is what you need to use if you are using the 2.1 model as opposed to <= 2.0 model. So ... your assumption is incorrect (and the Event Flow Debugger will show this).
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful. -
Tuesday, March 19, 2013 2:54 PM
Huh? I'm not sure what you are referring to here.Hi,
youre not directly talking about my visualization-post. Am I just right there?
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful.- Edited by DevBikerMVP Tuesday, March 19, 2013 2:54 PM
-
Tuesday, March 19, 2013 4:00 PM
I am referring to my post where I visualized two expample queries, I also asked there if my statements and thought were correct. In your replying post you did not claim about anything out of this visualization-post. So I am assuming theres no fault, is there?
Wont ClipToWindowEnd just clip the events endtime to the windows endtime? Id prefer to also clip the starttime of the event to the starttime of the window.
I guess theres a function like ClipToWindow which means both start and enddtime.Right, Ill first try to get into the debugger and come later back to this thread/ my question.
Thanks for that. -
Tuesday, March 19, 2013 5:12 PM
Missed that reply. Sorry.
First, although it's not entirely accurate, it's helpful to think of all events in a stream as intervals. They will all have start and end times. The event shapes that you work with are merely ways of expressing the events via your adapters/sinks/sources.
And yes, your visualization of the join is on target. The trick now is to get your events to align in time. Your default behavior for a hopping/tumbling window is PointAlignToWindowEnd, not ClipToWindowEnd. So you get events with a 1 tick lifetime.
You mentioned that you wanted only 2 events as the result for your visualization but, as you've correctly assumed, you'd get 4. So ... tell me ... how do you expect the events, when joined, to appear in your result? With event e2_1, should it join with event e1_1 (the start) or e1_2 (the end)? The answer will determine how you need to alter the temporal properties of the event ...
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful. -
Wednesday, March 20, 2013 9:39 AM
Doesnt matter if joining by start or endtime but think starttime would be better for later use.
Do you know any good (graphic) presentation of those window properties? MSDN is not talking that much, only a small sentence.
-
Thursday, March 21, 2013 2:11 PM
First, you join by the entire event duration, not just one or the other. The resulting event has start and end times from the overlap.
You can see a PPT with this at https://skydrive.live.com/redir?resid=E45DFECBE9DCC432!807&authkey=!AFbyILkjoU_1H00; look at slide #28.
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful. -
Friday, April 05, 2013 8:46 AMhm slide #18 to #31 arent available (application jumps directly to #32
-
Thursday, April 11, 2013 11:57 AM
Hey,
I got this query:
var expr_twitter_count = from e in input_twitter.TumblingWindow(TimeSpan.FromSeconds(5), WindowInputPolicy.ClipToWindow, HoppingWindowOutputPolicy.ClipToWindowEnd) select new {Count = e.Count()};So as you said this will generate a intervall event but as the eventflodebugger sais, there are only events with length of one tick.
Isnt using clip to window end time like cutting the even to the window end time? but when having point events, clipping wont have any effect. ?
for my other stream I tried using the toSignal Pattern, well every events lifetime is altered to endless but clipping it to the start time of the next event has no effect, in feventflowdebugger all events stay at endless
var expr_wetter_starselect3 = ToSignal<Wetterstatus>(input_wetter);
-
Thursday, April 11, 2013 2:19 PMDownload the whole deck. I didn't realize that it would launch in the Powerpoint viewer ... you should be able to "Save As" and see the hidden slides.
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful. -
Thursday, April 11, 2013 2:28 PM
OK ... first, think of the event shapes as only expressions of the events that are used by input and output adapters. Once inside the engine itself, the shapes are pretty meaningless. You seem to be trapped in thinking that the shape is an inherent property in the engine; it's really not. Think of shapes only in how you want to either enqueue them into the engine or see them in the output. Any query, of any set of events, can be expressed in the output as any shape.
Second, alter/clip does provide some confusing results in the debugger. The "Retract" event is the operation of the engine "Retracting" an event out of the stream. So while you see the insert with a +Infinity end time, that is only applicable until the corresponding "Retract"
And your query will generate output events - with ClipToWindowEnd - will create events with a 5 second duration. However, that's not necessarily how they will be expressed to the output adapter. Keep in mind, too, that any joins you make will affect the lifetime of the event - to see exactly what's produced by the tumbling window, you'd need to send the output to a sink/output adapter without joining to anything. And ... when I want to test and see exactly what the events look like, I'll use either an interval or an edge for the output adapter/sink. Make sure that you examine the event header, not just the payload ... even if you'll wind up using a Point output adapter/sink at the end of the day.
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful. -
Thursday, April 11, 2013 3:48 PM
Do I understand you right:
Clipping Event duration will not alter it but create an retract event which stops the events duration and the retract event is popped up when its time to clip?
Ok, my Counting Window has a duration of 5 seconds due to my tumbling window of 5 seconds.
I now deleted my manual sleep within my weather input adapter (which forces my sql to wait 60 seconds before asking for next weather state).
Due to this I now get every 1ms (or earlier) a weather input event.
At first what I would like to have is
- just one weather input event
- every 10 secondsBut I dont know how to achieve this, do you?
After getting this working, I think my weather input event will be joined in the ouput with one to two (maybe three?) count events (because of correlation 5 to 10 seconds). That means that my count-events will get redundant for every weather-event which has time-overlapping. My second target is now to eliminate all of these joins until my count-events arent redundant any more. Wether the count-events will then just join the first occurence of a weather-event within the weather's duration or the last doesnt matter.
Do have any suggestions which windows, windows preferences or linq statement to use?
Thanks
-
Friday, April 12, 2013 3:54 PM
Yes, clipping creates a retract. It's confusing, I know. But ... you also have the ability to "see" the events through the query operators that make up an output event ... right-click on the output event and select "Root Cause Analysis". Or you can go the other way by selecting an input event and using propogation analysis.
Since I'm not sure how your inbound data stream looks or how you are trying to get it from SQL - or why - I can't tell you how to get 1 input every 10 seconds. There just isn't enough background for me to even begin to make suggestions on how you should do it.
Ideally, you don't get your source data from SQL. There's all kinds of latency associated with that and the polling puts an extra load on the database. Where's the source data coming from? How does it get to SQL in the first place? Can you tap that source stream? And why do you need an source event every 10 seconds?
Honestly, it sounds like you are thinking in a "batch" processing mode rather than a stream processing mode.
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful. -
Monday, April 15, 2013 9:28 AM
Hi,
well its not confusing any more if I know that is doing so.
Hmm maybe thats my fault now, but did I tell about an SQL of my weather data? Its a "responseStream" from
http://api.openweathermap.org/data/2.1/find/city?*****
Which is converted from JSON. The problem is, its not really a stream, I have to re-create the "response stream" every loop else and EOF occurs.At the moment theres a 10 second sleep within the code. The problem stays the same, the weather event overlaps two of the countevents so every count event occurs twice. But I just want a List of all count events every joined with just one weather event.
Why TwitterInput is a really stream which also is converted from json.
I want to analyse the amount of tweets for some specific twitter-filters in dependency to the current weather. because weather is a really slow changing thing, I dont need the weather again and again every millisecond, so kust for every twitter-count window.
Is this enough information? Hopefully.
Thanks much.
-
Tuesday, April 16, 2013 2:40 PM
OK ... so, you'll be polling the weather service every X period, right?
Your input adapter/source will do the polling. When it gets the results, the results will then be enqueued to StreamInsight as single events. The one question will be what to use for the start time when enqueuing the event ... do you use the timestamp from the weather service or when you retrieved the result? If the former, you'll need to do some extra checking to make sure that there aren't any CTI violations when you enqueue the event. Since you are joining with a much livelier stream, it's probably better use the time when you retrieved the results, rather than the event timestamp, so you won't have anything dropped and the results feed is still lively (low latency).
I wouldn't really use a sleep ... instead, I'd use a timer. When the timer kicks off, go get the results. Enqueue. Exit timer function. Repeat. I find that pattern simpler though, functionally, it's really not different from a sleep. Thread.Sleep just makes me feel all icky inside.
Like you said, weather is a slow-changing thing ... I wouldn't tie your weather polling updates to the twitter feed. You'll wind up make WAY too many calls to the weather service than you need. Instead, I'd do a sleep/refresh.
DevBiker (aka J Sawyer)
Microsoft MVP - Sql Server (StreamInsight)
If I answered your question, please mark as answer.
If my post was helpful, please mark as helpful.


