Late arriving data and windowing functions

  • Question

  • I’m using Stream Analytics with Blob storage as an input. My data contains a Timestamp column (which I use in my query via the TIMESTAMP BY syntax), and I use windowing functions to aggregate the data. I’m trying to figure out what happens with late-arriving data in this scenario. Assume a time bucket has already been computed and written to the output (SQL, in my case), and a blob containing additional data for that time bucket is added later. Would the new data be appended to the output, or is it “skipped”? When I simulated this by uploading blobs with data for previously computed time buckets, the data was not reflected in the output.

    Monday, November 24, 2014 8:06 PM

Answers

  • The output for a given window from the streaming job is final, meaning late arriving events will not affect the output already generated.

    However, to account for late-arriving events like the ones you describe, users can specify an out-of-order policy on the job's Configure tab. The tab has a tolerance window, which is the in-memory buffer we use to hold events and sort them before processing. It also has a policy that determines what happens when an out-of-order event arrives beyond that tolerance window: the event is either dropped or adjusted to the current time high-water mark.

    The default tolerance window is 0, meaning we expect events to already be ordered by timestamp. If events in your use case naturally arrive out of order, for example because of clock skew or network delay, you can specify a larger tolerance window to absorb the disorder. Note that because we hold events in memory, wait for events with larger timestamps to arrive, and then process them in the right order, end-to-end latency increases by the size of the tolerance window.

    Depending on how much latency is acceptable and how much disorder the event senders introduce, you can choose the tolerance window size and drop/adjust policy that fit your needs.
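
    To make the buffering behavior described above more concrete, here is a small Python simulation. It is only a conceptual sketch, not the service's actual implementation: the function name `reorder`, the integer timestamps, and the way the high-water mark is modeled (largest timestamp seen so far minus the tolerance) are all assumptions made for illustration.

```python
import heapq


def reorder(events, tolerance, policy="adjust"):
    """Simulate a reorder buffer with a tolerance window (hypothetical model).

    events:    iterable of (timestamp, payload) pairs in ARRIVAL order.
    tolerance: how far behind the largest seen timestamp an event may be.
    policy:    "drop" or "adjust" for events that arrive beyond the tolerance.
    Returns the list of (timestamp, payload) pairs in the order they are released.
    """
    if policy not in ("drop", "adjust"):
        raise ValueError("policy must be 'drop' or 'adjust'")

    buffer = []                    # min-heap of (timestamp, arrival_index, payload)
    released = []
    max_seen = float("-inf")       # largest timestamp seen so far
    watermark = float("-inf")      # assumed high-water mark: max_seen - tolerance

    for index, (ts, payload) in enumerate(events):
        if ts < watermark:
            # The event is later than the tolerance window allows.
            if policy == "drop":
                continue
            ts = watermark         # "adjust": move the event up to the watermark

        heapq.heappush(buffer, (ts, index, payload))
        max_seen = max(max_seen, ts)
        watermark = max_seen - tolerance

        # Release, in timestamp order, everything at or before the watermark.
        while buffer and buffer[0][0] <= watermark:
            t, _, p = heapq.heappop(buffer)
            released.append((t, p))

    # End of input: flush whatever is still held in memory.
    while buffer:
        t, _, p = heapq.heappop(buffer)
        released.append((t, p))
    return released


# Arrival order: "c" is slightly out of order, "e" is far beyond the tolerance.
arrivals = [(1, "a"), (3, "b"), (2, "c"), (10, "d"), (4, "e")]
dropped = reorder(arrivals, tolerance=2, policy="drop")
adjusted = reorder(arrivals, tolerance=2, policy="adjust")
```

    In this model, event "c" falls inside the tolerance window and is simply sorted into place, while event "e" is either discarded or has its timestamp moved forward, depending on the policy. An event is also held until a later event pushes the watermark past it, which is where the added latency comes from.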

    Monday, November 24, 2014 9:30 PM