Custom job start time with blob storage as input

  • Question

  • Hi there,

    Our Azure blob storage receives new blobs constantly and also holds a backlog of older blobs dating back as far as 2014. To process blobs from the very beginning (FYI, our blobs are partitioned with a {date}/{time} scheme), we set the custom job start time to, say, January 2014. The idea is for ASA to start processing from 2014 and, once it catches up, to keep processing new blobs as they arrive.

    Examining the output, I notice that it doesn't include data from 2014. On reading the description more carefully, I realize the custom time may not do what I thought it did. Could someone clarify what exactly setting a custom start time does with a blob input, and what the recommended approach is for processing the old blobs first and then the blobs that arrive in real time with Stream Analytics?

    Thanks,

    Richard



    Friday, January 15, 2016 9:27 PM

Answers

  • Is your query using a TIMESTAMP BY clause to specify which column holds the event timestamp? If not, the last-modified time of the blob is taken as the event time.

    Using time-partitioned paths together with a custom start time is the right way to process older blobs. In addition, please make sure you specify TIMESTAMP BY.
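
    To make the difference concrete, here is a toy Python model (not ASA itself) of how the event time would be picked with and without TIMESTAMP BY. The function names, the `datetime` column name, the sample blob, and the simple "keep events at or after the start time" filter are all assumptions made for illustration; the real service may apply the start time differently.

```python
from datetime import datetime, timezone

def event_time(record, blob_last_modified, timestamp_by=None):
    # With a TIMESTAMP BY column, the event time comes from the payload;
    # otherwise fall back to the blob's last-modified time.
    if timestamp_by is not None:
        return datetime.fromisoformat(record[timestamp_by])
    return blob_last_modified

def events_since(blobs, start, timestamp_by=None):
    # Hypothetical filter: keep events whose event time is at or after `start`.
    stamped = []
    for blob in blobs:
        for record in blob["records"]:
            t = event_time(record, blob["last_modified"], timestamp_by)
            if t >= start:
                stamped.append((t, record))
    return sorted(stamped, key=lambda pair: pair[0])

# Made-up sample: a record from 2014 stored in a blob last modified in 2016.
old_blob = {
    "last_modified": datetime(2016, 1, 10, tzinfo=timezone.utc),
    "records": [{"datetime": "2014-03-01T12:00:00+00:00", "value": 1}],
}
start = datetime(2014, 1, 1, tzinfo=timezone.utc)

without_clause = events_since([old_blob], start)
with_clause = events_since([old_blob], start, timestamp_by="datetime")
print(without_clause[0][0].year)  # event time taken from the blob's last-modified time
print(with_clause[0][0].year)     # event time taken from the payload column
```

    In this sketch the same record is stamped 2016 without the clause and 2014 with it, which is one way output can appear to contain no 2014 data even though the old blobs were read.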

    Wednesday, February 3, 2016 6:54 AM

All replies

  • Great, so it sounds like setting a custom start time of, say, a year ago would have ASA start reading data from a year ago.

    Yes, we are using TIMESTAMP BY on our datetime column, so that makes sense. Also, our blobs are partitioned in {date}/{time}/{partition} format, where {date}/{time} is derived from the same column that TIMESTAMP BY points to.
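
    For anyone following along, this is roughly how we derive the blob path from the event timestamp, as a small Python sketch. The `partition_prefix` helper is hypothetical, and the YYYY/MM/DD date format and HH time format are assumptions here; the actual formats are whatever is configured on the input.

```python
from datetime import datetime

def partition_prefix(event_ts, partition):
    # Build a {date}/{time}/{partition} prefix from the event timestamp.
    # Assumes {date} is YYYY/MM/DD and {time} is HH (two-digit hour).
    return f"{event_ts:%Y/%m/%d}/{event_ts:%H}/{partition}"

print(partition_prefix(datetime(2014, 1, 15, 9, 27), 3))
```

    Because the path and the TIMESTAMP BY column come from the same value, a blob's folder and the event times inside it should always agree.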

    Thursday, February 4, 2016 10:20 PM