Issues using a blob stored reference file

  • Question

  • Hi there, 

    I've got an ASA job configured to use a JSON file stored in Azure blob storage. This file contains reference data which I use in my job to compare the incoming data to certain thresholds. 

    Here are the issues I have:

    • Whenever I change the json file in blob storage, the ASA job picks up on this. But when I then change the file again, this new change doesn't get picked up and the stream keeps outputting the old reference value.
    • When I stop and start the job, the new file is picked up accordingly. 
    • I've read that you need to have the date/time included in your storage path pattern for the job to pick up any changes. So I now store my file (devicerules.json) inside the following structure: {date}/{hours}/{minutes}
    • The documentation states that you can store it as {date}/{time} and set the time format to "HH:mm", but the interface doesn't allow that (see screenshot).

       [screenshot: the time format options available in the new portal]

    • When I place the file into a "minutes" subdirectory though, the input still works but again doesn't pick up any changes to the file.
    • I've also read that you should not overwrite the file, but instead "add a new blob using the same container and path pattern defined in the job input and use a date/time greater than the one specified by the last blob in the sequence". That's cool, but how should I create a new file within the container when I need to specify the exact filename in the ASA input? When I specify "{date}/{time}/devicerules.json" I assume it's going to look for a file named "devicerules.json" and not any other file with a different name. 

    So let me know whether I missed something or there are some other options I could try. Thanks! 

    Tuesday, April 19, 2016 9:18 AM

Answers

  • Hi,

    The documentation for how to configure and use reference data is available here and it should have more details than my reply: https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-use-reference-data/

    As for the issues you're seeing I'll try to answer them below:

    • The reference data blobs are only loaded once. If the blob contains a date and time encoded in the name it will be loaded no sooner than that time. If it doesn't have a date and time it will be loaded when the job starts.
    • See previous.
    • If you want to refresh the content of reference data, you need to specify it via a series of blobs, where each blob has the time when it should become active encoded in the name using the {date} and {time} macros. See more examples in the online documentation I linked above.
    • The issue you are seeing with not being able to select minutes in the time format is a bug in the new Azure portal. You can still configure the reference input properly, including minutes, in the classic portal at https://manage.windowsazure.com. We are working on resolving this issue. Thanks for reporting it.
    • If the file name matches the pattern, we will load it as explained in the first bullet point. If you're using the new Azure portal and cannot select HH:mm, then I believe the file you place in the minute folder will not be loaded, because it will not match the specified pattern.
    • The documentation might be confusing here. The example might help, but I'll take a look and try to make it clearer. The idea is as follows: if you want to refresh the content of the reference data, you supply a path pattern that describes how you plan to name the files and where the date and time values will appear in the blob name. Using your example, if you specify "{date}/{time}/devicerules.json", a blob can be named "2016-04-20/10/devicerules.json", which instructs the ASA job to use this file starting at 10 AM UTC on April 20th, 2016. If you then also add a blob named "2016-04-20/13/devicerules.json", it instructs the ASA job to discard the content of "2016-04-20/10/devicerules.json" at 1:00 PM UTC on April 20th, 2016 and replace it from that point on with the content of "2016-04-20/13/devicerules.json" (see the sketch below this list).
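
    For illustration only, here is a minimal PowerShell sketch of uploading the next blob in such a sequence. It assumes the classic Azure.Storage cmdlets (New-AzureStorageContext, Set-AzureStorageBlobContent); the storage account name, key, container, and local file path are placeholders.

      # Sketch: upload the next reference blob in the sequence under the
      # "{date}/{time}/devicerules.json" pattern (date format YYYY-MM-DD, time format HH).
      # Account name, key, container, and file path below are placeholders.
      $context = New-AzureStorageContext -StorageAccountName "mystorageaccount" `
                                         -StorageAccountKey "<storage-account-key>"

      # The UTC time encoded in the blob name is when this version becomes active.
      $activeFrom = (Get-Date).ToUniversalTime().AddHours(1)
      $blobName   = "{0}/{1}/devicerules.json" -f $activeFrom.ToString("yyyy-MM-dd"), $activeFrom.ToString("HH")

      Set-AzureStorageBlobContent -File ".\devicerules.json" `
                                  -Container "referencedata" `
                                  -Blob $blobName `
                                  -Context $context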

    Does that make sense? Did it help explain how this refresh mechanism works? Let us know if you could make this work and/or if you encounter any more issues.

    Thanks


    This posting is provided "AS IS" with no warranties, and confers no rights

    Wednesday, April 20, 2016 8:24 PM

All replies

  • Thank you very much, Silviu. I think my understanding of the concept was OK; it was just the bug in the new portal that made it all a bit confusing.

    I've now modified the job via the old portal. That's also how I found out why my PowerShell solution wasn't working: "HH:mm", which some of the samples show, isn't an actual option. The available options are HH, HH-mm and HH/mm. So I went with HH-mm and adapted my logic to put the file into that structure. It now seems to work as expected! :)
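
    For reference, a minimal sketch of what that adapted naming logic might look like (same classic Azure.Storage cmdlets as above; $context, the container name, and the file path are placeholders, and the only real change is the HH-mm time format in the blob name):

      # Sketch: build a blob name matching the "{date}/{time}/devicerules.json" pattern
      # with the HH-mm time format, e.g. "2016-04-21/06-30/devicerules.json".
      # $context is assumed to be an existing storage context; container name is a placeholder.
      $activeFrom = (Get-Date).ToUniversalTime()
      $blobName   = "{0}/{1}/devicerules.json" -f $activeFrom.ToString("yyyy-MM-dd"), $activeFrom.ToString("HH-mm")

      Set-AzureStorageBlobContent -File ".\devicerules.json" -Container "referencedata" `
                                  -Blob $blobName -Context $context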

    Thursday, April 21, 2016 6:36 AM
  • For those hitting this post, I've documented some of this stuff in a blog post: http://blog.repsaj.nl/index.php/2016/04/iot-stream-analytics-reference-data-updates/
    Friday, April 29, 2016 8:37 PM