Stream analytics failing once every ~30 days

  • Question

  • Hey guys,

    So I've been using Stream Analytics to push data from Event Hubs to Power BI. I'd say an event occurs every 30 seconds on average, but sometimes a couple arrive in rapid succession.

    It works fine, except that every once in a while it fails due to "Process Events: Node shutdown due to high memory utilization." It then needs a manual restart (I can even select "from moment last stopped" and it works fine), and it's fine again.

    Is there a way I can find out what the cause of this memory utilization is? Maybe I've set up the event hubs wrong, or my query is very inefficient? As far as I can tell, there is no way to see the current memory utilization.

    My query is as follows:

    SELECT
        I1.timeProduced,
        I1.timeProduced as timeProduced2,
        substring(cast(I1.[timeProduced] as nvarchar(max)), 12, 8) as timeOfDay3,
        I1.serialNumber,
        I1.value,
        I1.eventId,   
        I1.sequence,
        I1.country,
        I2.Country as countryShort,
        I2.Customer,
        I2.[End user],
        I2.Product
    INTO
        [PowerBIOutputSoftwareDevelopment]
    FROM
        [PBIEventHub] I1
        left join [VBMMasterList] I2
        ON I1.serialNumber = CAST(I2.[Serial number] as nvarchar(max))

    Tuesday, June 5, 2018 11:34 AM

All replies

  • Hi,

    Sorry to hear you are experiencing some out-of-memory issues with your job.

    While we don't yet have a specific memory metric, you can get this information from the "SU%" metric. It is usually good to test your system at peak load in order to provision the needed number of SUs.

    You can see how to scale the query on this page: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption 

    This page also describes which factors may increase memory consumption (e.g. reference data JOINs and the reordering buffer).
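
    For illustration, the reordering buffer only comes into play when a query timestamps events by application time via TIMESTAMP BY; a minimal sketch (reusing column names from your query) is below. Since your current query does not use TIMESTAMP BY, the reorder buffer is probably not your problem:

    -- Sketch only: TIMESTAMP BY switches the job from arrival time to
    -- application time, which is what allocates a reordering buffer
    -- sized by the configured out-of-order tolerance window.
    SELECT serialNumber, value
    INTO [PowerBIOutputSoftwareDevelopment]
    FROM [PBIEventHub] TIMESTAMP BY timeProduced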

    Let me know if you have further questions.

    Thanks,

    JS

    Tuesday, June 5, 2018 4:07 PM
  • Dear JS,

    Okay, I had a look, and indeed the SU% seems to hover around 50%, with a sudden spike on the first of the month; that it happens on the 1st may not be a coincidence. Unfortunately the telemetry only goes back 30 days, so I cannot verify this suspicion against the previous occurrences. The data source does not send significantly more data to Event Hubs on the 1st.

    Currently the inputs of the stream are an event hub (with 2 partitions) and a static JSON file ("VBMMasterlist", ~5 KB) which contains more details per device and is joined on serial number.
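
    For context, the master list file is shaped roughly like this (field names as used in the query; the values here are just dummies):

    [
      {
        "Serial number": 1234,
        "Country": "NL",
        "Customer": "SomeCustomer",
        "End user": "SomeEndUser",
        "Product": "SomeProduct"
      }
    ]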

    The event hub it's linked to receives about 10 events per minute, each of a few KB. So very, very little.

    The stream has had no events arriving out of order, nor errors of any kind.

    Even at 50% the load seems way too high, especially considering each SU costs 80 euros per month. Obviously I'm doing something wrong. I had a look at the link you posted, but cannot figure out why the load has to be so high (and why being the first of the month could suddenly cause issues).

    Could you help me find where a possible leak in the query could be? All the query should do is take event X, add a few columns of data which are a subset of JSON file Y, and send it to Power BI. There is no need to look back or ahead at other events.
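
    For what it's worth, I could try a stripped-down pass-through version of the job to see whether the join and string functions are what drive the memory; something like this sketch (same input and output, no reference-data join):

    -- Pass-through sketch: no reference-data join, no per-event string work.
    -- If SU% drops sharply with this, the join side is the likely suspect.
    SELECT
        timeProduced,
        serialNumber,
        value,
        eventId,
        sequence,
        country
    INTO
        [PowerBIOutputSoftwareDevelopment]
    FROM
        [PBIEventHub]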

    • Edited by TMTT.NL Wednesday, June 6, 2018 1:54 PM
    Wednesday, June 6, 2018 1:44 PM
  • Just a thought: could it be the Power BI capacity limitation that is doing this?

    Do the math: how much data are you pumping into Power BI?
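
    For instance, with the numbers from earlier in this thread it comes out tiny: 10 events/min × ~3 KB ≈ 30 KB/min ≈ 1.8 MB/hour ≈ 43 MB/day (about 600 rows/hour). If I recall correctly, Power BI push datasets allow on the order of a million rows per hour, so you should be nowhere near the limit, but it is worth ruling out.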

    Try Blob storage instead of Power BI; does it do the same?


    Sincerely Nik -- Please kindly mark the post(s) that answered your question and/or vote for the post(s). http://sqldataside.blogspot.ca/ (SQL Tabular + PowerShell)

    Saturday, June 9, 2018 8:14 PM
  • When it spikes to 90+% and comes back down to 0%, do you have to manually restart the job? Or are you seeing the job automatically recover and continue processing?
    Friday, June 15, 2018 7:28 PM