[ASA Error] Node shutdown due to high memory utilization

  • Question

  •  Hi everyone, I'm new to Azure Stream Analytics (ASA) and I ran into this memory issue. Please kindly help!

     

    Query:

     

    SELECT
        Id,
        TenantId,
        ProductId,
        RequestTime,
        DATEPART(hour, RequestTime) AS PartitionId,
        CONCAT(Id, '-', RequestTime) AS DistributionId,
        COUNT(*) AS Hits
    INTO
        LinkHit
    FROM Log PARTITION BY RequestTime -- yyyy-MM-dd-HH
    GROUP BY RequestTime, Id, TenantId, ProductId, TumblingWindow(Minute, 30)

     

    Input amount in 1 hour:

     

    Error message:

    • Process Events: Node shutdown due to high memory utilization.
    • Resource usage is over the capacity for one or more of the query steps. Event processing may be delayed or stop making progress. This may be a result of large window in your query, large events in your input, large out of order tolerance, or a combination of the above.

     

    Correlation Id: 8e301e8e-9ec3-49c8-8f58-c7be45a9e30f

     

    Other conditions:

    Input: Blob storage

    Output: Azure SQL Data Warehouse

    Was using 24 SUs

    Out-of-order & late-arrival tolerance: 0 seconds.

     

    My question is:

    1. I tried to set the job to use more SUs, but I got an error message saying this query can only run with 24 SUs, so that's kind of a dead end.
    2. Even if #1 can somehow be worked around with more SUs, is there anything in my query that can be improved for performance?

     

    I was wondering whether I can achieve a perf improvement like this:

     

    -- Step 1: pre-aggregate on a shorter 5-minute window
    WITH InputRequests AS
    (
        SELECT
            Id,
            TenantId,
            ProductId,
            RequestTime,
            COUNT(*) AS Hits
        FROM Log PARTITION BY RequestTime -- yyyy-MM-dd-HH
        GROUP BY RequestTime, Id, TenantId, ProductId, TumblingWindow(Minute, 5)
    )

    -- Step 2: roll the partial counts up to the 30-minute window
    SELECT
        Id,
        TenantId,
        ProductId,
        RequestTime,
        DATEPART(hour, RequestTime) AS PartitionId,
        CONCAT(Id, '-', RequestTime) AS DistributionId,
        SUM(Hits) AS Hits -- SUM the partial counts; COUNT(Hits) would only count rows
    INTO
        LinkHit
    FROM InputRequests PARTITION BY RequestTime -- yyyy-MM-dd-HH
    GROUP BY RequestTime, Id, TenantId, ProductId, TumblingWindow(Minute, 30)

     

    I was hoping multiple steps would reduce overall memory usage thanks to the shorter window in step 1, but it still ended with the same error. Any chance I'm using something in the wrong way here?

     

    I did some basic research in the online ASA documentation but haven't found an answer so far. Any idea is welcome!

     

    Thanks!

    Tony


    • Edited by Tony_Leo Friday, June 15, 2018 5:14 AM
    Friday, June 15, 2018 5:01 AM

Answers

  • Hi Tony,

    Glad to see it works when you use automatic partitioning.

    If it's running well and fits your business needs, you may keep the current logic. Just make sure you monitor the SU% metric so you don't run out of memory (we usually recommend keeping SU% below 80% to accommodate occasional spikes).

    Let us know how it goes.

    Thanks,

    JS

    Monday, June 18, 2018 10:47 PM

All replies

  • The first thing I would focus on is the "TumblingWindow(Minute, 30)". A 30-minute window is large; you need to change that.

    The second thing you need to check is your PARTITION BY. In general, TIMESTAMP BY is used here, so my question is: why are you using PARTITION BY? Focus on that (see the sketch below).
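    For illustration, a minimal sketch of the TIMESTAMP BY form Nik is hinting at, reusing the column names from the original query. This assumes RequestTime can be interpreted by ASA as a datetime, and the WindowEnd alias is only illustrative:

        SELECT
            Id,
            TenantId,
            ProductId,
            System.Timestamp AS WindowEnd, -- end of each 30-minute window
            COUNT(*) AS Hits
        INTO
            LinkHit
        FROM Log TIMESTAMP BY RequestTime -- use event time; no stream repartitioning
        GROUP BY Id, TenantId, ProductId, TumblingWindow(Minute, 30)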

    Use Visual Studio to develop the query: start small and add the statements one by one.

    Good luck


    Sincerely Nik -- Please kindly mark the post(s) that answered your question and/or vote for the post(s). http://sqldataside.blogspot.ca/ (SQL Tabular + PowerShell)

    Monday, June 18, 2018 4:27 AM
  • Hi,

    The reason you can't scale the number of SUs is that the output to SQL can't be parallelized.

    I'd first recommend dropping the PARTITION BY clause, as sketched below. For blob storage, Azure Stream Analytics partitions the reader automatically.
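    For reference, a minimal sketch of the original query with PARTITION BY removed and everything else unchanged:

        SELECT
            Id,
            TenantId,
            ProductId,
            RequestTime,
            DATEPART(hour, RequestTime) AS PartitionId,
            CONCAT(Id, '-', RequestTime) AS DistributionId,
            COUNT(*) AS Hits
        INTO
            LinkHit
        FROM Log -- no PARTITION BY: ASA partitions the blob reader automatically
        GROUP BY RequestTime, Id, TenantId, ProductId, TumblingWindow(Minute, 30)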

    For the memory issue, I wanted to double-check the query logic in order to improve memory consumption. I noticed you group by RequestTime on a 30-minute window. In your data, do you have multiple events belonging to each (Id, TenantId, ProductId, RequestTime) tuple? If you have one or very few events per tuple, ASA isn't summarizing events much; it just keeps a running aggregate for each such tuple for 30 minutes.

    Let me know which case it is so I can investigate further.

    Thanks,
    JS

    Monday, June 18, 2018 5:36 AM
  • Hi JS, thanks a lot for your suggestion! After I removed the PARTITION BY, it's running well!

    I guess you are pointing me down the right path with the 'tuple events' observation. I have another query like this:

    --  hits by Region
    SELECT
    Id,
    TenantId,
    RequestTime,
    Region,
    DATEPART(hour, RequestTime) AS PartitionId,
    CONCAT(Id, '-', RequestTime) AS DistributionId,
    COUNT(*) AS Hits
    INTO
        LinkHitbyRegion 
    FROM Log
    WHERE Region != ''
    GROUP BY Region, RequestTime, Id, TenantId, PartitionId, TumblingWindow(Minute,30)

    For one Id, there may be many regions mapping to it, so the tuple (Region, RequestTime, Id, TenantId, PartitionId) creates many distinct groups that ASA has to keep in memory.

    So what can I do for this 'hits by Region' query?

    Thanks!

    Tony


    • Edited by Tony_Leo Monday, June 18, 2018 9:41 PM
    Monday, June 18, 2018 9:40 PM