Loading wrong reference data


  • Hi all,

    I've set up one reference data input pointing to a CSV file in Azure Blob storage, with this path pattern:

    {date}/{time}/output-alarms.csv

    date format: DD-MM-YYYY 

    time format: HH-mm
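    To make the path pattern concrete, here is a minimal Python sketch of how a UTC timestamp maps to a blob path under {date}/{time}/output-alarms.csv and back. It assumes DD-MM-YYYY corresponds to strftime's %d-%m-%Y and HH-mm to %H-%M (24-hour clock, UTC); the helper names are hypothetical and not part of Stream Analytics.

```python
from datetime import datetime, timezone

# Assumed strftime equivalents of the {date} (DD-MM-YYYY) and {time} (HH-mm) tokens.
DATE_FMT = "%d-%m-%Y"
TIME_FMT = "%H-%M"
FILE_NAME = "output-alarms.csv"


def blob_path_for(ts: datetime) -> str:
    """Build the blob path that encodes the given UTC timestamp (minute resolution)."""
    return f"{ts.strftime(DATE_FMT)}/{ts.strftime(TIME_FMT)}/{FILE_NAME}"


def timestamp_from_path(path: str) -> datetime:
    """Recover the UTC timestamp encoded in a path like 31-05-2016/02-38/output-alarms.csv."""
    date_part, time_part, _ = path.split("/", 2)
    naive = datetime.strptime(f"{date_part} {time_part}", f"{DATE_FMT} {TIME_FMT}")
    return naive.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    ts = timestamp_from_path("31-05-2016/02-38/output-alarms.csv")
    print(ts.isoformat())     # 2016-05-31T02:38:00+00:00
    print(blob_path_for(ts))  # 31-05-2016/02-38/output-alarms.csv
```

    Read this way, the paths in the logs below encode 02:37 and 02:38 UTC, which is independent of when the message itself was logged.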

    But it seems the output is wrong because the wrong reference data is being loaded. The Operation Logs show:

    Message:
    Load new reference data from 31-05-2016/02-38/output-alarms.csv starting at 05/31/2016 02:38:00: InputSourceAlias:outputalarm;Shard ID:0;AdapterType:ReferenceData;
    Message Time:
    5/31/2016 3:30:09 AM UTC

    or

    Message:
    Load new reference data from 31-05-2016/02-37/output-alarms.csv starting at 05/31/2016 02:37:00: InputSourceAlias:outputalarm;Shard ID:0;AdapterType:ReferenceData;
    Message Time:
    5/31/2016 3:30:09 AM UTC

    I don't understand why, at 3:30:09 AM UTC, it loads the file 31-05-2016/02-37/output-alarms.csv or 31-05-2016/02-38/output-alarms.csv.

    Also here's my query in case you want to take a look:

    SELECT System.TimeStamp as CreatedDateUtc,
        ck.CellId as LocalCellId,
        ck.CellType as CellType,
        ck.EnodeBId as ENodeBId,
        ck.CellName as CellName,
        'MIMO' as MoClass,
        'MIMO degradation' as Slogan,
        CONCAT('Local Cell ID:', ck.CellId, ', Cell Name:', ck.CellName) as Attribute,
        '' as MoInstance,
        '' as AdditionInfo,
        '' as Severity
    INTO
        addmimoalarm
    FROM
        kpi ck TIMESTAMP BY InDateUtc
    LEFT OUTER JOIN configuration c
        on c.CellName = ck.CellName
    LEFT OUTER JOIN outputalarm o
        on o.CellName = ck.CellName
    WHERE c.TxRxMode = '2' AND (o.Slogan <> 'MIMO degradation' OR o.Slogan is null)
    GROUP BY
        ck.CellName, SlidingWindow(day, 1), o.Slogan, ck.CellId, ck.CellType, ck.EnodeBId
    HAVING avg(ck.Rank1/(ck.Rank1 + ck.Rank2 + ck.Rank3 + ck.Rank4)) > 0.3
    -------------------------------------------------------------------------------
    
    SELECT System.TimeStamp as CeaseDateTimeUtc,
        o.LocalCellId as LocalCellId,
        o.CellType as CellType,
        o.EnodeBId as ENodeBId,
        ck.CellName as CellName,
        o.MoClass as MoClass,
        o.Slogan as Slogan,
        o.Attribute as Attribute,
        o.MoInstance as MoInstance,
        o.AdditionInfo as AdditionInfo,
        o.Severity as Severity,
        o.CreatedDateUtc as CreatedDateUtc
    INTO
        ceasemimoalarm
    FROM
        kpi ck TIMESTAMP BY InDateUtc
    JOIN outputalarm o
        on o.CellName = ck.CellName
    WHERE o.Slogan = 'MIMO degradation' 
    GROUP BY
        ck.CellName,
        SlidingWindow(day, 1),
        o.Attribute,
        o.Severity,
        o.MoClass,
        o.MoInstance,
        o.AdditionInfo,
        o.CreatedDateUtc, 
        o.Slogan,
        o.LocalCellId,
        o.CellType,
        o.EnodeBId
    HAVING avg(ck.Rank1/(ck.Rank1 + ck.Rank2 + ck.Rank3 + ck.Rank4)) < 0.3
    
    --------------------------------------------------------------------------------
    SELECT DATEADD(millisecond, 1, System.TimeStamp) AS CreatedDateUtc,
        ck.CellId as CellId,
        ck.CellType as CellType,
        ck.CellName as CellName,
        cast(avg(ck.InterferenceAvg) as bigint) as InterferenceDbm,
        cast(avg(ck.Rtwp) as bigint) as RtwpDbm
    INTO kpilogsql
    FROM kpi ck TIMESTAMP BY InDateUtc
    GROUP BY ck.CellName, TumblingWindow(Duration(hour, 1), Offset(millisecond, -1)), ck.CellType, ck.CellId




    Tuesday, May 31, 2016 3:41 AM


  • Hi,

    On job startup, we look for and load the version of reference data that is valid for the startup time. So if your job was started around 5/31/2016 3:30:09 AM UTC, we need to load the reference data with an encoded time before the startup time; otherwise, joins with reference data might return no results. In your case, that is probably the blob with encoded date/time 5/31/2016 2:38:00 AM UTC (or 5/31/2016 2:37 AM UTC), the most recent reference data before the startup time.
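    The startup rule described above can be sketched in Python as follows. This is an illustrative model, not the actual Stream Analytics implementation: the function names are hypothetical, the 02-37 and 02-38 paths come from the logs in the question, and the 03-45 blob is an invented later version. Treating a blob tagged exactly at the start time as valid is also an assumption.

```python
from datetime import datetime, timezone
from typing import List, Optional


def encoded_time(path: str) -> datetime:
    """Parse the UTC time encoded in '{DD-MM-YYYY}/{HH-mm}/output-alarms.csv'."""
    date_part, time_part, _ = path.split("/", 2)
    naive = datetime.strptime(f"{date_part} {time_part}", "%d-%m-%Y %H-%M")
    return naive.replace(tzinfo=timezone.utc)


def version_for_startup(paths: List[str], start: datetime) -> Optional[str]:
    """Return the newest blob whose encoded time is at or before the job start time."""
    candidates = [p for p in paths if encoded_time(p) <= start]
    return max(candidates, key=encoded_time) if candidates else None


if __name__ == "__main__":
    blobs = [
        "31-05-2016/02-37/output-alarms.csv",
        "31-05-2016/02-38/output-alarms.csv",
        "31-05-2016/03-45/output-alarms.csv",  # hypothetical blob tagged after startup
    ]
    start = datetime(2016, 5, 31, 3, 30, 9, tzinfo=timezone.utc)
    print(version_for_startup(blobs, start))  # 31-05-2016/02-38/output-alarms.csv
```

    The key point is that the encoded blob time, not the time an event arrives, decides which file is picked; a blob tagged after the start time is skipped at startup.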

    We might also restart the job internally and resume processing, so these messages are expected.

    Hope it helps


    This posting is provided "AS IS" with no warranties, and confers no rights

    Friday, June 3, 2016 5:47 PM
  • Hi Silviu,

    Actually, our events arrive every 15 minutes (00:00, 00:15, 00:30, 00:45), and I don't know why it loads reference data from 5/31/2016 2:38:00 AM UTC or 5/31/2016 2:37 AM UTC.

    Can you explain further how Azure Stream Analytics loads reference data here? I guess it only loads when the job starts up or when an event comes in, right? Or does it load at random?

    Thanks

    Monday, June 6, 2016 3:08 AM
  • We make sure to have matching reference data for the corresponding streaming data in the join. When the job starts, we load the most recent version tagged with a time before the job start time; then, as processing time passes and we find a newer version, we load it instead. See more information here: https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-use-reference-data/
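    A simplified Python model of that refresh behavior: start from the version valid at job start, then, as processing time advances (here in 15-minute steps, mirroring the event cadence mentioned in the question), switch to a newer blob only once processing time has reached its encoded time. It ignores polling intervals, internal restarts and late-arriving blobs; the function names and the 03-45 blob are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def encoded_time(path: str) -> datetime:
    """Parse the UTC time encoded in '{DD-MM-YYYY}/{HH-mm}/output-alarms.csv'."""
    date_part, time_part, _ = path.split("/", 2)
    naive = datetime.strptime(f"{date_part} {time_part}", "%d-%m-%Y %H-%M")
    return naive.replace(tzinfo=timezone.utc)


def refresh(current: Optional[str], paths: List[str], now: datetime) -> Optional[str]:
    """Return a newer blob if one has become valid by `now`, otherwise keep `current`."""
    valid = [p for p in paths if encoded_time(p) <= now]
    if not valid:
        return current
    newest = max(valid, key=encoded_time)
    if current is None or encoded_time(newest) > encoded_time(current):
        return newest
    return current


def simulate(paths: List[str], start: datetime, steps: int,
             step: timedelta) -> List[Tuple[datetime, Optional[str]]]:
    """Advance processing time from `start` and record the active version at each step."""
    current: Optional[str] = None
    history = []
    for i in range(steps):
        now = start + i * step
        current = refresh(current, paths, now)
        history.append((now, current))
    return history


if __name__ == "__main__":
    blobs = [
        "31-05-2016/02-37/output-alarms.csv",
        "31-05-2016/02-38/output-alarms.csv",
        "31-05-2016/03-45/output-alarms.csv",  # hypothetical newer version
    ]
    start = datetime(2016, 5, 31, 3, 30, 9, tzinfo=timezone.utc)
    for now, version in simulate(blobs, start, steps=4, step=timedelta(minutes=15)):
        print(now.strftime("%H:%M:%S"), "->", version)
```

    With these inputs, the 02-38 blob is active at 03:30:09 and the hypothetical 03-45 blob takes over from the 03:45:09 step onward; event timing only matters insofar as it drives processing time forward.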

    Hope it helps



    Thursday, June 23, 2016 11:26 PM