none
How can I make sure old data is visible with latest processed data in SSAS cube? RRS feed

  • Question

  • Hi All,

    I have a cube which is taking quite long processing time (approx 7 hours) for full period. Using Partitions, I am trying to process the cube only for last 2 years.

    How can I make sure the latest 2 years data must process and also data older that 2 years should be there in cube?

    For example: I ran cube processing for full period on 1st Oct, which gives all data in cube. Now I ran cube processing for only last 2 years on 1st Nov. which means cube will contain only last 2 years of UPDATED data, but old data (latest up to 1st Oct) will not be updated on 1st Nov But should be there in Cube. Hence if data is getting updated within 2 years of period that will be visible in cube but if it is updated before 2 years only old data will be shown.

    I have data which is getting update/delete/insert hence can not use incremental.

    Anyone can please help or suggest?

    Many Thanks




    • Edited by ScriptSoft Monday, November 4, 2019 12:11 AM
    Monday, November 4, 2019 12:11 AM

All replies

  • Hi ScriptSoft,

    Thanks for your question.

    >>How can I make sure the latest 2 years data must process and also data older that 2 years should be >>there in cube?

    You could check last processed property under partitions in SSMS.

    For how to find that property, please refer to the details:

    1.You could select that SSAS database under SSAS instance in SSMS, turn to Cubes> Specific Cube name> Measure Groups>Specific Measure Group Name> Partitions>Specific partition name, right click it and select properties:

    2. Select specific partition, turn to General tab, view the value of last processed property.

    Best Regards,

    Will


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, November 4, 2019 2:46 AM
  • Many Thanks Will for your response,

    I am sorry that my question was not that descriptive to understand. Please allow me to try again.

    I am not intended to know which partition or measure group processed what time.

    In fact I have a requirement to minimize the cube processing time. To achieve the same, I have to process cube with last 2 years of data But old data should NOT lost.

    Currently cube has data for all period. I have created 2 partitions as below:

    1. Data_2Years : Here partition query fetching last 2 years of data

    2. Data_Remaining: Here partition query fetching all previous data before 2 years

    If I process cube with only partition Data_2years, I doubt it will overwrite the all existing data of the cube and only last 2 years of data will be there to browse from cube.

    Hence My question is what are ways where even if last 2 years data will get refreshed(gets updated) but cube should not loose older data (<2 years) which is not getting updated. currently I am using SSIS's Analysis Services Processing Task to process the cube/partition.

    Thanks in Advance,



    • Edited by ScriptSoft Monday, November 4, 2019 4:10 AM
    Monday, November 4, 2019 4:10 AM
  • anyone ..any comment please?



    • Edited by ScriptSoft Tuesday, November 5, 2019 4:28 AM
    Tuesday, November 5, 2019 4:28 AM
  • Hi ScriptSoft,

    Thanks for your reply.

    >>If I process cube with only partition Data_2years, I doubt it will overwrite the all existing data of the

    >>cube and only last 2 years of data will be there to browse from cube.

    You don't have to worry about that. Since you have created partitions based on period, it would not affect other data of partitions when you process the partition "Data_2Years"

    >>Hence My question is what are ways where even if last 2 years data will get refreshed(gets >>updated) but cube should not loose older data (<2 years) which is not getting updated. currently I >>am using SSIS's Analysis Services Processing Task to process the cube/partition.

    You could consider using Process Add option to process the partition.

    Processing Options and Settings (Analysis Services)

    Best Regards,

    Will


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, November 5, 2019 9:23 AM
  • Many Thanks for your reply Will,

    I will check and get back to you

    Regards,



    • Edited by ScriptSoft Wednesday, November 6, 2019 2:26 AM
    Wednesday, November 6, 2019 2:26 AM
  • Hello Will,

    I processed the cube with ALL periods first, just by selecting 'SSAS DB' in the SSIS Component (which will consider ALL partitions). It was showing all data across the periods. And then I processed the cube with last 2 year partitions Data_2Years. For this I first processed Dimensions using ProcessUpdate and then processed partitions using ProcessFull. But surprisingly when I browsed cube, all data older than 2 year s vanished/disappeared.

    >>You don't have to worry about that. Since you have created partitions based on period, it would not affect other data of partitions when you process the partition "Data_2Years"

    Am I doing something wrong? OR Do I need to strictly process partitions using ProcessAdd to retain the older data?

    Many Thanks,

    Regards.



    • Edited by ScriptSoft Thursday, November 7, 2019 3:10 AM
    Thursday, November 7, 2019 3:10 AM
  • Hello Will,

    Also I think ProcessAdd will only add the newly available members(INSERT) in the fact/dimension data. It will not consider the existing data which is being updated at source end ( DELETE/UPDATE).

    Which is actually not meeting the requirement in my case.

    Is there any other way you can suggest/help to process the 2 year partitions only but all data will be available?



    • Edited by ScriptSoft Thursday, November 7, 2019 3:59 AM
    Thursday, November 7, 2019 3:59 AM
  • Hey  Scott,

    Have you tried ProcessData and than ProcessIndex . Also, can you share the non presenting partition's query ?


    Regards, David .

    Thursday, November 7, 2019 5:34 AM
  • Thank You David for your response,

    I have not yet tried ProcessData and then ProcessIndex, but my understanding is ProcessFull = ProcessData + ProcessIndex, Please correct me if wrong.

    And Is there any difference between ProcessFull and (1st ProcessData then ProcessIndex)

    If you can please comment on it?

    Meanwhile I will definitely give it a try and revert back.

    Many thanks & Regards,




    • Edited by ScriptSoft Friday, November 8, 2019 7:47 AM
    Friday, November 8, 2019 7:47 AM
  • Hi ScriptSoft,

    Sorry for delayed response.

    Due to that I just had a face to face interview with the candidate, I have no time to pay more attention to my followed case. Sorry for any inconvenience.

    >>I processed the cube with ALL periods first, just by selecting 'SSAS DB' in the SSIS Component (which will consider ALL partitions). It was showing all data across the periods. And then I processed the cube with last 2 year partitions Data_2Years. For this I first processed Dimensions using ProcessUpdate and then processed partitions using ProcessFull. But surprisingly when I browsed cube, all data older than 2 year s vanished/disappeared.

    <<

    I don't know why you processed the cube with all periods at first. Don't do any other process operation to other objects, Just do process add option to last 2 year partitions Data_2Years.

    Best Regards,

    Will


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, November 8, 2019 9:45 AM
  • Thank You David for your response,

    I have not yet tried ProcessData and then ProcessIndex, but my understanding is ProcessFull = ProcessData + ProcessIndex, Please correct me if wrong.

    And Is there any difference between ProcessFull and (1st ProcessData then ProcessIndex)

    If you can please comment on it?

    Meanwhile I will definitely give it a try and revert back.

    Many thanks & Regards,




    >>I have not yet tried ProcessData and then ProcessIndex, but my understanding is ProcessFull = ProcessData + ProcessIndex, Please correct me if wrong.

    <<

    The meaning of Process full option is to do the process operation to objects, including data, structure, all the objects that it contains, for example partitions, it contains dimensions, measure groups. If you have specific process requirement, you'd better not use this option.

    For more details, please refer to the article in my previous reply.

    Best Regards,

    Will


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, November 8, 2019 9:53 AM
  • Thanks for responding Will,

    >>I don't know why you processed the cube with all periods at first

    I have to achieve result as:

    1. first run the processing for all period (both partitions) once in a year(or whenever adhoc requests comes), so that data is updated in cube for through-out the period.

    2. Then run the Data_2Years partitions on daily basis to update the latest 2 years of data.

    Here I want to achieve that when already processed the partition for 2 years, older data should also be visible in cube even if it is not updated. But after I completed cube processing with ProcessFull, Only 2 years data was there and older data got dropped.

    Here At first I am running all partitions for all periods to get full data and then running 2 years partitions to see\test if this strategy will work for daily run or not.

    The underline source data being modified by INSERT/UPDATE/DELETE, And My understanding(please correct if worng) is ProcessADD only consider INSERT i.e. Newly inserted records.

    Do we have any mechanism of processing to address my concern?

    Thanks in Advance Will,

    Regards,



    • Edited by ScriptSoft Monday, November 11, 2019 7:08 AM
    Monday, November 11, 2019 7:08 AM
  • Hi Will,

    I have few more questions if you can please address:

    1. Do we need to Process Dimension ALWAYS even when there is no change in dimension data, but respective fact data is getting changed? Just to bind the new fact data with existing dimension data?

    2. ProcessFull first drops all data and then rebuild data and then index. I assume ProcessFull = ProcessData+ProcessIndex. Do you think there are other differences between these 2 ways of processing as per my requirement?




    • Edited by ScriptSoft Monday, November 11, 2019 7:24 AM
    Monday, November 11, 2019 7:24 AM
  • Thanks for responding Will,

    >>I don't know why you processed the cube with all periods at first

    I have to achieve result as:

    1. first run the processing for all period (both partitions) once in a year(or whenever adhoc requests comes), so that data is updated in cube for through-out the period.

    2. Then run the Data_2Years partitions on daily basis to update the latest 2 years of data.

    Here I want to achieve that when already processed the partition for 2 years, older data should also be visible in cube even if it is not updated. But after I completed cube processing with ProcessFull, Only 2 years data was there and older data got dropped.

    Here At first I am running all partitions for all periods to get full data and then running 2 years partitions to see\test if this strategy will work for daily run or not.

    The underline source data being modified by INSERT/UPDATE/DELETE, And My understanding(please correct if worng) is ProcessADD only consider INSERT i.e. Newly inserted records.

    Do we have any mechanism of processing to address my concern?

    Thanks in Advance Will,

    Regards,



    Hi ScriptSoft,

    Thanks for your reply.

    >>1. first run the processing for all period (both partitions) once in a year(or whenever adhoc >>requests comes), so that data is updated in cube for through-out the period.

    If the time interval is once a year, there is no problem to do such process.

    >>2. Then run the Data_2Years partitions on daily basis to update the latest 2 years of data.

    >>Here I want to achieve that when already processed the partition for 2 years, older data should also be visible >>in cube even if it is not updated. But after I completed cube processing with ProcessFull, Only 2 years data >>was there and older data got dropped.

    >>Here At first I am running all partitions for all periods to get full data and then running 2 years >>partitions to see\test if this strategy will work for daily run or not.

    The two task should be sperate, one is daily task, another is yearly task. For current task, I have mentioned it in my previous reply, not use process full option, use process add option instead.

    >>The underline source data being modified by INSERT/UPDATE/DELETE, And My >>understanding>>>>(please correct if worng) is ProcessADD only consider INSERT i.e. Newly >>inserted records.

    Yes, your understanding is correct.

    Best Regards,

    Will


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, November 12, 2019 8:54 AM
  • Hi Will,

    I have few more questions if you can please address:

    1. Do we need to Process Dimension ALWAYS even when there is no change in dimension data, but respective fact data is getting changed? Just to bind the new fact data with existing dimension data?

    2. ProcessFull first drops all data and then rebuild data and then index. I assume ProcessFull = ProcessData+ProcessIndex. Do you think there are other differences between these 2 ways of processing as per my requirement?




    This question is far way from current topic. Please open another thread to start your questions.

    Thanks for your cooperation.

    Best Regards,

    Will


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, November 12, 2019 8:56 AM
  • Thanks Will for your response and apologies for my delayed response.

    >>I have mentioned it in my previous reply, not use process full option, use process add option instead.<<

    I can not use ProcessAdd as source data is getting changed as per INSERT/DELETE/UPDATE (all three) queries and user want those updates to be reflected. That is why I am looking for alternate option which can support my above mentioned requirement.

    Obviously yearly ProcessFull job is fine and need no change to be made. I am only worried about "Daily Run" which will fetch only last 2 years of data without old data getting lost.

    >>This question is far way from current topic.<<

    I have asked diff between ProcessFull AND combined (ProcessData+ProcessIndex) because as suggested by David I have tested (ProcessData+ProcessIndex) which is actually answer for my original question. Unlike ProcessFull, ProcessData first and then ProcessIndex is NOT deleting old data and fetching the latest data for last 2 years.

    If you can please suggest any difference which I need to take care of before implementing (ProcessData+ProcessIndex), that will be great.

    Many thanks Will,

    Regards,



    • Edited by ScriptSoft Monday, November 18, 2019 3:19 AM
    Monday, November 18, 2019 3:19 AM
  • I have no idea why Will thinks your follow up questions are too far off topic, they seem perfectly reasonable in term of understanding how to get a partial refresh working.

    1. Do we need to Process Dimension ALWAYS even when there is no change in dimension data, but respective fact data is getting changed? Just to bind the new fact data with existing dimension data?

    No, if you are 100% sure that no dimension data has changed you don't need to re-process your dimensions. But if there is any chance that there are new rows or changes you should do a ProcessUpdate on your dimensions (this is more resource intensive than a ProcessFull, but it retains the internal pointers so it prevents historic data from being dropped)

    2. ProcessFull first drops all data and then rebuild data and then index. I assume ProcessFull = ProcessData+ProcessIndex. Do you think there are other differences between these 2 ways of processing as per my requirement?

    If you are targeting a partition the ProcessFull = (ProcessData + ProcessIndex) there should be no difference between the two sets of operations. It's only if you target other object types that the nature of a processFull changes.


    http://darren.gosbell.com - please mark correct answers

    Tuesday, November 19, 2019 12:38 AM
    Moderator
  • Thanks Darren for your response,

    I have a follow up question.. As per my testing, ProcessFull of only Data_2years partition is removing all data which is older than 2 years and fetching latest 2 years data into cube. But If I first perform ProcessData and then ProcessIndex of same Data_2years partitions, it is fulfilling my requirement i.e. older data still exists while fetching latest 2 years of data.

    Does this happen in any particular scenario or I can consider this as fact and can implement?

    Can you please comment on how processFull and (ProcessData+ProcessIndex) behaving differently.

    Regards,




    • Edited by ScriptSoft Tuesday, November 19, 2019 3:47 AM
    Tuesday, November 19, 2019 3:47 AM
  • Can you please comment on how processFull and (ProcessData+ProcessIndex) behaving differently.

    They should behave identically.

    The only scenario where processFull could be different is if you've included a dimension in the processFull command. If you do a processFull on a dimension it will drop all the data from all related partitions, which sounds like the behaviour you are seeing. Doing a processStructure at the Cube level will also clear out all the partition data. But I can't think of any scenario where a processFull on it's own on one partition should have any impact on any other partitions.


    http://darren.gosbell.com - please mark correct answers

    Tuesday, November 19, 2019 4:21 AM
    Moderator
  • thanks for your quick response Darren,

    I am processing cube using ssis tasks in sequence as below:

    1. first ALL dimensions using ProcessUpdate,

    2. then only partitions for 2 years of data (Data_2years) using ProcessData.

    3. then only partitions for 2 years of data (Data_2years) using ProcessIndex

    Above processing is giving me latest of last 2 years data + older data is also available.

    When I change above 2 & 3 steps with [only partitions for 2 years of data (Data_2years) using ProcessFull], it removes all data and only fetch latest of last 2 years data.

    processData also drops existing data and re-bind it from source DB while processing- which means it should drop all existing data and fetch latest of only 2 years of data. But not sure in my case what is wrong or right as it is performing this way. I doubt if (processData+ProcessIndex) change behavior later in production , will be a problem.

    Thanks & Regards




    • Edited by ScriptSoft Tuesday, November 19, 2019 4:58 AM
    Tuesday, November 19, 2019 4:58 AM
  • Hi ScriptSoft,

    Thanks for your update.

    Sorry for that, haven't captured  all the situations of your environment. Sometimes your description makes me confused. But these depends on your design and time schedule.

    Some scenarios are also should be considered, please refer:

    >>I am processing cube using ssis tasks in sequence as below:

    >>1. first ALL dimensions using ProcessUpdate,

    ProcessUpdate option not only dropped the partition "Data_2years" 's indexes and aggregations, but also dropped the ones in other partitions.

    Besides, when you process a dimension, be aware that the dimension might be used in several cubes, consider using batch processing option.

    Best Regards,

    Will


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thursday, November 21, 2019 3:28 AM
  • Thanks for responding Will,

    >>ProcessUpdate option not only dropped the partition "Data_2years" 's indexes and aggregations, but also dropped the ones in other partitions.<<

    Agree Will. But do you mean that processFull after processUpdate will recover Indexes & aggregations for partitions whereas (ProcessData and ProcessIndex) will only build indexes not aggregations??

    I have not yet tested aggregation part till now, but (ProcessData and ProcessIndex) fulfilling the requirement of partial refresh (only 2 years) with older data loss. and that is the reason I asked if any difference between processFull and (ProcessData & ProcessIndex together) or any specific thing to take care in this case before I actually implement this?  

    >>Besides, when you process a dimension, be aware that the dimension might be used in several cubes, consider using batch processing option.<<

    Thanks Will, I have taken care this part.

    Thanks & Regards,



    • Edited by ScriptSoft Friday, November 22, 2019 8:33 AM
    Friday, November 22, 2019 8:32 AM
  • Hi ScriptSoft.

    Thanks for your update.

    >>Agree Will. But do you mean that processFull after processUpdate will recover Indexes & >>aggregations for partitions whereas (ProcessData and ProcessIndex) will only build indexes not >>aggregations??

    I haven't suggested you to use process Full option, you misunderstood my description. Don't connect the two options (Process full, ProcessUpdate) in the same place, they are two different options. Besides, ProcessIndex could not only rebuid indexes but also aggregations.

    >>I have not yet tested aggregation part till now, but (ProcessData and ProcessIndex) fulfilling the >>requirement of partial refresh (only 2 years) with older data loss. and that is the reason I asked if >>any difference between processFull and (ProcessData & ProcessIndex together) or any specific >>thing to take care in this case before I actually implement this?  

    Please make up you mind to schedule your task. Once you come across problem, please share your script here and describe your issue. No amount of fantasy will help.

    Best Regards,

    Will


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, November 26, 2019 8:16 AM