none
how to perform partial cube processing without losing data. RRS feed

  • Question

  • Hi All,

    I have cube where measure groups are partitioned into 2 partition for different time frame as below:

    1. latest 2 years data ( PRT_DATA_2YEARS )

    2. older data than 2 years ( PRT_DATA_REST )

    By processing 1st partition ( with any of processing option ), How can I make sure to process/refresh only last 2 years of data? Meaning only latest 2 years of data will be updated and rest of the data will be intact.

    Example: lets say actual data on database got updates for all years. and till now cube has not reflected those changes because it hasn't processed yet. Now we run cube processing job, that will update or fetch updated data only for latest 2 years in the cube. And rest of the data will be intact. There will not be any change in the cube data which is older than 2 years.

    Request you to please help to resolve this.

    Any suggestions/hints/trick are welcomed.

    Regards



    • Edited by ScriptSoft Friday, January 17, 2020 9:12 AM
    Friday, January 17, 2020 9:12 AM

All replies

  • So there are no tricks here, if you only process PRT_DATA_2YEARS that is the only data that will be read. The other partition will not be touched. The only way you might see changes in the old partition is if you do a processUpdate on your dimensions and you have changed dimension records that are pointing to the old data

    So if you had data like the following:

    ProductID  Color
    1               Red
    2               Blue

    And you changed it to this:

    ProductID  Color
    1               Green
    2               Blue

    You would see the historic data for Red products change. But if this is your issue this is not a problem with partition processing, this is an issue with your data warehouse design if you don't want these sorts of changes to impact on data in older partitions you need to implement different change capturing logic in your dimension tables using something like a type 2 slowly changing dimension pattern (you should be able to find plenty of information on slowly changing dimensions by doing a google search)


    http://darren.gosbell.com - please mark correct answers

    Sunday, January 19, 2020 10:26 PM
    Moderator
  • Hi ScriptSoft,

    Thanks for your question.

    >>By processing 1st partition ( with any of processing option ), How can I make sure to process/refresh >>only last 2 years of data? Meaning only latest 2 years of data will be updated and rest of the data will >>be intact.

    Your question is so abstract that we couldn't get your main thought. What does the words "any of processing option" refer to ? You're making yourself confused. which process option would you use to process which object? This question is what we need to get. At first, we could know you are operating which object. Also, we could know your are using which process option. In addition, the operations of different objects  at the same time also have some relationship and effects. These problems couldn't been described clear in one reply, you need to keep trying and practicing in your scenario, so that you gradually get the reason of these operations with related articles.

    Best Regards,

    Will


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, January 20, 2020 2:49 AM
  • Many Thanks Will & Darren,

    Your responses always helps.

    Let me reframe my question based on your comment and if needed i can provide objects or ssis design how I am doing what:

    - Base tables of the oracle DB can get insert, update, delete statement for any of data for any of the period.

    - in  SSAS DB for which source is above oracle DB, there are 8 cubes -- each cube has different no of measure groups making 34 in total -- each measure group partitioned into 2 partitions based on time (2 year and rest of period), making it 68 in total.

    - there are total 100 dimensions in the solution, most are database dimensions shared between different cubes as cube dimensions.

    - SSAS DB, containing above objects, is being processed using 'Analysis Services Processing Task' in SSIS with 'Full Process' Mode.

    -----Above is what is current scanario which is working perfectly---But below is new requirement I have to address.

    Current ssas db or Cube processing fetch latest updates for ALL data across periods. As per new requirement, processing of cube(or ssas db) should fetch latest updates only for last 2 years of data and rest of data should be intact even when there are updates in oracle base tables.

    >>What does the words "any of processing option" refer to ?<<

     By saying this I just meant to ask if any combination of different objects in ssas db with different processing options will be able to achieve this?

    Let me know if more information required.

    Thanks in Advance & Regards,



    • Edited by ScriptSoft Monday, January 20, 2020 4:08 AM
    Monday, January 20, 2020 4:08 AM
  • So to do this you will need to change your dimension processing to a processUpdate option, this prevents dimension processing from clearing related partitions. Then you would probably be best to do a processData on the recent partitions to re-load the data for those partitions and a processDefault at the cube or database level to re-process any the aggregations for the recent partition as well as any that might have been dropped due to changes in dimensions.

    http://darren.gosbell.com - please mark correct answers

    Monday, January 20, 2020 4:42 AM
    Moderator
  • One more point, when you process dimensions, please use batch processing settings, so that the dimensions and related cubes could be processed at the same time, not have to process other objects referencing these dimensions at other time.

    Best Regards,

    Will


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, January 21, 2020 3:03 AM
  • Thanks Again for your response Darren & Will,

    I will test it and will let you know.

    Regards,




    • Edited by ScriptSoft Tuesday, January 21, 2020 6:05 AM
    Tuesday, January 21, 2020 6:05 AM
  • Hi, I tried and tested as per below 3 approach:

    1. ProcessUpdate (Dimensions) --> ProcessData( Partitions of latest period) --> ProcessDefault (SSAS DB)

    2. ProcessUpdate (Dimensions) --> ProcessData( Partitions of latest period) --> ProcessDefault (cube)

    3. ProcessUpdate (Dimensions) --> ProcessData( Partitions of latest period) --> ProcessIndex (cube)

    3rd test is completely wiping out data from old partitions and overwriting data of latest partitions. Hence only latest data from last 2 years is available, rest is gone.

    1st and 2nd test, fetching ALL records from source DB, hence data from older partitions also getting updated. so complete cube getting latest n updated data irrespective of period.

    Can you please suggest where I may be wrong?

    Regards,
    • Edited by ScriptSoft Tuesday, January 28, 2020 10:00 AM
    Tuesday, January 28, 2020 7:46 AM
  • Maybe:

    ProcessUpdate the Dimension will touch Partitions if there are any changes in hierarchy. (removing aggregates,indices...)

    Imagine u do a processupdate (which will also delete records!!). Now in your partitions u have keymappings to dimensionelements which do not exists any more. Result: U will not see any data from this partitions until u reload the partition and depending on your DataMart implementiation, setting Unknownmapping to "Map to Unknown"


    Tuesday, January 28, 2020 12:57 PM
  • Thanks yger for ur response,

    to make it more clear, below is approach how I am processing cube and testing its data:

    There are 2 ssis jobs where I am using 'Analysis Service Processing Task' to process ssas db and its objects.

    First job- process the SSAS DB to fetch all data from source DB.

    2nd Job- process only partitions which trying to fetch latest 2 years of data.

    I run first job and can see all records in cube. Then I update the facts for ALL periods/years in source DB. and then I run 2nd job as per different approach mentioned above in one of comment. My expectation is: when I browse cube after 2nd job execution, it should have updated fact data ONLY for last 2 years and for previous years, it should contain old data which was there before I updated facts.

    But as per different tests, either it is bringing whole data in cube, showing updated facts for ALL years OR showing only latest 2 years data with updated facts, & rest data is not at all there.



    • Edited by ScriptSoft Wednesday, January 29, 2020 3:34 AM
    Wednesday, January 29, 2020 3:34 AM
  • The question: what updates are u doing in source for historic data? what type of updates is this? If your updates of historic data in your source will change FK relations to Dimensions (maybe new surrogates) -> then your historic partitions would match to nonexisting dimensionsmembers and as a result u wont see historic data after updating the source and just reloading last 2 years data.
    Wednesday, January 29, 2020 9:49 AM
  • Hello yger,

    It is simple update query at source DB just to test the scenario as below:

    UPDATE factEALDeposits SET bln_cls_pos = bln_cls_pos / 1000, bln_opn_pos = bln_opn_pos / 1000 WHERE PRD_ID IN (SELECT PRD_ID FROM DimPeriod WHERE PRDYR IN ('2019', '2018', '2017', '2016'));




    • Edited by ScriptSoft Wednesday, January 29, 2020 2:53 PM
    Wednesday, January 29, 2020 2:53 PM