none
Setup Incremental data load to warehouse RRS feed

  • Question

  • Hi all,

    We have one request to implement standard optimized and best solution to copy data every night from our live database to the warehouse. The apps are utilizing live db and has size of 200 gb, we have warehouse configured for reporting and BI.

    Initially, someone has implemented snapshot replication on warehouse earlier, which is failing after running for long time for 2 tables which are big in size like occupying 30% size of total db. Means out of 56 tables just 2 have used more than 60% of total size in db. 

    I have worked on this before but not to replicate whole db to warehouse. 

    I have SSIS and transactional replication in mind. But really keen to know what could work best for us.

    Any help will be appreciated.

     


    Dilip Patil..


    Monday, May 27, 2019 7:03 AM

Answers

  • It really depends. 

    as I understand live database will have different schema to DW or not?

    I would say if there is no transformation in tables and data I would use transactional replication. Transaction replication  will sort out your incremental load in real time and it is easy to maintain (with that small amount of tables). Please note for replication there will be bit overhead with setting up distribution database, overhead depends how busy your live db is and what will be your retention policy on distributor.

    if there are transformation use SSIS. You could potentially do the transformations in replication by customizing replication procedures but that could be quite difficult to mange later on.

    Monday, May 27, 2019 1:11 PM

All replies

  • It really depends. 

    as I understand live database will have different schema to DW or not?

    I would say if there is no transformation in tables and data I would use transactional replication. Transaction replication  will sort out your incremental load in real time and it is easy to maintain (with that small amount of tables). Please note for replication there will be bit overhead with setting up distribution database, overhead depends how busy your live db is and what will be your retention policy on distributor.

    if there are transformation use SSIS. You could potentially do the transformations in replication by customizing replication procedures but that could be quite difficult to mange later on.

    Monday, May 27, 2019 1:11 PM
  • Hi, 

    I just need to get the daily data from live to warehouse, and the schema is same also.

    Thanks just wanted to ensure that the transactional replication will be the best choice here or not. 

    Just one question, will it take much more space for logs?

     


    Dilip Patil..

    Monday, May 27, 2019 1:41 PM
  • What you mean by logs?

    If you referring to transaction log on Publisher, then you need to keep in mind that logreader need to read always entire transaction log before releasing it for checkpoint (or transaction log back- depends what recovery model u on). That means for example during index rebuild you may have transaction log getting big and not able to empty it. Also u need to remember there will be some delay with data delivery as log reader need to read the log and grab all transactions that need to be replicated but that should not be long. In my case transaction log that is 100GB + large is being read within 5 minutes or so.

    Tuesday, May 28, 2019 7:39 AM
  • Thanks for sharing!!

    Well, we have decided to have a DR server between our live and warehouse server.

    we will setup the daily backup from live server and restore to DR server and configure DR as Publisher for the transactional replication and warehouse as subscriber.

    You think this could be an optimal solution, we usually see our transactional log file around 10 to 15 gb daily though.

     

     


    Dilip Patil..

    Tuesday, May 28, 2019 7:58 AM
  • 15 GB of tran log is nothing. you should be fine.

    however I do not exactly understand a reason behind having DR (daily restore) and replication on top of it. 

    Will be there constant data flow between PROD and DR? Or you just will be restoring db on daily basis? If you will be restoring database once a day, you do not need transactional replication as this is dedicated to live data streams and once restored and overwritten in DR transaction replication wont replicate the data that were inserted from last full backup (hope it makes sense to you). Replication works on transaction log whatever is being written to the log it need to be read as it goes and then pick up transactions that are marked for replication and send down to subscribers. In that case that wont work.

    I understand u tried to run snapshot replication and that did not work for u? what errors you had?

    Tuesday, May 28, 2019 8:29 AM
  • Ohh means the replication on db being restored daily wont work.

    Let me explain my problem again.

    We have had snapshot replication currently which is working well. Recently we added one table in replication which is quite big around more than 50% size of total DB i.e. 100 gb+.

    When we ran replication it failed with the error that do not have sufficient resources.  

    Also, we don't want our production server face any issues due to replication. Business don't want anything to run on live server except the apps.

    So, we planned to setup seperate DR server which would have full restored from live every morning, if in case the prod goes down we can redirect the app requests to DR server as backup source. 

    Now, please tell me how can i go ahead with this. which solution will work for me here?

     

     


    Dilip Patil..

    Tuesday, May 28, 2019 9:13 AM