How to design a very simple database/DW with 2 very large logging tables and automatically, continuously offload data to another server?

  • Question

  • I have a client asking me to help with a performance issue they are currently having on a production system.  They have a whole bunch of devices in the field that connect to their DB server intermittently throughout the day and write logging information.  The devices basically write to 2 tables that have grown to a total of ~1.8TB of data.  The client needs to keep all of this data for reporting/analytics, but they are running into timeouts: the DB has gotten so bloated that the field devices are unable to send their information.  On the flip side, the reporting/analytics folks are complaining that their month-end reporting query runs for something like 8 hours.

    The DB has ~12 or so tables, but 99.9% of the data is in these two logging tables.  Again, they don't want to get rid of any data, and they expect it to keep growing daily.  Each of the large tables has 3-5 indexes and is clustered by date.  Otherwise there are a few foreign keys in the other tables, but those don't change much.  The devices are the only write activity on the large tables; everything else is mainly read-only.  Priority 1 for them is that the devices have to be able to get their data home; after that, they would like to be able to report on it as quickly as possible.

    My plan is to stand up a reporting/DW SQL instance and continuously move the data, except for the 2 most recent months, from the production server over to the reporting server.  On the reporting server, I plan to partition these large tables by date and then look at index tuning on that side as well.  My question is: what automated tools/methods should I use to continuously offload this data from the production server to reporting?  I want to think of the production server as basically a catcher's mitt that receives the data from all the devices out in the field.
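    For the partitioning piece on the reporting side, a minimal sketch might look like the following.  All table, column, and filegroup names here are placeholders, and it assumes monthly partitions on a datetime2 log-date column:

        -- Partition function: one partition per month (boundary dates are examples)
        CREATE PARTITION FUNCTION pf_LogDate (datetime2)
            AS RANGE RIGHT FOR VALUES ('2017-08-01', '2017-09-01', '2017-10-01');

        -- Map every partition to one filegroup to start; spread across filegroups later if needed
        CREATE PARTITION SCHEME ps_LogDate
            AS PARTITION pf_LogDate ALL TO ([PRIMARY]);

        -- Create the reporting copy of the logging table on the partition scheme,
        -- clustered on the date column so queries get partition elimination
        CREATE TABLE dbo.DeviceLog_Reporting (
            LogId    bigint       NOT NULL,
            DeviceId int          NOT NULL,
            LogDate  datetime2    NOT NULL,
            Payload  varchar(max) NULL,
            CONSTRAINT PK_DeviceLog_Reporting
                PRIMARY KEY CLUSTERED (LogDate, LogId)
        ) ON ps_LogDate (LogDate);

    New monthly boundaries can then be added over time with ALTER PARTITION FUNCTION ... SPLIT RANGE.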


    Thursday, October 12, 2017 1:23 PM


All replies

  • Use an SSIS package and schedule it with SQL Agent to run every hour (or however often you need). Do the devices need to query all 1.8 TB? If not, migrate the bulk of the data to your reporting server (or DW) and leave in production only the data they need. However, your main task is the reporting query that takes 8 hours: consolidate the tables in your DW by building an ad hoc data mart, and use SSIS for that as well.
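    If it helps, the hourly schedule can be created in T-SQL instead of through the Agent UI.  This is just a sketch; the job name is a placeholder for an existing Agent job that runs the SSIS package:

        USE msdb;
        GO
        -- Attach an hourly schedule to an existing Agent job
        EXEC dbo.sp_add_jobschedule
            @job_name             = N'Offload logging data',  -- placeholder job name
            @name                 = N'Every hour',
            @freq_type            = 4,       -- daily
            @freq_interval        = 1,       -- every day
            @freq_subday_type     = 8,       -- repeat on an hourly basis
            @freq_subday_interval = 1,       -- every 1 hour
            @active_start_time    = 000000;  -- starting at midnight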

    Please mark as answer if this post helped you

    Thursday, October 12, 2017 1:50 PM
  • If they normally only use 2 months of data but want to keep all of it, the simplest answer is to create "current" and "historical" tables.  The "current" table could contain, for example, the most recent 6 months (just to be safe), and everything else is moved to the "historical" table (on a different filegroup).  Create an SSIS package or stored procedure to move the outdated data from current to historical and run it once a month.
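    A minimal sketch of that monthly move, assuming both tables live in the same database and have identical schemas (names are placeholders): DELETE ... OUTPUT INTO captures every deleted row and inserts it into the historical table in a single atomic statement, so rows cannot be deleted without being copied.  Note that OUTPUT INTO requires the target table to have no enabled triggers and to sit on neither side of a foreign key.

        -- Move rows older than 6 months from current to historical atomically,
        -- in batches so the transaction log stays manageable
        DECLARE @cutoff datetime2 = DATEADD(MONTH, -6, SYSDATETIME());

        WHILE 1 = 1
        BEGIN
            DELETE TOP (50000) FROM dbo.DeviceLog_Current
            OUTPUT DELETED.*                    -- every deleted row is captured...
            INTO dbo.DeviceLog_Historical       -- ...and written to the historical table
            WHERE LogDate < @cutoff;

            IF @@ROWCOUNT = 0 BREAK;            -- stop once nothing is left to move
        END;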

    Then, if you actually need to combine the tables, you can create a view like "table_all" that UNIONs the two tables together.
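    For example (placeholder names again); UNION ALL is the right flavor here, since a row only ever lives in one of the two tables, and it avoids the duplicate-eliminating sort that a plain UNION would add:

        CREATE VIEW dbo.DeviceLog_All
        AS
            SELECT LogId, DeviceId, LogDate, Payload FROM dbo.DeviceLog_Current
            UNION ALL
            SELECT LogId, DeviceId, LogDate, Payload FROM dbo.DeviceLog_Historical;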

    That would be a minor change to the overall architecture and would allow the rest of the system to continue running as is.

    Thursday, October 12, 2017 2:07 PM
    Answerer
  • Thanks for the replies.  They currently have a single 1.8TB data file for this DB.  Querying, maintenance, and disk space management are an absolute pain, to say the least.  I want to implement the partitioning scheme to help alleviate some of that headache.  As for the current vs. historical split: the current side is basically just a collector for the incoming data from the devices in the field.  They are mostly sending logging information, so it is nearly 100% write activity with very little read.  I need that data collector DB to be on and available at all times.  Let's just say the devices' error handling and ability to re-send data when the DB is unavailable/offline are a bit lacking!  There is also not much I can do about that due to budget, resource constraints, etc.

    Does anyone know of any SSIS examples doing something similar that they can point me to?  Moving data from point A to point B is not a problem, but I definitely don't want to lose any data by deleting from point A (prod) while wrongly thinking it had already been moved over to point B (reporting), and I want to keep that process as automated as possible.
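    One way to get that "never delete unless it really arrived" guarantee in plain T-SQL is a copy step followed by a verified delete, for example over a linked server.  This is only a sketch: the linked server name REPORTING, the database/table names, and the key column are all placeholders, and it assumes the two tables share a schema with a unique LogId.

        DECLARE @cutoff datetime2 = DATEADD(MONTH, -2, SYSDATETIME());

        -- Step 1: copy rows older than the cutoff to the reporting server.
        -- NOT EXISTS makes this re-runnable after a failure without creating duplicates.
        INSERT INTO REPORTING.ReportDB.dbo.DeviceLog (LogId, DeviceId, LogDate, Payload)
        SELECT s.LogId, s.DeviceId, s.LogDate, s.Payload
        FROM dbo.DeviceLog AS s
        WHERE s.LogDate < @cutoff
          AND NOT EXISTS (SELECT 1
                          FROM REPORTING.ReportDB.dbo.DeviceLog AS t
                          WHERE t.LogId = s.LogId);

        -- Step 2: delete locally only the rows that verifiably exist on the target
        DELETE s
        FROM dbo.DeviceLog AS s
        WHERE s.LogDate < @cutoff
          AND EXISTS (SELECT 1
                      FROM REPORTING.ReportDB.dbo.DeviceLog AS t
                      WHERE t.LogId = s.LogId);

    In SSIS the same shape would be a data flow task for the copy followed by an Execute SQL task for the verified delete; the remote EXISTS checks can be slow at this scale, so staging the remote keys in a local table first may be worth it.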

    Update:

    I did find this online:  https://stackoverflow.com/questions/16234782/move-records-which-are-older-than-3-years-from-one-database-production-db-to-a

    Thursday, October 12, 2017 4:39 PM