Fetch data from 3rd party database to data warehouse

  • Question

  • Hi,

    I am not sure if I am asking this question in the correct forum; if not, please pardon me and point me to the right one.

    Our data warehouse needs to fetch data from different clients (3rd parties). Currently they deliver CSV files, and an SSIS package then populates our data warehouse. This is a time-consuming and unreliable process: sometimes they deliver incorrect data, and at other times we have to wait a long time until someone uploads the files to our SFTP server.

    Keeping in mind that the data is very sensitive, can you please advise me on different options for accessing this data securely and in a timely manner?

    Your help is much appreciated.

    Regards

    Malik

    Thursday, August 8, 2013 1:29 PM

Answers

  • As a consumer of 3rd party data, your SLA should state when, where, and at what time the data is delivered. It is always best to stay in control and have your own process check for the file in the drop folder on their system. If you are supposed to get it at 5:30 AM and it is not there, keep checking at whatever interval you deem appropriate. You don't want to wait on their SFTP process to send it.

    Bad data will always be a problem early on, and a headache. The source system will usually be slow to make a code change, or the data is simply dirty to begin with. Most of it could have been caught in testing and QA, but things happen. The best you can do is use your SLA to push the source to make changes, and be responsive on your consuming end by automating the cleansing.

    Since you know you will receive bad data, make sure you have a good database design, and then code the automation so the fix is in place the second time the problem happens.
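    The "keep checking at an interval" idea above can be sketched as a small polling helper (a hypothetical standalone function, not tied to any particular SSIS task; in SSIS this logic would typically live in a Script Task or a looping job step):

    ```python
    import time

    def poll_for_file(file_exists, timeout_s=3600, interval_s=60,
                      clock=time.monotonic, sleep=time.sleep):
        """Repeatedly check for the vendor file until it appears or we time out.

        file_exists: callable returning True once the expected file is present.
        Returns True if the file showed up within timeout_s, False if we gave up.
        """
        deadline = clock() + timeout_s
        while clock() < deadline:
            if file_exists():
                return True
            sleep(interval_s)
        return False
    ```

    On a timeout the job can alert the on-call team and cite the SLA, instead of a person watching the SFTP folder by hand.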


    Hope this is helpful. J Alverson, Microsoft BI Solutions. Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.


    • Edited by J Alverson Friday, August 9, 2013 10:59 PM
    • Proposed as answer by Mike Yin Monday, August 19, 2013 2:54 PM
    • Marked as answer by Mike Yin Monday, August 26, 2013 2:51 PM
    Friday, August 9, 2013 10:56 PM

All replies

  • Is the different clients' (3rd party) data hosted on some sort of database?

    Can the data be sent as XML directly into your database?

    Any other details would also be helpful.


    Thanks and Regards, Prajesh Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Thursday, August 8, 2013 3:33 PM
  • The 3rd party databases differ from client to client, and it is not possible to get XML from the clients.

    Friday, August 9, 2013 9:59 AM
  • Do you have rights to connect to these databases? If yes, you can create an SSIS data flow to pull the data directly from these 3rd party DBs, validate it in your staging tables, and move only correct, validated (and new/modified) data forward to your data mart.
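    The staging-validation step described here can be sketched as a row-level check that splits an incoming feed into clean and rejected rows (a minimal illustration; the column names `client_id`, `amount`, and `trade_date` are placeholders, not from the thread):

    ```python
    import csv
    import io

    # Hypothetical required columns for the incoming client feed.
    REQUIRED = ("client_id", "amount", "trade_date")

    def validate_rows(csv_text):
        """Split incoming CSV rows into (clean, rejected) lists.

        Rejected rows carry a reason so they can be reported back to the
        source under the SLA instead of silently loading into the warehouse.
        """
        clean, rejected = [], []
        for row in csv.DictReader(io.StringIO(csv_text)):
            missing = [c for c in REQUIRED if not (row.get(c) or "").strip()]
            if missing:
                rejected.append((row, "missing: " + ", ".join(missing)))
                continue
            try:
                row["amount"] = float(row["amount"])
            except ValueError:
                rejected.append((row, "non-numeric amount"))
                continue
            clean.append(row)
        return clean, rejected
    ```

    Only the `clean` list would move forward from staging to the data mart; the `rejected` list becomes the feedback loop to the client.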

    Thanks and Regards, Prajesh Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Friday, August 9, 2013 10:22 AM
  • That's the problem. We do not have direct access rights; otherwise we could fetch data directly from their servers. For security reasons they do not want to give us direct access.

    We are thinking about two approaches:

    1. An SSIS package runs on their side that extracts the data, exports it to CSV files, and automatically uploads those files to a secure FTP server. This would eliminate the human interaction on their side.

    2. Set up a slave (replica) database server outside their internal domain, on a separate subnet secured by firewalls, and allow only specific IPs to connect to it.

    For both options (preferably option 2), we need to determine what security measures should be taken to keep the setup secure and stable. I want to know the industry security standards for this kind of setup so we can communicate with them accordingly.
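    Option 2's "allow only specific IPs" rule would normally be enforced in the firewall itself, but as a sketch of the intent it can be expressed as a simple allow-list check (the CIDR ranges below are placeholders, not real warehouse addresses):

    ```python
    import ipaddress

    # Placeholder CIDR ranges; in practice these would be the warehouse's
    # approved egress addresses, configured on the firewall in front of
    # the replica server.
    ALLOWED_NETWORKS = [
        ipaddress.ip_network(n)
        for n in ("203.0.113.0/29", "198.51.100.24/32")
    ]

    def is_allowed(client_ip):
        """True if the connecting address falls inside an approved subnet."""
        addr = ipaddress.ip_address(client_ip)
        return any(addr in net for net in ALLOWED_NETWORKS)
    ```

    The same allow-list idea applies to the SFTP server in option 1: restrict inbound connections to the known client and warehouse addresses rather than exposing the endpoint broadly.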

    Friday, August 9, 2013 12:30 PM