Saturday, February 16, 2013 11:15 AM
I have a requirement to load employee data from multiple sources such as CSV, flat file, and Excel (each file has different columns and a unique structure, but all contain employee grade information). My task is to aggregate the grade information from the multiple source files for all employees and store it in a final SQL employee grade aggregated table. Is it possible to accomplish this task in a single SSIS 2008 package with a single Data Flow Task? If so, how are the connections for these source files made at run time? Or can we do it in multiple Data Flow Tasks? Please advise.
Thanks in advance.
Saturday, February 16, 2013 11:42 AM
"each file will have different columns & unique structure, but contains employee grade information."
With the condition mentioned above, manually setting up the column mappings might consume a lot of time; try BULK INSERT/OPENROWSET instead:
BULK INSERT dbo.ImportTest FROM 'C:\Textfile.txt' WITH ( FIELDTERMINATOR =',', FIRSTROW = 2 )
INSERT INTO dbo.ImportTest SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=C:\ImportData.xls', [Sheet1$])
If all the files have the same structure, we can easily loop through them using a FOREACH LOOP container.
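In SSIS the Foreach File enumerator would supply each file path through a variable; a rough T-SQL stand-in for that loop, using dynamic SQL, might look like the sketch below. The file-list table dbo.SourceFileList and its FilePath column are assumptions for illustration, not part of the original question.

```sql
-- Hedged sketch: loop over a list of files and BULK INSERT each one.
-- dbo.SourceFileList is an assumed helper table holding the file paths;
-- in a real package the SSIS Foreach Loop would drive this instead.
DECLARE @path NVARCHAR(260), @sql NVARCHAR(MAX);

DECLARE file_cursor CURSOR FOR
    SELECT FilePath FROM dbo.SourceFileList;

OPEN file_cursor;
FETCH NEXT FROM file_cursor INTO @path;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build the BULK INSERT statement for the current file.
    SET @sql = N'BULK INSERT dbo.ImportTest FROM ''' + @path +
               N''' WITH (FIELDTERMINATOR = '','', FIRSTROW = 2);';
    EXEC sp_executesql @sql;

    FETCH NEXT FROM file_cursor INTO @path;
END

CLOSE file_cursor;
DEALLOCATE file_cursor;
```

Note this only works when every file shares the same column layout as dbo.ImportTest, which is exactly the restriction mentioned above.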
Thanks & Regards, sathya
- Edited by SathyanarrayananS Saturday, February 16, 2013 11:52 AM
Saturday, February 16, 2013 11:52 AM
Thanks Sathya. But in our project, we need to load only the aggregated grade data for each employee from the multiple data sources. Moreover, we need to validate the data from each source and provide error messages before doing the aggregation. Is this possible with a Bulk Insert Task within one Data Flow Task inside a FOREACH loop container that loops through all the files in the source folder? Or do I need multiple Data Flow Tasks to achieve this? Please advise.
Thanks again,
balamurugan.r
Saturday, February 16, 2013 11:56 AM
If you need to handle the flow (i.e., if one insert fails, roll back the entire transaction), put all the BULK INSERT/OPENROWSET statements in one Execute SQL Task with COMMIT/ROLLBACK statements,
or create a stored procedure like the one below and call it from an Execute SQL Task:
CREATE PROCEDURE bulkinert_task  -- EXEC bulkinert_task
AS
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION
        BULK INSERT dbo.ImportTest
        FROM 'E:\blackbox.txt'
        WITH (FIELDTERMINATOR = ',', FIRSTROW = 2)
        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        IF (@@TRANCOUNT > 0)
            ROLLBACK TRAN
    END CATCH
END
If you are importing using Source & Destination components, try an example like the one below to handle errors - http://sathyadb.blogspot.in/2012/09/sql-server-integration-services-error.html
Thanks & Regards, sathya
- Edited by SathyanarrayananS Saturday, February 16, 2013 12:20 PM
Saturday, February 16, 2013 12:14 PM
But how can I validate and provide custom error messages for all the source data before doing the aggregation? Normally we add some .NET code in a Script Component to validate all the source data in the pipeline within a Data Flow Task. How can this be achieved with BULK INSERT for all the source files?
In addition, we don't have a unique employee ID in all the source files, so we planned to use Fuzzy Lookup to match employee data across the different sources.
For instance, employee E1 will have the name 'Andrew' in source file 1 and 'Mr Andrew' in source file 2, etc.
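For what the 'Andrew' vs 'Mr Andrew' case above involves: the SSIS Fuzzy Lookup transformation is configured in the designer rather than in code, but a crude T-SQL stand-in that strips common titles before matching might look like the sketch below. The staging and dimension table names (StageSource2, DimEmployee) and their columns are assumptions for illustration only.

```sql
-- Hedged sketch: normalize names by stripping common titles, then join.
-- This is a crude substitute for the SSIS Fuzzy Lookup transformation and
-- will not catch misspellings the way a real fuzzy match would.
SELECT s.EmployeeName AS SourceName,
       d.EmployeeId
FROM   StageSource2 AS s
JOIN   DimEmployee  AS d
  ON   LTRIM(RTRIM(
           REPLACE(REPLACE(REPLACE(s.EmployeeName,
               'Mr ', ''), 'Mrs ', ''), 'Ms ', '')))
       = d.EmployeeName;
```

With this approach 'Mr Andrew' would reduce to 'Andrew' and match, but genuinely fuzzy cases (typos, reordered names) still need the Fuzzy Lookup component or similar.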
- Edited by Bala Murugan R Saturday, February 16, 2013 12:16 PM
Saturday, February 16, 2013 3:59 PM
Hi Bala - I would consider breaking the SSIS package into multiple parts. It sounds like you are having to perform several distinct tasks with the data, so I would break things into chunks related to the tasks.
1) Gather information from multiple file sources into a single dataset
- Have a separate dataflow task for each source, write the data to a single (or multiple) staging tables
2) Perform validation and assign keys
- Use either T-SQL or dataflow tasks (or a combination of both) to perform the specific underlying operation
- Write a SQL query against your stage tables to aggregate the data. Use this as the source in a dataflow task to insert into your destination table.
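The aggregation step described above could be sketched as the query below. The staging table dbo.StageEmployeeGrades, the destination dbo.EmployeeGradeAggregated, and the specific aggregate functions are all assumptions, since the thread never states which grade aggregate is wanted.

```sql
-- Hedged sketch: aggregate grades from the staging table into the final table.
-- Table and column names are assumed for illustration.
INSERT INTO dbo.EmployeeGradeAggregated (EmployeeId, AvgGrade, MaxGrade, SourceCount)
SELECT EmployeeId,
       AVG(Grade),          -- or whatever aggregate the business rule requires
       MAX(Grade),
       COUNT(DISTINCT SourceFile)   -- how many source files contributed
FROM   dbo.StageEmployeeGrades
GROUP BY EmployeeId;
```

Putting this query in an OLE DB Source (or an Execute SQL Task) keeps the heavy lifting in the database engine rather than in the pipeline.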
In all likelihood, yes, you could accomplish this through a single dataflow task, but it would be horrific trying to support (in my opinion).
- Proposed As Answer by Kostya Khomyakov Saturday, February 16, 2013 10:01 PM
Sunday, February 17, 2013 3:37 AM
Thanks a lot for your information.
So I plan to go with multiple Data Flow Tasks, one for each source file. Could you please confirm my understanding?
Sunday, February 17, 2013 4:36 AM
Glad to hear it was helpful. Without knowing the details, I'd say everything looks good from a high level - assuming there are 4 file types that would need to be processed.
The only improvement I can see would be dependent on the source files. If you are able to determine the file type based on file name/extension and you are processing LOTS of data, you might want to have 4 separate for-each containers to take advantage of parallelism (i.e., Container 1 processes any CSV files, Container 2 processes tab delimited, Container 3 processes Excel, etc.).
Hope the rest of the development goes smoothly for you.
- Marked As Answer by Bala Murugan R Monday, February 18, 2013 3:24 AM