locked
Parsing a fixed format row RRS feed

  • Question

  • I am importing from the IBM i a file into SQL table FIXED. Each row of the file is a fixed format layout. There are about 80 fields. Part of the file layout is below. So.. initially the rows will be inserted into SQL table FIXED. Then I want to use the layout and parse each field from FIXED into table NORMAL which will have columns WKCIF varchar(34), WKSNAM varchar(30), etc. All columns in NORMAL will be varchar, regardless if the layout says A/D for alphanumeric, or D for date or N for numeric. 

    I do not anticipate a large volume of records. But honestly, I do not know yet.

    How is the best and possibly easiest way to get this done? And yes, I know those two things do not go together.

    Field#, Desc, Field, Type, Length
    01, Borrower ID, WKCIF, A/N , 34
    02, Short Name, WKSNAM, A/N , 30
    03, Long Name, WKNAME, A/N , 40
    04, Address line 1, WKADDR1, A/N , 40
    05, Address Line 2, WKADDR2, A/N , 40
      


    • Edited by TheBrenda Friday, March 16, 2018 3:51 PM edit
    Friday, March 16, 2018 3:49 PM

Answers

  • You can use a stage table to import your fixed length file and transform(convert each column to its respective data type) the stage table data and insert them into your target table.
    • Marked as answer by TheBrenda Tuesday, March 27, 2018 4:25 PM
    Friday, March 16, 2018 3:52 PM

All replies

  • You can use a stage table to import your fixed length file and transform(convert each column to its respective data type) the stage table data and insert them into your target table.
    • Marked as answer by TheBrenda Tuesday, March 27, 2018 4:25 PM
    Friday, March 16, 2018 3:52 PM
  • Use SSIS to parse the file into a "stage" table. 
    Friday, March 16, 2018 4:05 PM
  • I plan on doing an import then transform. I am looking for suggestions on how best and easiest to parse the fixed length column into the column specified in the layout. I know that i can do 80 substrings. but worried that might be slow.

    So I have this so far. BUt just cursor through each field for each row and use a substring?

    drop table LAYOUT
    create table LAYOUT (FieldNbr int, FieldDesc varchar(128), FieldName varchar(128), FieldType varchar(10), FieldLength int)
    insert into LAYOUT
    values
    (01, 'Borrower ID', 'WKCIF', 'A/N', 34),
    (02, 'Short Name', 'WKSNAM', 'A/N', 30),
    (03, 'Long Name', 'WKNAME', 'A/N', 40),
    (04, 'Address line 1', 'WKADDR1', 'A/N', 40),
    (05, 'Address Line 2', 'WKADDR2', 'A/N', 40)
    
    drop table FIXED
    create table FIXED (FixedRow varchar(8000))
    insert into FIXED
    select
     'a234567890123456789012345678901234' +
     'ajohnny smith is a very good90' +
     'aLNjohnny smith is a long678901234567890' +
     'aA14567890123456789012345678901234567890' +
     'aA24567890123456789012345678901234567890'
    union
    select
     'b234567890123456789012345678901234' +
     'bjohnny smith is a very good90' +
     'bLNjohnny smith is a long678901234567890' +
     'bA14567890123456789012345678901234567890' +
     'bA24567890123456789012345678901234567890'
     
    select * from FIXED 
    
    drop table NORMAL
    declare @SQL nvarchar(max) = ''
    SELECT @SQL = @SQL + ' ' + Fieldname + ' varchar(100),' 
        FROM LAYOUT ORDER BY FieldNbr
        
    -- remove last commas and add ending ')'
    SET @SQL = 'Create table NORMAL (' + LEFT(@SQL, LEN(@SQL) - 1) + ')'
    
    exec sp_executesql @sql

     
    Friday, March 16, 2018 4:30 PM
  • I cannot use SSIS to parse. We have about 200 clients and deploying new SSIS packages with the credentials in the connection strings is a huge hassle. I already have the process (generic SSIS package) to get the data from the IBM i to the SQL Fixed table. Not I am looking for suggestions in tsql on how to parse from table FIXED into table NORMAL.
    Friday, March 16, 2018 4:41 PM
  • Yeah, the better practice is to load data to the staging table first and then parse the string in SQL. What is the  delimiter in the string? What is your SQL version?

    A Fan of SSIS, SSRS and SSAS


    Friday, March 16, 2018 5:14 PM
  • You can use substring to split all these columns since your have a fixed length. You need to calculate the start and end for each substring function for each column based on your file definition. you can use try_convert/try_cast around each substring to get the correct data type for your data.
    Friday, March 16, 2018 5:53 PM
  • One way to do this is using Regex match

    Based on your column rules, set up regex patterns in a table. Then have a script task inside which do regex match of the file data against the patterns. This will help you to identify the exception which can dealt with according to your requirement like raising exception, redirecting to error file etc


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    Friday, March 16, 2018 6:46 PM
  • Guoxiong - There is no delimeter. It is not a delimited file, it is a fixed length file.

    Jingyang Li - I am using varchar for the NORMAL table, regardless of the type in the written layout. REGEX

    Visakh16 - RegEx? THere is no Regex is SQL. I do not care about exceptions. I am taking the start, length and stuffing it into the NORMAL column. all NORMAL columns will be varchar so I am not doing any error checking. 

    I am just looking for an efficient way to parse. I guess 80 substrings for each row is the way to go?

    Friday, March 16, 2018 7:34 PM
  • When you import fixed length file, you will need to configure to setup length for each column while you are doing it in SSIS (or through SSMS  import/explore wizard). Since you put the whole thing in one fixed length column (char(xxx)). You need to use that many substring function in TSQL for what you didn't do in the first place.
    Friday, March 16, 2018 8:13 PM
  • CREATE TABLE Layout (
    	FieldNbr int, 
    	FieldDesc varchar(128), 
    	FieldName varchar(128), 
    	FieldType varchar(10), 
    	FieldLength int
    );
    INSERT INTO Layout VALUES
    (01, 'Borrower ID', 'WKCIF', 'A/N', 34),
    (02, 'Short Name', 'WKSNAM', 'A/N', 30),
    (03, 'Long Name', 'WKNAME', 'A/N', 40),
    (04, 'Address line 1', 'WKADDR1', 'A/N', 40),
    (05, 'Address Line 2', 'WKADDR2', 'A/N', 40);
    
    CREATE TABLE Fixed (
    	FixedRow varchar(8000)
    );
    INSERT INTO Fixed VALUES
    (
    	'a234567890123456789012345678901234' +
    	'ajohnny smith is a very good90' +
    	'aLNjohnny smith is a long678901234567890' +
    	'aA14567890123456789012345678901234567890' +
    	'aA24567890123456789012345678901234567890'
    ),
    (
    	'b234567890123456789012345678901234' +
    	'bjohnny smith is a very good90' +
    	'bLNjohnny smith is a long678901234567890' +
    	'bA14567890123456789012345678901234567890' +
    	'bA24567890123456789012345678901234567890'
    );
    
    DECLARE @sql nvarchar(max);
    DECLARE @sqlCreate nvarchar(max);
    DECLARE @sqlSelect nvarchar(max);
    DECLARE @sqlInsert nvarchar(max);
    DECLARE @fieldNameList varchar(max);
    DECLARE @fieldLength int;
    DECLARE @fieldName varchar(128);
    DECLARE @index int;
    DECLARE @maxRowNumber int;
    DECLARE @fieldLengthSubTotal int;
    
    SET @fieldNameList = (SELECT DISTINCT STUFF((SELECT ',' + FieldName + ' varchar(' + CAST(FieldLength AS varchar(4)) + ')' FROM Layout ORDER BY FieldNbr FOR XML PATH('')), 1, 1, '') FROM Layout);
    SET @sqlCreate = 'CREATE TABLE NORMAL(' + @fieldNameList + ');'
    SET @index = 1;
    SELECT @maxRowNumber = MAX(FieldNbr) FROM Layout
    SET @fieldLength = 0;
    SET @fieldLengthSubTotal = 1;
    SET @sqlSelect = 'SELECT ';
    SET @sqlInsert = 'INSERT INTO NORMAL ';
    
    WHILE @index <= @maxRowNumber
    BEGIN
    	SELECT @fieldName = FieldName, @fieldLength = FieldLength FROM Layout WHERE FieldNbr = @index;
    	
    	IF @index = @maxRowNumber
    	BEGIN
    		SET @sqlSelect = @sqlSelect + 'SUBSTRING(FixedRow, ' + CAST((@fieldLengthSubTotal) AS varchar(4)) +', ' + CAST(@fieldLength AS varchar(4)) + ')'
    	END
    	ELSE
    	BEGIN
    		SET @sqlSelect = @sqlSelect + 'SUBSTRING(FixedRow, ' + CAST((@fieldLengthSubTotal) AS varchar(4)) +', ' + CAST(@fieldLength AS varchar(4)) + '),'
    	END
    
    	SET @fieldLengthSubTotal = @fieldLengthSubTotal + @fieldLength;
    	SET @index = @index + 1;
    END
    
    SET @sqlSelect = @sqlSelect + ' FROM Fixed';
    
    SET @sql = @sqlCreate + ' ';
    SET @sql = @sql + @sqlInsert + ' ';
    SET @sql = @sql + @sqlSelect;
    
    EXEC sp_executesql @sql;
    
    SELECT * FROM NORMAL;
    
    /*
    DROP TABLE IF EXISTS Layout;
    DROP TABLE IF EXISTS Fixed;
    DROP TABLE IF EXISTS NORMAL;
    */


    A Fan of SSIS, SSRS and SSAS

    Friday, March 16, 2018 9:41 PM
  • I was making it much harder than it had to be. Also, I was looking for something clever that would be self maintaining for when the Layout changed. But just creating a transactional view with the SUBSTRINGS and then inserting from the transactional view into the ods table is the easiest and quickest way to handle this fixed length file. And it is more consistent with how we process other files from the IBM i. I could get fancy, and use the LAYOUT table to build dynamic SQL to create the view but when there is a change the Layout table would have to change and I do not see any time saving to changing a Table versus directly changing the Tran_Fixed view. 

    CREATE VIEW Tran_Fixed AS 
        SELECT SUBSTRING(FixedRow, 1, 34) as "Borrower ID",
          SUBSTRING(FixedRow, 35, 30) as "Short Name",
          SUBSTRING(FixedRow, 65, 40) as "Long Name",
          SUBSTRING(FixedRow, 105, 40) as "Address Line 1",
          SUBSTRING(FixedRow, 105, 40) as "Address Line 2"
        FROM Fixed

    Tuesday, March 27, 2018 4:25 PM
  • There are two aspects here:

    1. The efficiency of the code.

    2 The elegancy and reuse-ability of the code.

    If you opt for 2, then 1 will likely suffer. Your above suggestion, the dead-simple many-substring version is the one I was thinking while reading through this thread.

    In the end, you need to return something. In SQL, you can return a value (scalar function, like SUBSTRING) or a table (what SELECT returns, or a table function).

    One could argue for a table function implemented in CLR, but since the schema (number of columns, length and type) is fixed for such, I doubt that it would be convenient for you. Any type of dynamic SQL will likely peform worse, and methink that the ability to adapt for when yo get more columns will not outweigh the readability for the many-substring version you posted. There's not margin for errors, of course (like one row doesn't meet up to how it should look like), but I have a feeling that you are very aware of that.


    Tibor Karaszi, SQL Server MVP (Web Blog)

    Tuesday, March 27, 2018 4:47 PM