Answered complex source text file

  • Friday, April 27, 2012 10:02 AM
     
     

    Hi All

    we are getting text file as source and we need to parse it and load it in a table .  the data inside a text file looks as below . Please suggest the best way to load the data in to table . the below contains 2 records like this we will have many records in a file .

    Td.Name = MI_UMTS_FTP
    Tc.Name = FTP_UMTS_5M
    Tp.Name = Ftp_Rating_1.5
    TestScenario.Name = Marais_MI_UMTS_FTP
    TcRun.Id = 61950500
    TcRun.StartDate = 2012/04/24 15:31:59
    TcRun.EndDate = 2012/04/24 15:36:13
    TcRun.Result = Passed
    Error.Step =
    Error.Protocol =
    O.NeId = PORT Marais
    O.Port.Code = xxxxxxxxxx
    O.Sim.MsisdnVoice = 9999999999999999
    Host.Address = xxxxxxxxxxxx
    O.Msc.Name = Prox.SGAM1
    O.Region.Name = BRUXELLES
    ErrorCause = NoError
    O.App.ThrputKbitRxAvg = 351
    O.AttachSuccessful = Yes
    O.AttachTime = 1.64
    O.DetachResult = Yes
    O.DetachTime = 2.688
    O.PdpCntx.ActTime = 2.172
    O.PdpCntx.Successful = Yes
    Result = PASSED

    Td.Name = Voice_qual
    Tc.Name = VoicePesq_2.0_2G
    Tp.Name = VoicePesq_2.0
    TestScenario.Name = Wavre_Leuven_Voice_qual
    TcRun.Id = 61950604
    TcRun.StartDate = 2012/04/24 15:33:55
    TcRun.EndDate = 2012/04/24 15:36:21
    TcRun.Result = Passed
    Error.Step =
    Error.Protocol =
    O.NeId = PORT Wavre_VQM
    T.NeId = PORT Leuven_VQM
    O.Port.Code = RTUPC01148-02
    T.Port.Code = RTUPC01017-02
    T.Region.Name = LEUVEN
    T.RingNumber = 2
    O.Msc.Name = Prox.WAVRE
    O.Region.Name = WAVRE
    ErrorCause = NoError
    O.AttachSuccessful = Yes
    O.AudioCodec = AMR_FR
    O.DetachSuccessful = Yes
    OT.AnswerTimeStamp = 90.887
    OT.CallSetupSuccessful = Yes
    OT.CallSetupTime = 7.781
    OT.CallSetupTimeUnderThrld = Yes
    OT.DialStartTimeStamp = 76.406
    OT.HangupTimeStamp = 138.734
    OT.PesqScoreAvg = 3.67
    OT.PesqScoreMax = 3.82
    OT.PesqScoreMin = 3.54
    OT.PesqUnderThrldPerc = 0
    OT.PlayErrorPerc = 0
    OT.QoSEvaluateErrorPerc = 0
    OT.RingDuration = 5.999
    Result = PASSED
    T.AttachSuccessful = Yes
    T.AudioCodec = AMR_FR
    T.Cli.CheckSuccessful = Yes
    T.DetachSuccessful = Yes
    T.RingSuccessful = Yes
    TO.PesqScoreAvg = 3.63
    TO.PesqScoreMax = 3.82
    TO.PesqScoreMin = 3.51
    TO.PesqUnderThrldPerc = 0
    TO.PlayErrorPerc = 0
    TO.QoSEvaluateErrorPerc = 0


    Surendra Thota


All Replies

  • Friday, April 27, 2012 10:59 AM
     
     Proposed
    I would read it all in as one column, use a script component to split the data into two columns using the "=" and to assign a recordID to the rows. After that, I would do a pivot on that newly added recordID to get a decent column structure.

    MCTS, MCITP - Please mark posts as answered where appropriate.

  • Friday, April 27, 2012 11:15 AM
    Moderator
     
     Proposed

    I had something simular, where I used a Script Component:http://microsoft-ssis.blogspot.com/2011/02/script-component-as-source-2.html

    The difficult thing about your file is that the number of columns differ per record. Is that correct? What is the record delimiter in your case?


    Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com | Twitter

  • Friday, April 27, 2012 12:53 PM
     
     
    Yes the number of colmumns are not fixed . but we know the max number of fields we get .  the record delimeter is 2 new lines

    Surendra Thota


  • Friday, April 27, 2012 12:56 PM
     
     

    I didn't notice the dynamic number of columns - I have better things to do than count them all - so I would just script it all out with a .NET script task.

    Create a variable for each possible column and give them all default values.
    Take in a row, parse the text before the = and determine which column it is. Put the value in the right variable.
    Read the next row and so on until you've reached a new record. I do hope Td.Name is always your first column.
    When you reach the next record, put all variables in one row and add it to the output.

    Start again with the next record until you reach the end of the file.


    MCTS, MCITP - Please mark posts as answered where appropriate.

  • Friday, April 27, 2012 1:09 PM
     
     

    go with the script solution you have more control

    check this link , but remeber that the link does NOT have your answer but it gives you good idea what to do.

    it has 3 pages read them all.


    Sincerely SH -- MCITP , MCTS -- Please kindly mark the post(s) that answered your question and/or vote for the post(s).

  • Friday, April 27, 2012 1:12 PM
     
     

    go with the script solution you have more control

    check this link , but remeber that the link does NOT have your answer but it gives you good idea what to do.

    it has 3 pages read them all.

    That's quite a nice article, thanks for the reference.

    MCTS, MCITP - Please mark posts as answered where appropriate.

  • Friday, April 27, 2012 2:19 PM
     
     

    Hi

    In one my project i have handled this type of files Colmun1:column2  as a row , but now how to identify the end of row . and how to identify the endof the file.

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Dim strRow As String

    Dim strColSeperator As String

    Dim rowValues As String()

    strRow = Row.line.ToString()

    If strRow.Contains(";") Then

    strColSeperator = (

    ";")

    ElseIf strRow.Contains(":") Then

    strColSeperator =

    ":"

    End If

    rowValues = Row.line.Split(

    CChar(strColSeperator))

    Row.CounterID = rowValues.GetValue(0).ToString()

    Row.CounterValue = rowValues.GetValue(1).ToString()

    Row.FileName = FileName.ToString()

    Row.ServerName = ServerName.ToString()

    Row.Year = cYear.ToString()

    Row.Month = cMonth.ToString()

    Row.Date = colDate.ToString()

    Row.Hour = cHour.ToString()

    Row.Min = cMin.ToString()

    End Sub


    Surendra Thota

  • Friday, April 27, 2012 2:21 PM
    Moderator
     
     
    look for the empty row.... where the length of the array is 0 or before the split where the length of Row.line is 0.

    Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com | Twitter


  • Monday, April 30, 2012 10:18 AM
     
     
    How can we split each row in Faltfile connection manager . when we connect to the file it is getting each column as row . the problem here is we have multiple row with 2 blank  rows as delimeter . please let me know if anyone handled these type of files .

    Surendra Thota

  • Tuesday, May 01, 2012 12:56 PM
     
     
    Yes the number of colmumns are not fixed . but we know the max number of fields we get .  the record delimeter is 2 new lines

    Surendra Thota


    And can we say that all records have the field "Td.Name"?

    and can we say that each records first field is "Td.Name"?

    If so i have have a way all i need is a sample file ( if you like you can email (SNikkhah@live.ca) me a sample file, please remove confidential information)


    Sincerely SH -- MCITP , MCTS -- Please kindly mark the post(s) that answered your question and/or vote for the post(s).

  • Tuesday, May 01, 2012 3:48 PM
     
     Proposed Has Code

    I HAVE YOUR ANSWER

    assuming you have data like ....

    --------------------

    Td.Name = MI_UMTS_FTP
     Tc.Name = FTP_UMTS_5M
     TcRun.EndDate = 2012/04/24 15:36:13
     Error.Step =
    Error.Protocol =
     O.PdpCntx.ActTime = 2.172
     O.PdpCntx.Successful = Yes
     Result = PASSED
     
    Td.Name = Voice_qual
     O.PdpCntx.ActTime = 2.172
     TcRun.Id = 61950604
     TcRun.EndDate = 2012/04/24 15:36:21
     TcRun.Result = Passed
     Error.Step =
     O.AudioCodec = AMR_FR
     ---------------------

    make a DFT and make the mentioned objects like ....

    In the first DER you simply split the data from the data column name, and in theseconf DFT you simply RLTrim off space

    FieldColumnName = FINDSTRING([Column 0],"=",1) < 3 ? "" : SUBSTRING([Column 0],1,(FINDSTRING([Column 0],"=",1) - 1))

    DataValue = FINDSTRING([Column 0],"=",1) < 3 ? "" : SUBSTRING([Column 0],(FINDSTRING([Column 0],"=",1) + 1),LEN([Column 0]))

    set the script to have string outputs and use the mentioned code

    ' Microsoft SQL Server Integration Services Script Component
    ' Write scripts using Microsoft Visual Basic 2008.
    ' ScriptMain is the entry point class of the script.
    
    Imports System
    Imports System.Data
    Imports System.Math
    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
    
    
    <Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
    <CLSCompliant(False)> _
    Public Class ScriptMain
        Inherits UserComponent
    
        Dim StartFlag As Integer = 0
        Dim RowCount As Integer = 0
    
        Dim _strDataValue As String = ""
        Dim _strColName As String = ""
    
        Dim _TdName As String = ""
        Dim _TcName As String = ""
        Dim _TcRunEndDate As String = ""
        Dim _ErrorStep As String = ""
        Dim _ErrorProtocol As String = ""
        Dim _OPdpCntxActTime As String = ""
        Dim _OPdpCntxSuccessful As String = ""
        Dim _Result As String = ""
        Dim _TcRunId As String = ""
        Dim _TcRunResult As String = ""
        Dim _OAudioCodec As String = ""
    
    
        Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
            '
            ' Add your code here
            '
            RowCount = RowCount + 1
    
            Dim strDataValue As String
            Dim strColName As String
    
            strColName = Row.FieldColumnName.ToString
            strDataValue = Row.DataValue.ToString
    
            'If strColName = "Td.Name" And StartFlag = 1 Then
            If strColName = "" And strDataValue = "" And StartFlag = 1 Then
    
                Row.TdName = _TdName.ToString
                Row.TcName = _TcName.ToString
                Row.TcRunEndDate = _TcRunEndDate 'TcRun.EndDate
                Row.ErrorStep = _ErrorStep ' Error.Step
                Row.ErrorProtocol = _ErrorProtocol ' Error.Protocol
                Row.OPdpCntxActTime = _OPdpCntxActTime  'O.PdpCntx.ActTime
                Row.OPdpCntxSuccessful = _OPdpCntxSuccessful  'O.PdpCntx.Successful
                Row.Result = _Result ' Result
                Row.TcRunId = _TcRunId 'TcRun.Id
                Row.TcRunResult = _TcRunResult 'TcRun.Result
                Row.OAudioCodec = _OAudioCodec 'O.AudioCodec
    
    
                StartFlag = 0
    
                _TdName = ""
                _TcName = ""
                _TcRunEndDate = ""
                _ErrorStep = ""
                _ErrorProtocol = ""
                _OPdpCntxActTime = ""
                _OPdpCntxSuccessful = ""
                _Result = ""
                _TcRunId = ""
                _TcRunResult = ""
                _OAudioCodec = ""
    
                Row.DirectRowToMainOutputRecordsBuffer()
            End If
    
            If strColName = "Td.Name" Then
                _TdName = strDataValue
            ElseIf strColName = "Tc.Name" Then
                _TcName = strDataValue
            ElseIf strColName = "TcRun.EndDate" Then
                _TcRunEndDate = strDataValue
            ElseIf strColName = "Error.Step" Then
                _ErrorStep = strDataValue
            ElseIf strColName = "Error.Protocol" Then
                _ErrorProtocol = strDataValue
            ElseIf strColName = "O.PdpCntx.ActTime" Then
                _OPdpCntxActTime = strDataValue
            ElseIf strColName = "O.PdpCntx.Successful" Then
                _OPdpCntxSuccessful = strDataValue
            ElseIf strColName = "Result" Then
                _Result = strDataValue
            ElseIf strColName = "TcRun.Id" Then
                _TcRunId = strDataValue
            ElseIf strColName = "TcRun.Result" Then
                _TcRunResult = strDataValue
            ElseIf strColName = "O.AudioCodec" Then
                _OAudioCodec = strDataValue
            End If
    
    
            StartFlag = 1
    
        End Sub
    
    
    End Class
    

    your final result will look like


    Sincerely SH -- MCITP , MCTS -- Please kindly mark the post(s) that answered your question and/or vote for the post(s).

  • Tuesday, May 01, 2012 3:49 PM
     
     

    You may have to change the IF part of the code

    If strColName = "" And strDataValue = "" And StartFlag = 1 Then

    depending on your needs

    I can email you the package if you like

    good luck


    Sincerely SH -- MCITP , MCTS -- Please kindly mark the post(s) that answered your question and/or vote for the post(s).


  • Wednesday, May 02, 2012 7:57 AM
     
     Answered

    Hi All

    i came up with this script when i debug it is working fine i debug at each location and everything is giving perfect value . but when i add dataviewer after the script component it is throwing only null values . Is there any problem in assigning the Bufferedcolumns ?

    Imports System
    Imports System.Data
    Imports System.Math
    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    <Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
    <CLSCompliant(False)> _
    Public Class ScriptMain
        Inherits UserComponent
        Dim sr As System.IO.StreamReader
        Dim FileProcessingNumber As Long
        Dim TcRunEnddate As String
        Dim FilePath As String
        Dim test As String

     

     

        Public Overrides Sub PreExecute()
            MyBase.PreExecute()
            FilePath = Variables.FileName
            sr = New System.IO.StreamReader(FilePath)
            FileProcessingNumber = Variables.FileProgressiveNumber
            TcRunEnddate = Variables.TcRunEndDate
        End Sub

        Public Overrides Sub PostExecute()
            MyBase.PostExecute()
            sr.Close()
        End Sub

        Public Overrides Sub CreateNewOutputRows()
            Dim lineIndex As Integer = 0
            Dim EmptyRow As Integer = 0
            Output0Buffer.AddRow()
            While (Not sr.EndOfStream)
                'Output0Buffer.AddRow()

                Dim line As String = sr.ReadLine()


                If (line.Length = 0) Then
                    EmptyRow = EmptyRow + 1
                    If (EmptyRow = 2) Then
                        Output0Buffer.AddRow()
                        EmptyRow = 0
                        'MsgBox("new row")
                        GoTo line1

                    End If
                End If

                Dim columnArray As String() = line.Split(Convert.ToChar("="))
                If (columnArray(0).Length > 0) Then

                    Output0Buffer.tdname = Nothing
                    Output0Buffer.tcname = Nothing
                    Output0Buffer.testscenarioname = Nothing
                    Output0Buffer.tcrunid = Nothing
                    Output0Buffer.tcrunstartdate = Nothing
                    Output0Buffer.tcrunresult = Nothing
                    Output0Buffer.errorstep = Nothing
                    Output0Buffer.errorprotocol = Nothing
                    Output0Buffer.oneid = Nothing
                    Output0Buffer.tneid = Nothing
                    Output0Buffer.oportcode = Nothing
                    Output0Buffer.tportcode = Nothing
                    Output0Buffer.oregionname = Nothing
                    Output0Buffer.osimmsisdnvoice = Nothing
                    Output0Buffer.tsimmsisdnvoice = Nothing
                    Output0Buffer.server = Nothing
                    Output0Buffer.errorcause = Nothing
                    Output0Buffer.errorcausefailurereason = Nothing
                    Output0Buffer.errorcausephasename = Nothing
                    Output0Buffer.errorcauserole = Nothing
                    Output0Buffer.errorcausesteperror = Nothing
                    Output0Buffer.errorcauseuserdefined = Nothing
                    Output0Buffer.tsmeas = Nothing
                    Output0Buffer.tcrunenddate = Nothing
                    Output0Buffer.FileProgressiveNumber = Nothing


                    If (columnArray(0).ToString = "Td.Name ") Then
                        Output0Buffer.tdname = columnArray(1)
                    ElseIf (columnArray(0) = "Tc.Name ") Then
                        Output0Buffer.tcname = columnArray(1)
                    ElseIf (columnArray(0) = "TestScenario.Name ") Then
                        Output0Buffer.testscenarioname = columnArray(1)
                    ElseIf (columnArray(0) = "TcRun.Id ") Then
                        Output0Buffer.tcrunid = columnArray(1).Replace(" ", "0")
                    ElseIf (columnArray(0) = "TcRun.StartDate ") Then
                        Output0Buffer.tcrunstartdate = columnArray(1)
                    ElseIf (columnArray(0) = "TcRun.Result ") Then
                        Output0Buffer.tcrunresult = columnArray(1)
                    ElseIf (columnArray(0) = "Error.Step ") Then
                        Output0Buffer.errorstep = columnArray(1)
                    ElseIf (columnArray(0) = "Error.Protocol ") Then
                        Output0Buffer.errorprotocol = columnArray(1)
                    ElseIf (columnArray(0) = "O.NeId ") Then
                        Output0Buffer.oneid = columnArray(1)
                    ElseIf (columnArray(0) = "O.Port.Code ") Then
                        Output0Buffer.oportcode = columnArray(1)
                    ElseIf (columnArray(0) = "O.Sim.MsisdnVoice ") Then
                        Output0Buffer.osimmsisdnvoice = columnArray(1)
                    ElseIf (columnArray(0) = "T.NeId ") Then
                        Output0Buffer.tneid = columnArray(1)
                    ElseIf (columnArray(0) = "T.Port.Code ") Then
                        Output0Buffer.tportcode = columnArray(1)
                    ElseIf (columnArray(0) = "O.Region.Name ") Then
                        Output0Buffer.oregionname = columnArray(1)
                    ElseIf (columnArray(0) = "T.Sim.MsisdnVoice ") Then
                        Output0Buffer.tsimmsisdnvoice = columnArray(1)
                    ElseIf (columnArray(0) = "Server ") Then
                        Output0Buffer.server = columnArray(1)
                    ElseIf (columnArray(0) = "ErrorCause ") Then
                        Output0Buffer.errorcause = columnArray(1)
                    ElseIf (columnArray(0) = "ErrorCause.FailureReason ") Then
                        Output0Buffer.errorcausefailurereason = columnArray(1)
                    ElseIf (columnArray(0) = "ErrorCause.PhaseName ") Then
                        Output0Buffer.errorcausephasename = columnArray(1)
                    ElseIf (columnArray(0) = "ErrorCause.Role ") Then
                        Output0Buffer.errorcauserole = columnArray(1)
                    ElseIf (columnArray(0) = "ErrorCause.StepError ") Then
                        Output0Buffer.errorcausesteperror = columnArray(1)
                    ElseIf (columnArray(0) = "ErrorCause.UserDefinedErrorCause ") Then
                        Output0Buffer.errorcauseuserdefined = columnArray(1)
                    ElseIf (columnArray(0) = "TcRun.EndDate ") Then
                        Output0Buffer.tcrunenddate = columnArray(1)
                    End If


                    'Output0Buffer.tsmeas = TcRunEnddate
                    Output0Buffer.FileProgressiveNumber = FileProcessingNumber


                    End If

    line1:          lineIndex = lineIndex + 1
            End While

        End Sub

    End Class


    Surendra Thota

  • Wednesday, May 02, 2012 10:45 AM
     
     

    Hi All

    code is working perfectly after removing the Output0Buffer.tdname = Nothing . can anybody tell why = nothing is giving no value even after we assigning a value Output0Buffer.tdname = columnArray(1)

                                 


    Surendra Thota

  • Thursday, May 03, 2012 10:55 AM
     
     

    Hi All

    how to put default value to a column . as of now if there is no value to a column it is giving null . but i need to put some default value . when i tried assign a value before the process it is giving only that default value even it has some value . what i understand is Output0Buffer value can be assigned only once . we cant overwrite it .


    Surendra Thota