Read large fixed width text file into DataTable vb.net

  • Question

  • I need to read a large text file (15 million records) which has fixed-width data and store it in a DataTable.

    I am currently reading line by line, which takes over an hour to process the data, so I need to read records in bulk.

    I can't do this in SQL; I need a VB.NET solution.


    Thanks, AT

    Monday, November 2, 2020 8:49 PM

All replies

  • What is the format of the records in the text file?  CSV or Tab delimited would be a guess.  What is the max record size?

    Search Documentation

    SerialPort Info

    Multics - An OS ahead of its time.

     "Those who use Application.DoEvents have no idea what it does

        and those who know what it does never use it."    former MSDN User JohnWein

    Monday, November 2, 2020 9:03 PM
  • records are all string

    file is text file .txt

    records are NOT comma separated or delimited...they are fixed width. Like first 10 chars for name, next 3 for age and so on
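    For illustration, a layout like the one described can be sliced with Substring. This is only a sketch - the field names and widths (10-char name, 3-char age) are assumptions taken from the description above, not the actual record layout:

    ```vbnet
    ' Minimal sketch: parse one fixed-width record into typed fields.
    ' Layout assumed: first 10 chars = name, next 3 chars = age.
    Module FixedWidthDemo
        Function ParseRecord(line As String) As (Name As String, Age As Integer)
            Dim name As String = line.Substring(0, 10).Trim()
            Dim age As Integer = Integer.Parse(line.Substring(10, 3).Trim())
            Return (name, age)
        End Function

        Sub Main()
            Dim rec = ParseRecord("John Smith 42")
            Console.WriteLine($"{rec.Name} / {rec.Age}")  ' prints: John Smith / 42
        End Sub
    End Module
    ```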


    Thanks, AT

    Monday, November 2, 2020 9:10 PM
  • Hi

    My opinion is that 15 million records of some unknown size is too large to handle as one entity.

    Could you split the data into several DataTables - or, more precisely, several XML files that one DataTable can then read back by index?


    Regards Les, Livingston, Scotland

    Monday, November 2, 2020 9:18 PM
  • I am ok to read 1 M records at a time. What can be the approach to read 1 M records into DT ?

    Thanks, AT

    Monday, November 2, 2020 9:20 PM
  • records are all string

    file is text file .txt

    records are NOT comma separated or delimited...they are fixed width. Like first 10 chars for name, next 3 for age and so on


    Thanks, AT

    How long are the records?



    Monday, November 2, 2020 9:26 PM
  • I am ok to read 1 M records at a time. What can be the approach to read 1 M records into DT ?

    Thanks, AT

    Hi

    Need to know the fields and widths


    Regards Les, Livingston, Scotland

    Monday, November 2, 2020 9:32 PM
  • Hi

    Here is some mocked-up code to read sample fixed-width text data into a series of XML files.

    This example needs the Designer to have 4 labels, as shown in the image below.

    There is a 'timer', but for this example it can only show 0 as there is little work done.

    Bear in mind, this is only an example - your data fields will differ and of course there will be lots of them. However, the idea should work on large data too.

    ' Sample Data used (widths: 5,7,4,3,9)
    ' here this block is repeated 4 times
    ' so there should be 4 identical files
    ' saved
    
    ' 1234512345671234123123456789
    ' ABCD ABCDE  ABCDAB ABCDEFGHI
    ' AB   AB     AB  AB AB       
    ' ABCDEABCDEFGABC A  ABCDEFGHI
    ' A    A      A   A  A        
    
    Option Strict On
    Option Explicit On
    Public Class Form1
    	' path to main data text file
    	Dim path As String = IO.Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "TestData.txt")
    
    	' number of lines to read per block
    	Dim blocksize As Integer = 5
    
    	Dim dt As New DataTable("Freddy")
    
    	' just for file naming increment
    	Dim counter As Integer = 0
    
    	' just for indicating time taken
    	Dim sw As New Stopwatch
    	Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    		With dt
    			.Columns.Add("f1")
    			.Columns.Add("f2")
    			.Columns.Add("f3")
    			.Columns.Add("f4")
    			.Columns.Add("f5")
    		End With
    
    		GetData()
    
    	End Sub
    	Sub GetData()
    		sw.Start()
    
    		' ensure the output folder exists before writing the XML blocks
    		IO.Directory.CreateDirectory(IO.Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "XMLdata"))
    
    		Using sr As New IO.StreamReader(path)
    			Do
    				dt.Clear()
    				For i As Integer = 1 To blocksize
    					' stop if the file ends mid-block
    					If sr.EndOfStream Then Exit For
    					EachLine(sr.ReadLine)
    				Next
    
    				counter += 1
    				Label2.Text = counter.ToString
    
    				Dim savepath As String = IO.Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "XMLdata", "Data" & counter.ToString & ".xml")
    
    				dt.WriteXml(savepath)
    
    				Label4.Text = (sw.ElapsedMilliseconds \ 1000).ToString
    
    			Loop Until sr.EndOfStream
    		End Using
    		sw.Reset()
    	End Sub
    	Sub EachLine(s As String)
    		dt.Rows.Add(s.Substring(0, 5), s.Substring(5, 7), s.Substring(12, 4), s.Substring(16, 3), s.Substring(19, 9))
    	End Sub
    End Class
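
    Reading one of the saved blocks back into a DataTable is the reverse operation. A sketch (assuming the same table name, columns, and file-name pattern as above - since WriteXml was called without a schema, the receiving DataTable must have its schema predefined before ReadXml):

    ```vbnet
    ' Sketch: load saved block 1 back into a DataTable.
    ' Table name and columns must match those used by WriteXml above.
    Dim dtIn As New DataTable("Freddy")
    For Each col As String In {"f1", "f2", "f3", "f4", "f5"}
        dtIn.Columns.Add(col)
    Next
    Dim loadpath As String = IO.Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "XMLdata", "Data1.xml")
    dtIn.ReadXml(loadpath)
    ```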


    Regards Les, Livingston, Scotland


    • Edited by leshay Saturday, November 7, 2020 12:05 AM
    Monday, November 2, 2020 10:32 PM
  • Hello,

    With this many records it's bound to take considerable time to process the data into a DataTable or any container. The best suggestion is to run the application off-hours using Windows Task Scheduler; this is how many companies, including where I work, handle large files - we do this every night.

    Next, you might try the following CSV library (I have not used this, but many do). Even if it works, I doubt it's going to be much faster than your current method, as at the end of the day you are dealing with many lines, lots of disk I/O, etc.

    This is a simple fixed-width example taken from their page; it's shown there in C#, so here is the VB.NET conversion.

    This example assumes you have a file on your system called C:\data.txt, which is a simple text file containing data in fixed-width columns.

    In this file there are three columns, where the first column is 5 chars wide, and the last two columns are each 10 chars wide.

    Dim reader = New FixedWidthFileReader("C:\data.txt", "5,10,10")
    Dim dataTable As DataTable = reader.ReadAsDataTable()
    


    Please remember to mark the replies as answers if they help and unmark them if they do not; this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile, but I will not answer coding questions on either.

    NuGet BaseConnectionLibrary for database connections.

    My GitHub code samples
    GitHub page

    Monday, November 2, 2020 11:33 PM
  • But after you read those records, what do you plan to do with the data?

    I mean, you now have 15 million records. Then what?

    In other words, I think the problem needs to be stated in a different way.

    Say, for example, we need to add up two columns. Well, then read ONE row, get the values, then read the next row and add to those values. Thus, at that point in time, we don't have to store or build up a HUGE 15-million-row table.

    I guess the issue here is not really how to read 15 million rows, but what one plans to do once all that data has been read. And whatever you plan to do with that huge table, it may well be possible to do that row processing as you read the data - and thus you don't really need a huge in-memory table.

    So, I would consider what the final goal is - you might not really need that huge table at all.

    In other words, line-by-line processing of the data may well be possible, and it would also run much faster if you can process the data in a single pass.
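    As a sketch of that one-pass idea (the file path and field offsets here are assumptions for illustration, not taken from the original question) - File.ReadLines streams lazily, so no table is ever built:

    ```vbnet
    ' Sketch: sum an "age" column while streaming, never holding more than one line.
    ' Assumes each record is a 10-char name followed by a 3-char age.
    Dim total As Long = 0
    For Each line As String In IO.File.ReadLines("C:\data.txt")
        total += Integer.Parse(line.Substring(10, 3).Trim())
    Next
    Console.WriteLine($"Total: {total}")
    ```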

    Regards,

    Albert D. Kallal (Access MVP 2003-2017)
    Edmonton, Alberta Canada

    Tuesday, November 3, 2020 3:23 AM