none
Reading a .csv file and encountering Line Feeds (LR) and Carriage Returns (CR) RRS feed

  • Question

  • Im using VB to read some .csv files which I think have come out of Excel.  Some of the lines have a Line Feed (LF) and some have both Carriage REturn and Line Feeds.  I think the LF on their own is from hitting Shift-Enter in an Excel cell to get to the next line.  An example which affects the title row:

    Name,Address,Mobile<LF>

    phone number, home<LF>

    phone number, town, Country<CR>

    John Smoth, London, 0712345678,020374635,London,UK<CR>

    Bob Smith, Manchester,0712345678,0161746353,Manchester,UK<CR>

    What I want to do is find the LFs and ignore them while reading in the file, to create a 'proper' title row.  Any ideas how this can be achieved??

    Tuesday, February 21, 2017 9:26 PM

All replies

  • S,

    You might want to consider using a TextFieldParser for this.

    Private Sub GetCSVData(ByVal filePath As String) If Not String.IsNullOrWhiteSpace(filePath) Then Try Dim fi As New IO.FileInfo(filePath) If fi.Exists Then Using tfp As New Microsoft.VisualBasic.FileIO.TextFieldParser(fi.FullName) With tfp .TextFieldType = FileIO.FieldType.Delimited .Delimiters = New String() {","} ' Set the following to true if you have fields ' which are enclosed in quotation marks: .HasFieldsEnclosedInQuotes = False End With Dim currentLineOfText() As String While Not tfp.EndOfData currentLineOfText = tfp.ReadFields() Stop End While End Using End If Catch ex As Exception MessageBox.Show(String.Format("An error occurred:{0}{0}{1}", _ vbCrLf, ex.Message), _ "Error Reading Text File", _ MessageBoxButtons.OK, _ MessageBoxIcon.Warning) End Try End If End Sub


    By default it will skip empty lines.


    "One who has no vices also has no virtues..."

    Tuesday, February 21, 2017 9:41 PM
  • Then simply read the file with IO.Readfile and replace the string you get with the Replace method: all <CRLF> for <CR> and do the same for <LF>.

    Don't think about the time it will take because you're not able to think so fast as this goes for thousands of files to make those 2 replaces faster with a search and replace.

    https://msdn.microsoft.com/en-us/library/system.environment.newline(v=vs.110).aspx

    The CRLF is simply a keyword in VB "VBCRLF"

    https://msdn.microsoft.com/en-us/library/fk49wtc1(v=vs.110).aspx

    I would by the way not take the CR because the LF is more current standard. 


    Success
    Cor



    Tuesday, February 21, 2017 10:14 PM
  • What I want to do is find the LFs and ignore them while reading in the file, to create a 'proper' title row.  Any ideas how this can be achieved?

    Use ReadAllText, then replace each LF with Nothing.  Depending on how you then process the text you might need to also replace all Cr with CrLf. 

    Wednesday, February 22, 2017 12:33 AM
  • Thanks for all the ideas. I am reading the .csv file to an array which is pretty easy. However, if using file. readalltlines or readalltext then according to the information, the LF and CR are removed https://msdn.microsoft.com/en-us/library/s2tte0y1(v=vs.110).aspx Using this method, I won't know whether I have encountered a LF or CR.the same goes for textfieldparser. If I open the file in Notepadd++ I can see what is a the end of each line. Using the example at the top of this thread, I still can't read it into a single line as don't know where the LF and CR are. What I would do is keep reading the lines and concatenate into a single line until I meet a CR. any ideas?
    Thursday, February 23, 2017 8:04 AM
  • I did not write to read it in an array? Simply as text

    Dim x = System.IO.File.ReadAllText("C:\Test\tester.txt")
    x = x.Replace(vbCrLf, vbCr)
    x = x.Replace(vbLf, vbCr)

    is enough. You can not see this in Notepad because it are non printable characters which are not shown in Notepad.



    Success
    Cor

    Thursday, February 23, 2017 8:36 AM
  • Using the example at the top of this thread, I still can't read it into a single line as don't know where the LF and CR are. What I would do is keep reading the lines and concatenate into a single line until I meet a CR. any ideas?

    I don't think you're following what the TextFieldParser is doing.

    It reads each line in and, per line, it then looks for the delimiter (in this case, a comma). It then internally splits that line up into fields based on the delimiter and the result - per line - is a string array.

    I can post an example if you want?


    "One who has no vices also has no virtues..."

    Thursday, February 23, 2017 12:42 PM
  • I agree with Frank that the TextFieldParser is almost always the way to go when reading delimited text files of any kind.

    If you really want to use the Split() method though, just use .Split({ControlChars.Cr, ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries) and that will take care of making new lines out of any combination of CR and LF.

    Note that you want to use ControlChars and not the VbCr or VbCrLf because the former is of type Char while the latter are of type String.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Thursday, February 23, 2017 1:04 PM
    Moderator
  • S,

    I used a local CSV file for this and even though it's not like yours, the concept is the same:

    Option Strict On Option Explicit On Option Infer Off Imports Microsoft.VisualBasic.FileIO Public Class Form1 Private Sub Form1_Load(sender As System.Object, _ e As System.EventArgs) _ Handles MyBase.Load Dim csvFilePath As String = _ "X:\TestProgramOutput\Sample_Data_2012_10_31_16_00_32\Sample_Data_500.csv" GetCSVData(csvFilePath) End Sub Private Sub GetCSVData(ByVal filePath As String) If Not String.IsNullOrWhiteSpace(filePath) Then Try Dim fi As New IO.FileInfo(filePath) If fi.Exists Then Using tfp As New TextFieldParser(fi.FullName) With tfp .TextFieldType = FileIO.FieldType.Delimited .Delimiters = New String() {","} ' Set the following to true if you have fields ' which are enclosed in quotation marks: .HasFieldsEnclosedInQuotes = True End With Dim currentLineOfText() As String While Not tfp.EndOfData currentLineOfText = tfp.ReadFields() Stop End While End Using End If Catch ex As Exception MessageBox.Show(String.Format("An error occurred:{0}{0}{1}", _ vbCrLf, ex.Message), _ "Error Reading Text File", _ MessageBoxButtons.OK, _ MessageBoxIcon.Warning) End Try End If End Sub End Class


    It's all fake data even though it looks real. The fields in the CSV file are shrouded with double-quotes so please do notice above that I set the .HasFieldsEnclosedInQuotes property to true:

    "Carol","Q.","Zyskowski","9670 Hanson Place","Southaven","Mississippi","38672","cZyskowski@hello.net.au" "Patsy","F.","Gaerlan","1899 Utica Avenue","Pleasant Valley","Iowa","52767","pGaerlan@gaybrighton.co.uk" "Lauren","N.","Vieyra","9266 Clove Road","Ragan","Nebraska","68969","lVieyra@lovemail.com" "Catalina","E.","Landro","9820 Pitkin Avenue","Corona","South Dakota","57227","cLandro@lycosemail.com" "Juan","P.","Vanpatten","2240 Olean Street","Chauvin","Louisiana","70344","jVanpatten@chez.com" "Meghan","Y.","Mchugh","9210 Morgan Avenue","Austin","Texas","78718","mMchugh@jewishmail.com" "Cristopher","I.","Verona","9103 Bay 49th Street","Tylertown","Mississippi","39667","cVerona@samilan.net" "Adrianne","M.","Gazzo","5968 Vanderveer Place","Toms River","New Jersey","08757","aGazzo@hkg.net" "Rudolph","X.","Orrico","3812 Independence Avenue","Minneapolis","Minnesota","55424","rOrrico@virtualactive.com" "Georgia","V.","Hathcock","8215 Oriental Boulevard","Fort Mill","South Carolina","29716","gHathcock@topchat.com" "Hung","Q.","Vermeer","4156 St Nicholas Avenue","Foothill Ranch","California","92610","hVermeer@mail.salu.net"


    Anyway, you get the idea. When the program gets to "Stop" in the code above, the execution will halt just the same as though you'd put a breakpoint in. When it does, this is what you'll see (for the first line - it reads the file per line):

    It found eight (8) fields and if I expand that, it shows the data in those fields:

    Try it like that and see if you get what you want?


    "One who has no vices also has no virtues..."

    Thursday, February 23, 2017 1:07 PM

  • Note that you want to use ControlChars and not the VbCr or VbCrLf because the former is of type Char while the latter are of type String.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    ?????????????? Sorry, maybe you can elaborate this, a String is an array of char. 

    What you wrote is 100% nonsense and shows in this way a lack of knowledge. 



    Cor



    Friday, February 24, 2017 12:10 AM

  • Note that you want to use ControlChars and not the VbCr or VbCrLf because the former is of type Char while the latter are of type String.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    ?????????????? Sorry, maybe you can elaborate this, a String is an array of char. 

    What you wrote is 100% nonsense and shows in this way a lack of knowledge. 



    Cor



    Yes a string is an array of char, that has absolutely nothing to do with how the Split method functions.

    Look at the inner workings of Split, particularly the two overloads of MakeSeperatorList().  You want to pass it a Char array when you are actually splitting on a Char.  Passing a string parameter when you want to split on a char is just extra overhead for nothing.  Use the appropriate data type.  No nonsense.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Friday, February 24, 2017 1:43 AM
    Moderator