none
parse csv file RRS feed

  • Question

  • I'm running into some issues parsing a CSV that has commas in the field as well as the fields are separated by commas.

    example, I line within the CSV file looks like this

    column1, column2, column3,column4
    1,00099,"Ford f250, Big Ranch",used, Smith
    1,99878,"GMC, Denalli", New, Rogers
    1,4456779,"Chevy", Used, Rogers
    1,99888997,"Chevy, High Country", New, Smith

    the issues I'm running into is that the string is splitting at the comma in column 2. What can I do to remove the quotes,  split each column at the comma, however, not split at the comma if it is in the column itself?

    Wednesday, January 9, 2019 3:03 PM

All replies

    • Proposed as answer by phil chelis Wednesday, January 9, 2019 4:17 PM
    Wednesday, January 9, 2019 3:58 PM
  • I tried that the other day and still separated at the comma after Ford f250, GMC, etc.
    Wednesday, January 9, 2019 4:08 PM
  • Hello,

    In the following example done in VB.NET I delimit the read fields with dashes so you can see the code presented preserves commas in the data. I don't check for the first line being column headers.

    column1-column2-column3-column4
    1-00099-Ford f250, Big Ranch-used-Smith
    1-99878-GMC, Denalli-New-Rogers
    1-4456779-Chevy-Used-Rogers
    1-99888997-Chevy, High Country-New-Smith

    Code

    Imports System.IO
    Public Class Form1
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            Using reader As New FileIO.
                TextFieldParser(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TextFile1.txt"))
    
                reader.TextFieldType = FileIO.FieldType.Delimited
                reader.Delimiters = New String() {","}
                reader.HasFieldsEnclosedInQuotes = True
    
                Dim fields As String()
                While Not reader.EndOfData
                    fields = reader.ReadFields()
                    Dim newFields = fields.Select(Function(f) If(f.Contains(","), $"""{f}""", f))
                    Console.WriteLine(String.Join("-", fields))
                End While
            End Using
        End Sub
    End Class
    

    Here I remove the first row which in your sample is the row with field names.

    Imports System.IO
    Public Class Form1
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            Dim lineList As New List(Of String)
    
            Using reader As New FileIO.
                TextFieldParser(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TextFile1.txt"))
    
                reader.TextFieldType = FileIO.FieldType.Delimited
                reader.Delimiters = New String() {","}
                reader.HasFieldsEnclosedInQuotes = True
                Dim fields As String()
                While Not reader.EndOfData
                    fields = reader.ReadFields()
                    Dim newFields = fields.Select(Function(f) If(f.Contains(","), $"""{f}""", f))
                    lineList.Add(String.Join("-", fields))
                End While
            End Using
    
            lineList.RemoveAt(0)
        End Sub
    End Class
    

    For C# add a reference to your project for Microsoft.VisualBasic.FileIO.  Then code it e.g.

    var lineList = new List<string>();
    using (var reader = new TextFieldParser(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TextFile1.txt")))
    {
    
        reader.TextFieldType = FieldType.Delimited;
        reader.Delimiters = new string[] { "," };
        reader.HasFieldsEnclosedInQuotes = true;
        string[] fields = null;
        while (!reader.EndOfData)
        {
            fields = reader.ReadFields();
            var newFields = fields.Select((f) => (f.Contains(",") ? $"\"{f}\"" : f));
            lineList.Add(string.Join("-", fields));
        }
    }
    
    lineList.RemoveAt(0);


    Please remember to mark the replies as answers if they help and unmark them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.
    VB Forums - moderator
    profile for Karen Payne on Stack Exchange, a network of free, community-driven Q&A sites

    Wednesday, January 9, 2019 4:11 PM
    Moderator
  • I tried that the other day and still separated at the comma after Ford f250, GMC, etc.

    See what I did with newFields in my first reply.

    var newFields = fields.Select((f) => (f.Contains(",") ? $"\"{f}\"" : f));


    Please remember to mark the replies as answers if they help and unmark them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.
    VB Forums - moderator
    profile for Karen Payne on Stack Exchange, a network of free, community-driven Q&A sites

    Wednesday, January 9, 2019 4:12 PM
    Moderator
  • I tried that the other day and still separated at the comma after Ford f250, GMC, etc.

    I just tested with your file and it works 

    For example, 2 first lines :

    -  parts {string[4]} string[]
      [0] "column1" string
      [1] "column2" string
      [2] "column3" string
      [3] "column4" string

    -  parts {string[5]} string[]
      [0] "1" string
      [1] "00099" string
      [2] "Ford f250, Big Ranch" string
      [3] "used" string
      [4] "Smith" string

    Wednesday, January 9, 2019 4:14 PM
  • Hello, 

    Most likely you didn't properly configured the reader. Check below Karen's code  - 

    reader.HasFieldsEnclosedInQuotes = True

    Sincerely, Highly skilled coding monkey.

    Wednesday, January 9, 2019 4:36 PM
  • I got it parsing, however, when I try to add it to a datable now to insert into my DB, I keep seeing the first record for each data row, however, if I write it out to the console, I see all rows. how can I get the values added to my datatable now?

    using (var reader = new TextFieldParser(fileName))
                {
                    reader.TextFieldType = FieldType.Delimited;
                    reader.Delimiters = new string[] { "," };
                    reader.HasFieldsEnclosedInQuotes = true;
                    string[] fields = null;
                    while (!reader.EndOfData)
                    {
                        fields = reader.ReadFields();
                        var newFields = fields.Select((f) => (f.Contains(",") ? $"\"{f}\"" : f));
                        lineList.Add(string.Join("-", fields));
                        Console.WriteLine(fields[1].ToString());
    
                        dt.Rows.Add();
                        foreach(DataRow dr in dt.Rows)
                        {
                            dr["SellersId"] = fields[0].ToString();
                            dr["Make"] = fields[1].ToString();
                        }
    
                    }
    
                }

    Wednesday, January 9, 2019 4:55 PM
  • Do this instead of the foreach:

    DataRow dr = dt.NewRow();
    dr["SellersId"] = fields[0].ToString();
    dr["Make"] = fields[1].ToString();
    dt.Rows.Add(dr);
    


    ~~Bonnie DeWitt [C# MVP]

    http://geek-goddess-bonnie.blogspot.com

    Wednesday, January 9, 2019 5:25 PM
    Moderator
  • Try the following, excludes header.

    var dt = new DataTable();
    dt.Columns.Add(new DataColumn() {ColumnName = "SellersId", DataType = typeof(string)});
    dt.Columns.Add(new DataColumn() { ColumnName = "Make", DataType = typeof(string) });
    
    using (var reader = new TextFieldParser(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TextFile1.txt")))
    {
    
        reader.TextFieldType = FieldType.Delimited;
        reader.Delimiters = new string[] { "," };
        reader.HasFieldsEnclosedInQuotes = true;
        string[] fields = null;
        bool firstTime = true;
        while (!reader.EndOfData)
        {
            fields = reader.ReadFields();
            if (firstTime)
            {
                firstTime = false;
            }
            else
            {
                var newFields = fields.Select((f) => (f.Contains(",") ? $"\"{f}\"" : f)).ToArray();
                dt.Rows.Add(newFields[0], newFields[1]);
    
            }
        }
    }
    
    

    I changes the values for column 1 so it's easy to see when visualized as per above.

    column1, column2, column3,column4
    1,00099,"Ford f250, Big Ranch",used, Smith
    2,99878,"GMC, Denalli", New, Rogers
    3,4456779,"Chevy", Used, Rogers
    4,99888997,"Chevy, High Country", New, Smith


    Please remember to mark the replies as answers if they help and unmark them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.
    VB Forums - moderator
    profile for Karen Payne on Stack Exchange, a network of free, community-driven Q&A sites

    Wednesday, January 9, 2019 6:31 PM
    Moderator
  •         public class CSVParser
            {
                public IEnumerable<IEnumerable<string>> Parse(string input)
                {
                    var rows = input.Split(new[] { '\r', '\n' }).Where(x => !string.IsNullOrEmpty(x));
                    var collection = new List<IEnumerable<string>>();
                    foreach (var row in rows)
                    {
                        collection.Add(parseLine(row));
                    }
                    return collection;
                }
    
                private IEnumerable<string> parseLine(string input)
                {
                    var collection = new List<string>();
                    var word = "";
                    var isString = false;
                    var isEscaped = false;
                    foreach (var c in input)
                    {
                        if (isString)
                        {
                            if (isEscaped && c == '\\')
                            {
                                word += $"\\{c}";
                                isEscaped = false;
                            }
                            if (c == '\\')
                            {
                                isEscaped = !isEscaped;
                            }
                            else if (isEscaped && c == '"')
                            {
                                word += c;
                                isEscaped = false;
                            }
                            else if (!isEscaped && c == '"')
                            {
                                isString = false;
                            }
                            else
                            {
                                word += c;
                            }
                        }
                        else
                        {
                            if(c == ',')
                            {
                                collection.Add(word);
                                word = "";
                            }
                            else if(c == '"')
                            {
                                isString = true;
                            }
                            else
                            {
                                word += c;
                            }
                        }
                    }
                    if(!string.IsNullOrEmpty(word))
                        collection.Add(word);
                    return collection;
                }
            }
            static void Main(string[] args)
            {
                string input = @"column1, column2, column3,column4
    1,00099,""Ford f250, Big Ranch\""abc"",used, Smith
    1,99878,""GMC, Denalli"", New, Rogers
    1,4456779,""Chevy"", Used, Rogers
    1,99888997,""Chevy, High Country"", New, Smith";
                var parser = new CSVParser();
                var test = parser.Parse(input);
            }

    Wednesday, January 9, 2019 7:17 PM
  • You might be interested in the below solution with free spire.xls for .net dll.

    using System.Data;
    using Spire.Xls;
    
    namespace CSVToDatatable
    {
        class Program
        {
            static void Main(string[] args)
            {           
                Workbook workbook = new Workbook();
                workbook.LoadFromFile(@"..\input.csv", ",");
                Worksheet worksheet = workbook.Worksheets[0];
                DataTable dt = worksheet.ExportDataTable();            
            }
        }
    }
    

    Thursday, January 10, 2019 4:04 AM
  • Types of the column in your DataTable is not suitable to contain String value.

    Specify required type and size in your column definition.


    Sincerely, Highly skilled coding monkey.

    Thursday, January 10, 2019 11:01 AM
  • Hi SharePointGuy123,

    Is there any update? do you try the method that Kareninstructor provided, if the issue still exists, please feel free let us know.
    Best regards,

    Jack

    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, January 16, 2019 3:22 AM
    Moderator