locked
Reading CSV with comma placed within double quotes? RRS feed

  • Question

  • User-437298536 posted

     Hi all,

    I was able to parse and import .CSV file into database, but is having problem parsing .csv file that have comma contained within double quotes. For example,

    SomeEmail@Email.com, FirstName, Last Name, "Some words, words after comma", More Stuffs

     
    When I parse the above line, "Some words got inserted into a column, and then words after comma" got insert to another column due to the comma in it. What do I need to do to parse that as a single field for everything that contained within quotes? I'm working on a C# website by the way.

    Thank you very much.
     

    Monday, April 14, 2008 5:25 PM

Answers

  • User853727733 posted
    I liked Steves code even better than mine. So I'll give you an example using his code.

      

    1            Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");        
    2          
    3            conn.Open();
    4            while (!srMyStream.EndOfStream)
    5            {
    6                s = srMyStream.ReadLine();
    7            String[] Fields = CSVParser.Split(s);
    8    
    9            // clean up the fields (remove " and leading spaces)
    10           for (int i = 0; i < Fields.Length; i++)
    11           {
    12               Fields[i] = Fields[i].TrimStart(' ', '"');
    13               Fields[i] = Fields[i].TrimEnd('"');
    14           }  
    15               string FaxNum = Fields[0];
    16               string ProjCode = Fields[1];
    17               string Email = Fields[2];
    18               string PhoneNum = Fields[3];
    19   
    20               cmd = new SqlCommand("spInsertCSV", conn);
    21               cmd.CommandType = CommandType.StoredProcedure;
    22               cmd.Parameters.Add("@FileID", SqlDbType.Int).Value = _fileID;
    23               cmd.Parameters.Add("@FaxNumber", SqlDbType.VarChar).Value = FaxNum;
    24               cmd.Parameters.Add("@ProjCode", SqlDbType.VarChar).Value = ProjCode;
    25               cmd.Parameters.Add("@Email", SqlDbType.VarChar).Value = Email;
    26               cmd.Parameters.Add("@PhoneNumber", SqlDbType.VarChar).Value = PhoneNum;
    27   
    28               cmd.ExecuteNonQuery();
    29           }
    30           srMyStream.Close();
    

     

    You could improve this later on creating a method to encapsulate the processing of the string (each line of your file).

     I hope this helps.
     

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Tuesday, April 15, 2008 10:46 AM
  • User853727733 posted

    Ok, you getting the text from a web page so it's html enconded. You need to decode. I changed your method to do this decoding.

     

     

    public void ExportToCSV(GridView myGV, string fileName)
        {
            Response.Clear();
            Response.AddHeader("content-disposition", string.Format("attachment; filename={0}.csv", fileName));
            Response.Charset = "";
            Response.ContentType = "application/vnd.csv";
    
            StringBuilder sBuilder = new StringBuilder();
            StringWriter sWriter = new StringWriter(sBuilder);
    
            string str;
    
            // This is the Header section, take the for loop out if no header wanted
            for (int k = 0; k <= (myGV.Columns.Count - 1); k++)
            {
                sWriter.Write(myGV.HeaderRow.Cells[k].Text + ",");
            }
            sWriter.WriteLine(",");
    
            // Loop through the Content
            for (int i = 0; i <= (myGV.Rows.Count - 1); i++)
            {
                for (int j = 0; j <= (myGV.Columns.Count - 1); j++)
                {
                    str = (myGV.Rows[i].Cells[j].Text.ToString());
                    str = HttpUtility.HtmlDecode(str);
                    //if (str == "&nbsp;")
                    //    str = ""; //not needed anymore, decode is already doing this
                    str = (str + ",");
    
                    sWriter.Write(str);
                }
                sWriter.WriteLine();
            }
            sWriter.Close();
            string testResult = sBuilder.ToString();
            Response.Write(sBuilder.ToString());
            Response.End();
        }
      This should do the trick.
    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Tuesday, April 15, 2008 11:47 AM

All replies

  • User187056398 posted

    Try this:

     

       protected void Button1_Click(object sender, EventArgs e)
        {
            String Test = "SomeEmail@Email.com, FirstName, Last Name, \"Some words, words after comma\", More Stuffs";
    
            // extract the fields
            Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");        
            String[] Fields = CSVParser.Split(Test);
    
            // clean up the fields (remove " and leading spaces)
            for (int i = 0; i < Fields.Length; i++)
            {
                Fields[i] = Fields[i].TrimStart(' ', '"');
                Fields[i] = Fields[i].TrimEnd('"');
            }
        }
     
    Monday, April 14, 2008 9:21 PM
  • User853727733 posted

    Hi, this code might help you.

     

    1    string s = "SomeEmail@Email.com, FirstName, Last Name, \"Some words, words after comma\", More Stuffs";
    2    Regex r = new Regex("(([^,^\"])*(\".*\")*([^,^\"])*)(,|$)");
    3    for (Match m = r.Match(s); m.Success; m = m.NextMatch())
    4    {
    5         Console.WriteLine(m.Value);
    6    }
    
    It uses regular expressions and  iterate's over all the results. Here I just print the results to the console but you can do whatever you want.
    Monday, April 14, 2008 10:28 PM
  • User-437298536 posted

    Hi Steve and Gabriel,

    How do I apply that to this code? I'm relative knew to this and I tweaked this code from an article on the internet so I'm not really sure:

     

    conn.Open();
            while (!srMyStream.EndOfStream)
            {
                s = srMyStream.ReadLine();
    
                string FaxNum = s.Split(',')[0].Trim();
                string ProjCode = s.Split(',')[1].Trim();
                string Email = s.Split(',')[2].Trim();
                string PhoneNum = s.Split(',')[3].Trim();
    
                cmd = new SqlCommand("spInsertCSV", conn);
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add("@FileID", SqlDbType.Int).Value = _fileID;
                cmd.Parameters.Add("@FaxNumber", SqlDbType.VarChar).Value = FaxNum;
                cmd.Parameters.Add("@ProjCode", SqlDbType.VarChar).Value = ProjCode;
                cmd.Parameters.Add("@Email", SqlDbType.VarChar).Value = Email;
                cmd.Parameters.Add("@PhoneNumber", SqlDbType.VarChar).Value = PhoneNum;
    
                cmd.ExecuteNonQuery();
            }
            srMyStream.Close();

     

    I read this from an uploaded CSV file, and then parse it and do the insertion to the database. Lets said  ProjCode is where I have the quotes and the comma within it.

    Thanks a lot,

    Kenny.

    Tuesday, April 15, 2008 10:32 AM
  • User853727733 posted
    I liked Steves code even better than mine. So I'll give you an example using his code.

      

    1            Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");        
    2          
    3            conn.Open();
    4            while (!srMyStream.EndOfStream)
    5            {
    6                s = srMyStream.ReadLine();
    7            String[] Fields = CSVParser.Split(s);
    8    
    9            // clean up the fields (remove " and leading spaces)
    10           for (int i = 0; i < Fields.Length; i++)
    11           {
    12               Fields[i] = Fields[i].TrimStart(' ', '"');
    13               Fields[i] = Fields[i].TrimEnd('"');
    14           }  
    15               string FaxNum = Fields[0];
    16               string ProjCode = Fields[1];
    17               string Email = Fields[2];
    18               string PhoneNum = Fields[3];
    19   
    20               cmd = new SqlCommand("spInsertCSV", conn);
    21               cmd.CommandType = CommandType.StoredProcedure;
    22               cmd.Parameters.Add("@FileID", SqlDbType.Int).Value = _fileID;
    23               cmd.Parameters.Add("@FaxNumber", SqlDbType.VarChar).Value = FaxNum;
    24               cmd.Parameters.Add("@ProjCode", SqlDbType.VarChar).Value = ProjCode;
    25               cmd.Parameters.Add("@Email", SqlDbType.VarChar).Value = Email;
    26               cmd.Parameters.Add("@PhoneNumber", SqlDbType.VarChar).Value = PhoneNum;
    27   
    28               cmd.ExecuteNonQuery();
    29           }
    30           srMyStream.Close();
    

     

    You could improve this later on creating a method to encapsulate the processing of the string (each line of your file).

     I hope this helps.
     

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Tuesday, April 15, 2008 10:46 AM
  • User-437298536 posted

    Hi Gabriel,

    I think I'm almost get there, however, when I insert that to the database, I did some checking on the background, then bind back the processed information to a GridView, and output back to a different CSV file. My questions:

    1. Is there a way to keep the quotes (") in the original CSV file and inserted that whole thing to the database?

    2. Do the same thing when I output to the CSV file. Right now with the code you helped on the above message, I was able to save to the database the texts with comma in it, but the quotes got stripped out, and when outputting to another CSV file, the texts got split again into two columns. Here is my code for exporting to CSV:

      

    public void ExportToCSV(GridView myGV, string fileName)
        {
            Response.Clear();
            Response.AddHeader("content-disposition", string.Format("attachment; filename={0}.csv", fileName));
            Response.Charset = "";
            Response.ContentType = "application/vnd.csv";
    
            StringBuilder sBuilder = new StringBuilder();
            StringWriter sWriter = new StringWriter(sBuilder);
    
            string str;
    
            // This is the Header section, take the for loop out if no header wanted
            for (int k = 0; k <= (myGV.Columns.Count - 1); k++)
            {
                sWriter.Write(myGV.HeaderRow.Cells[k].Text + ",");
            }
            sWriter.WriteLine(",");
    
            // Loop through the Content
            for (int i = 0; i <= (myGV.Rows.Count - 1); i++)
            {
                for (int j = 0; j <= (myGV.Columns.Count - 1); j++)
                {
                    str = (myGV.Rows[i].Cells[j].Text.ToString());
                    if (str == "&nbsp;")
                        str = "";
                    str = (str + ",");
    
                    sWriter.Write(str);
                }
                sWriter.WriteLine();
            }
            sWriter.Close();
            string testResult = sBuilder.ToString();
            Response.Write(sBuilder.ToString());
            Response.End();
        }

     

    Thank you so, so much for your help.

    Kenny. 

    Tuesday, April 15, 2008 11:05 AM
  • User853727733 posted

    You just need to remove the line of code that removes the quotes from the results. I'm posting the code again with this line commented out.

     

     
                Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");        

    conn.Open();
    while (!srMyStream.EndOfStream)
    {
    s = srMyStream.ReadLine();
    String[] Fields = CSVParser.Split(s);

    // clean up the fields (remove " and leading spaces)
    for (int i = 0; i < Fields.Length; i++)
    {
    //Fields[i] = Fields[i].TrimStart(' ', '"');
    //Fields[i] = Fields[i].TrimEnd('"'); //this line remove the quotes
                     Fields[i] = Fields[i].Trim();

    }
    string FaxNum = Fields[0];
    string ProjCode = Fields[1];
    string Email = Fields[2];
    string PhoneNum = Fields[3];

    cmd = new SqlCommand("spInsertCSV", conn);
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@FileID", SqlDbType.Int).Value = _fileID;
    cmd.Parameters.Add("@FaxNumber", SqlDbType.VarChar).Value = FaxNum;
    cmd.Parameters.Add("@ProjCode", SqlDbType.VarChar).Value = ProjCode;
    cmd.Parameters.Add("@Email", SqlDbType.VarChar).Value = Email;
    cmd.Parameters.Add("@PhoneNumber", SqlDbType.VarChar).Value = PhoneNum;

    cmd.ExecuteNonQuery();
    }
    srMyStream.Close();
     Doing this should keep the quotes and when exporting back to csv it will keep the quotes also and therefore should not create a new column in the file.
     
    If this post solves your problem please mark it as a answer. 
      
    Tuesday, April 15, 2008 11:19 AM
  • User-437298536 posted

    Thanks again Gabriel,

    It's saving to the database correctly now, but I'm having problem with exporting it. The quotes " got replaced with &quot; and it got split into two columns.

    Kenny. 

    Tuesday, April 15, 2008 11:36 AM
  • User853727733 posted

    Ok, you getting the text from a web page so it's html enconded. You need to decode. I changed your method to do this decoding.

     

     

    public void ExportToCSV(GridView myGV, string fileName)
        {
            Response.Clear();
            Response.AddHeader("content-disposition", string.Format("attachment; filename={0}.csv", fileName));
            Response.Charset = "";
            Response.ContentType = "application/vnd.csv";
    
            StringBuilder sBuilder = new StringBuilder();
            StringWriter sWriter = new StringWriter(sBuilder);
    
            string str;
    
            // This is the Header section, take the for loop out if no header wanted
            for (int k = 0; k <= (myGV.Columns.Count - 1); k++)
            {
                sWriter.Write(myGV.HeaderRow.Cells[k].Text + ",");
            }
            sWriter.WriteLine(",");
    
            // Loop through the Content
            for (int i = 0; i <= (myGV.Rows.Count - 1); i++)
            {
                for (int j = 0; j <= (myGV.Columns.Count - 1); j++)
                {
                    str = (myGV.Rows[i].Cells[j].Text.ToString());
                    str = HttpUtility.HtmlDecode(str);
                    //if (str == "&nbsp;")
                    //    str = ""; //not needed anymore, decode is already doing this
                    str = (str + ",");
    
                    sWriter.Write(str);
                }
                sWriter.WriteLine();
            }
            sWriter.Close();
            string testResult = sBuilder.ToString();
            Response.Write(sBuilder.ToString());
            Response.End();
        }
      This should do the trick.
    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Tuesday, April 15, 2008 11:47 AM
  • User-437298536 posted

    Hi Gabriel,

    You saved my day. Can't thank you enough for your help this morning (or whatever that is in your time) :)

    Kenny.

    Tuesday, April 15, 2008 11:55 AM
  • User853727733 posted

    No problem Kenny, I'm just glad I could help.

    Tuesday, April 15, 2008 12:12 PM