locked
parse and upload CSV data RRS feed

  • Question

  • User-2132746437 posted

    We upload CSV files to our site and it gets parsed and uploaded to the DB, which worked fine, however, the CSV file changed, meaning that there are line breaks now with some lines or lines are continuing to the next line so the parsing process is skipping those lines or errors out.

    The CSV file now looks like this now:

    "@","header1","header2","header3","header4","%"
    "@","name:testing","description: this is for the new testing 
    unit which starts", "%"
    "@","name:developement","description: new developement phase","%"
    "@","name:sales","description: this will be a new tool for sales
    that will be used","category:sales", "state"" needed for sales in the following states:
    1) delaware
    2) Florida
    3) Texas", "%"

    how can I parse this now with the line breaks in it like it has? Before it was just on one line for everything and each line began with "@" and ended with "%", now I may have separate lines for content. 

    Wednesday, January 10, 2018 5:15 PM

Answers

  • User-718146471 posted

    A quick search around revealed this as a way of handling line breaks in your CSV https://stackoverflow.com/a/1179334/621509

    Match match = Regex.Match(line, @"^(?:,?(?<q>['"](?<field>.*?\k'q')|(?<field>[^,]*))+$");
    if (match.Success)
    {
      foreach (var capture in match.Groups["field"].Captures)
      {
        string fieldValue = capture.Value;
        // Use the value.
      }
    }

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, January 10, 2018 5:29 PM
  • User-2132746437 posted
    try
    {
         
          string input = strSearch.Replace("\"", "'");
          string output = strOuput.Replace("','", "|");
          string[] finalData = output.Split('|');
    
          items["name"] = finalData.GetValue(1).toString().Remove(0,10); //remove the 'name' text
    
       items.Update();
         
    }       
         

    here is some of the code parsing it out

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, January 10, 2018 5:35 PM
  • User303363814 posted

    Yep, csv looks simple when you start but there are dark corners which catch you out later.

    Don't reinvent the wheel, use a prebuilt solution.  There is a nice NuGet package at https://www.nuget.org/packages/LinqToCsv which means that you can stop having to deal with the details of csv syntax and get back to business logic.

    This library makes it easy to use CSV files with LINQ queries. Its features include:

    • Follows the most common rules for CSV files. Correctly handles data fields that contain commas and line breaks.
    • In addition to comma, most delimiting characters can be used, including tab for tab delimited fields.
    • Can be used with an IEnumarable of an anonymous class - which is often returned by a LINQ query.
    • Supports deferred reading.
    • Supports processing files with international date and number formats.
    • Supports different character encodings if you need them.
    • Recognizes a wide variety of date and number formats when reading files.
    • Provides fine control of date and number formats when writing files.
    • Robust error handling, allowing you to quickly find and fix problems in large input files.
    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, January 10, 2018 9:38 PM

All replies

  • User-718146471 posted

    Can you post your original code that handled this so I can tweak it for you? I have had to handle this kind of line break nonsense before.

    Wednesday, January 10, 2018 5:27 PM
  • User-718146471 posted

    A quick search around revealed this as a way of handling line breaks in your CSV https://stackoverflow.com/a/1179334/621509

    Match match = Regex.Match(line, @"^(?:,?(?<q>['"](?<field>.*?\k'q')|(?<field>[^,]*))+$");
    if (match.Success)
    {
      foreach (var capture in match.Groups["field"].Captures)
      {
        string fieldValue = capture.Value;
        // Use the value.
      }
    }

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, January 10, 2018 5:29 PM
  • User-2132746437 posted
    try
    {
         
          string input = strSearch.Replace("\"", "'");
          string output = strOuput.Replace("','", "|");
          string[] finalData = output.Split('|');
    
          items["name"] = finalData.GetValue(1).toString().Remove(0,10); //remove the 'name' text
    
       items.Update();
         
    }       
         

    here is some of the code parsing it out

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, January 10, 2018 5:35 PM
  • User303363814 posted

    Yep, csv looks simple when you start but there are dark corners which catch you out later.

    Don't reinvent the wheel, use a prebuilt solution.  There is a nice NuGet package at https://www.nuget.org/packages/LinqToCsv which means that you can stop having to deal with the details of csv syntax and get back to business logic.

    This library makes it easy to use CSV files with LINQ queries. Its features include:

    • Follows the most common rules for CSV files. Correctly handles data fields that contain commas and line breaks.
    • In addition to comma, most delimiting characters can be used, including tab for tab delimited fields.
    • Can be used with an IEnumarable of an anonymous class - which is often returned by a LINQ query.
    • Supports deferred reading.
    • Supports processing files with international date and number formats.
    • Supports different character encodings if you need them.
    • Recognizes a wide variety of date and number formats when reading files.
    • Provides fine control of date and number formats when writing files.
    • Robust error handling, allowing you to quickly find and fix problems in large input files.
    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, January 10, 2018 9:38 PM