CSV TextfieldParser parsing through lines in CSV File with Single and Double Quotes, while skipping lines of unwanted data

Answered CSV TextfieldParser parsing through lines in CSV File with Single and Double Quotes, while skipping lines of unwanted data

  • Sunday, February 19, 2012 3:53 PM
     
      Has Code

    This question addresses another issue that applies to the first issue that I had, which was answered in the link below

    http://social.msdn.microsoft.com/Forums/en/csharpgeneral/thread/01d6ee8f-0c73-4d78-95fc-03afa75eaf59

    i have another question, if i want to read the file in line by line and parse the contents of each line with the textfieldparser is it possible and if so how would i do it? below is a snippet of code that i was running before i saw that i was going to run into the delimeter problem. I ask this because i run into the problem where i want to continue on certain lines as shown below that contains certain content that i know is not data i care about, just wondering if i can just add the textfieldparser logic to what i already have below. Thanks again for your help, it is more than appreciated. Take a look at the code below starting at (string[] lines) and you'll see what i mean. Thanks again for the help guys...

    string[] lines = File.ReadAllLines(this.Variables.CSVFile); 
    
    string company = string.Empty; 
    string group = string.Empty; 
    
    foreach (string line in lines) 
    { 
    
    //string[] fields = line.Split(new string[] { parserDouble }, StringSplitOptions.None); 
    //i know the above commented line would not be used with the textfieldparser logic, 
    //just want to show what i had before 
    
    if (fields.Length == 0 || line.Trim().Length == 0) 
    { 
    continue; 
    } 
    
    string start = fields[0] as string; 
    
    if (start.Contains("Company Name: ")) 
    { 
    company = start.Replace("Company Name: ", "") 
    .Replace("\"", ""); 
    continue; 
    } 
    
    if (start.Contains("Group: ")) 
    { 
    group = start.Replace("Group: ", ""); 
    continue; 
    } 
    
    if (start.Contains("Reference")) 
    { 
    continue; 
    } 
    
    if (string.IsNullOrEmpty(company) ||
    line.Contains("Total") || 
    start.Contains("Note: The above")) 
    { 
    continue; 
    } 
    
    if (line.Contains("Total:")) 
    { 
    break; //end of file 
    } 
    }
    
    






All Replies

  • Sunday, February 19, 2012 4:28 PM
     
     

    Hi youranonymous2727;

    The TextFieldParser has a property called CommentTokens that takes a string array of any line you can identify as Comments or lines that do not contain data. If you can post those lines here from the beginning of each line then maybe we can set its array up so that the parser will ignore them. See documentation link I posted in last question for Property CommentTokens.

     


    Fernando (MCSD)

    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

  • Sunday, February 19, 2012 4:50 PM
     
     

    below i included some sample text, as per my code earlier i'm trying to remove everything before reference, as well as reference itself because i'm using an ssis script component to read through the csv files and port into a database because i know where to expect the data for the needed fields, for example i know that reference is always in fields[0], which is column a in excel so i always grab data associated with that column, but i also know that some data that is in fields[0] i need to ignore like if fields[0] says (Note: This is for company use) is there i skip it

    //--file starts here--//

    COB date: 11-Dec-2011


    "All"
    "ABC Master List"

    Company


    Statement ID: 9999999
    Client ID: 0000


    "Please refer to Gump disclaimer"
    THE FOLLOWING Invoice is for 11-Dec-2011


    Company Name: "Gump"
    Group: Yum

    Reference,Product,Date,Leave Date,CCY,Amount,CCY2,Sell Amount,Rate,Ccy3,Evaluation1,Evaluation2,Rate,EvaluationDate,

    "woeiurwe","werwe","13-Oct-2011","17-Oct-2016","ddd","9,228.00","eee","8,000.00","23423","fff","(33.83)","(52.73)","23.45","30-Dec-2011"
    "32ew","ewrdfs","13-Oct-2011","17-Apr-2012","ddd","2,000.00","www","2,459.00","456456","wef","(19.20)","(30.56)","90.32","30-Dec-2011"

    werwe, alkdjfs,12312,09312,"wer,2312",234,"923482.00",w423,2342-0,843,"kjsdlkjf",2349202,"234293",2312-0,"23423",0203

    Note: This is for company use

     
  • Sunday, February 19, 2012 5:44 PM
     
     Answered Has Code

    Hi youranonymous2727;

    The way that the CommentTokens property works is as follows. It starts at the first non space character from the beginning of the line and stops at the character just before the first space. So when you build the string array to identify lines that do not have data start with the first non space character of the line and use as many characters as needed that will never show up as the start of a line that does contain data. The best way to do this is if you have control of the creation of the input file to start every non data line with a character that will never be found in a data line or string of characters such as, NON DATA. But if that is not possible then following the beginning of this post will have to do.

    I placed the data in your last post starting after the line //--file starts here--// until just before the line, Note: This is for company use. I then added the array ignoreLines and assigned that to the CommentTokens property, see code snippet for changes.

    using(TextFieldParser fieldParser = new TextFieldParser("C:/Working Directory/TestFile2.csv"))
    {
        string[] ignoreLines = {"COB", "\"All\"", "\"ABC", "Company",
                                "Statement", "Client", "\"Please",
                                "THE", "Group:", "Reference,Product,Date,Leave"};
    
        fieldParser.TextFieldType = FieldType.Delimited;
        fieldParser.Delimiters = new string[] {","};
        fieldParser.HasFieldsEnclosedInQuotes = true;
        fieldParser.CommentTokens = ignoreLines;
    
        string[] fields;
    
        while( !fieldParser.EndOfData )
        {
            fields = fieldParser.ReadFields( );
            foreach (var field in fields)
            {
                Console.Write(field + "  |  ");
            }
            Console.WriteLine();
        }
    }

    The following was the results of running the above code

    woeiurwe  |  werwe  |  13-Oct-2011  |  17-Oct-2016  |  ddd  |  9,228.00  |  eee  |  8,000.00  |  23423  |  fff  |  (33.83)  |  (52.73)  |  23.45  |  30-Dec-2011  |  
    32ew  |  ewrdfs  |  13-Oct-2011  |  17-Apr-2012  |  ddd  |  2,000.00  |  www  |  2,459.00  |  456456  |  wef  |  (19.20)  |  (30.56)  |  90.32  |  30-Dec-2011  |  
    werwe  |  alkdjfs  |  12312  |  09312  |  wer,2312  |  234  |  923482.00  |  w423  |  2342-0  |  843  |  kjsdlkjf  |  2349202  |  234293  |  2312-0  |  23423  |  0203  |  


    Fernando (MCSD)

    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

  • Sunday, February 19, 2012 6:37 PM
     
     
    Once again thanks fernando, i'll try it out and let you know if it works (which it probably will) and if i have any problems. You are too good man, let me give it a shot and i'll get back to this in a bit and see what i get...
  • Sunday, February 19, 2012 6:39 PM
     
     
     

    Not a problem.

     


    Fernando (MCSD)

    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

  • Sunday, February 19, 2012 7:59 PM
     
     
    I have another question...yeah i know another one, if i'm assigning values based on the positioning of data while i'm reading in, for example if i'm looking for the data in the 5th position, before i would use fields[4], how would i do that now with the code that you put up? once i know that i'll be able to move forward because the file that's coming in has to be parsed based on where i expect the data to be in, ex: product = fields[4]...just wondering if that would be possible with your code to do that, once again thanks a lot for your help fernando :)
  • Sunday, February 19, 2012 8:24 PM
     
     

    Hi youranonymous2727;

    Not a problem. Because the parsing is placed into a string array field by field from left to right and because arrays are zero base when you state this, "for example if i'm looking for the data in the 5th position", if your position starts with 1 then you index the fields array with index 4 or fields[4] and position 4 would be fields[3] but if your positions start at zero then your position 5 is fields[5].

    If I am not clear please let me know.

     


    Fernando (MCSD)

    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

  • Sunday, February 19, 2012 9:14 PM
     
     
    i understand that the array as well as c# usually is zero based unless i'm wrong, so if i'm looking for the 5th position in the code that you posted for my issue it would be variable = fields[4] is that right? are there any situations in c# where the first position is not 0? other than maybe the developer specifically denoting that. some clarification on these topics would be most appreciated :)
  • Sunday, February 19, 2012 9:48 PM
     
      Has Code

    Hi youranonymous2727;

    To your statement, "i understand that the array as well as c# usually is zero based unless i'm wrong,", arrays are zero indexed: an array with n elements is indexed from 0 to n-1 always this holds true for collections as well.

    To your statement, "if i'm looking for the 5th position in the code that you posted for my issue it would be variable = fields[4] is that right?", using the next line as the data coming from the file then the array fields in the program would be as shown in the image below the input line. Note that fields[0] equals "woeiurwe", fields[1] equals "werwe", ... , fields[13] equals "30-Dec-2011"

    "woeiurwe","werwe","13-Oct-2011","17-Oct-2016","ddd","9,228.00","eee","8,000.00","23423","fff","(33.83)","(52.73)","23.45","30-Dec-2011"

    fields array values

    To your question, "are there any situations in c# where the first position is not 0?", No, all collections and arrays are always zero base.


    Fernando (MCSD)

    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

  • Sunday, February 19, 2012 9:59 PM
     
     
    I must say fernando, you are clutch, your explanations have saved me a ton of time of trying to figure things out, you have made things quite easy for me to understand. Would it be too much to just ask you questions? :) I will work on implementing the logic and code that you have given me and give you my results shortly
  • Sunday, February 19, 2012 10:15 PM
     
     

    Hi youranonymous2727;

    Please open all new questions in a new thread of there own. And if you want me to take a look at it place the link to that question in one of the questions I have already taken part in like this one so that an email is sent to me. This will allow others to participate and will not leaving you hanging if I am not around. As I stated before it's not a problem so post your questions.


     


    Fernando (MCSD)

    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

  • Sunday, February 19, 2012 11:12 PM
     
      Has Code

    Another issue i have come across, as I mentioned before i'm looping through a number of csv files that are poorly made :(, anyway the process breaks at a certain line because the headers are broken up in the file like this for some reason, see below:

    Reference,Product,Description,Style,"Ship 
    Date","Expiry 
    Date",Buy/Sell,Do/Dont,CCY,Origin,"No.of 
    Parts",Price,"Manu. Price 
    (%)",Evaluation,"Evaluation 
    Comments"
    
    L5208516,Hojo,lkjlkj,N/A,22-Jul-2011,24-Jul-2013,Sell,N/A,RUN,"68.00","68.00",0.00,0.68,"-46.76",30-Dec-2011
    Note: The above

            using (TextFieldParser fieldParser = new TextFieldParser(this.Variables.RBSCSVFile))
                {
                string[] ignoreLines = {"COB", "\"All\"", "\"ABC", "Company",
                                "Statement", "Client", "\"Please",
                                "THE", "Group:", "Reference,Product,Description",
                                       "Note:","Total","Date\"","Parts\"","(%)\"",
                                       "Statement", "Please", "Reference,Product,Trade", "Comments\""};
    
                fieldParser.TextFieldType = FieldType.Delimited;
                fieldParser.Delimiters = new string[] { "," };
                fieldParser.HasFieldsEnclosedInQuotes = true;
                fieldParser.CommentTokens = ignoreLines;
    
                string[] fields;
    Even though i have the second code block trying to ignore crazy lines like line 9, its still giving me that kind of error

    any help on how this can be parsed through correctly, the error that i'm getting at the moment is this...

    Line 9 cannot be parsed using the current Delimeters. Line 9 is this line (Reference,Product,Description,Style,"Ship)



  • Monday, February 20, 2012 12:03 AM
     
     

    Hi youranonymous2727;

    You need to open a new question in a new thread and if you want me to look at it post the link in a question you had open and which I was a participant in so that I am contacted and a new question gets open.

    The reason why it is failing on each of the following lines is that there is an opening, closing or a opening and closing quote without a matching opening or closing quote.

    Reference,Product,Description,Style,"Ship
    Date","Expiry
    Date",Buy/Sell,Do/Dont,CCY,Origin,"No.of
    Parts",Price,"Manu. Price
    (%)",Evaluation,"Evaluation
    Comments"

    Let me ask a question: Does the lines that have data in them always have the same number of fields in them? Because if they do you can test to see if they have the correct number of fields before processing.

     


    Fernando (MCSD)

    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

  • Monday, February 20, 2012 12:07 AM
     
     
    Ok, I'll open a new question in a new thread. I'll post the link in my next reply, to answer your question, no the lines of data never have the same number of fields in them, because there are sometimes different product groups which have more or less fields of data.
  • Monday, February 20, 2012 12:12 AM