best way to read a pipe-delimited text file?

    Question

  • I am trying to read in a pipe-delimited file and do certain processing for each field. I was just wondering what the best way to strip out each field is - is there anything built into .NET that will easily allow me to do this? Or do I just loop through and remove each field myself?

    Also, would it be best to strip each field out into a dataset first and then do the processing? Or do the processing while I am separating each field? Or does it not matter?

    Thanks!

    Saeed

     

    Wednesday, October 11, 2006 7:50 PM


All replies

  • it really depends. You can split the entire file into a string[] array if you like:

    string[] theFields;
    StreamReader theReader = new StreamReader("file.txt");

    // Splitting on '|' alone would leave the line breaks attached to the
    // fields, so split on the newline characters as well. Note that
    // RemoveEmptyEntries also discards genuinely empty fields.
    theFields = theReader.ReadToEnd().Split(
        new char[] { '|', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    theReader.Close();

    does this help?

    if your file has the same number of columns on every line, with a pipe symbol as the delimiter, you can read each line into a string array and add it to a DataTable if you like and do stuff with it. Example:

    DataTable theDataTable = new DataTable();
    using (StreamReader sr = new StreamReader("file.txt"))
    {
       while (sr.Peek() > -1)
       {
          object[] currentLine = sr.ReadLine().Split(new char[] { '|' });
          theDataTable.Rows.Add(currentLine);
       }
    }

    so now each line of the file has been split into its columns (assuming the same number of columns on each line) and inserted into a DataTable

    Wednesday, October 11, 2006 7:59 PM
  • I keep seeing people using Peek() to determine if they are at the end of a stream.

    This is not a good idea, since it will only work with random-access streams - it will fail with sequential streams such as encrypting/decrypting and compressing/decompressing streams.

    Fortunately, there is a way to do it without requiring that the stream be random access:



    using (StreamReader sr = new StreamReader("file.txt"))
    {
        for (;;)  // This loop has only one exit.
        {
            string line = sr.ReadLine();

            if (line == null)
                break;

            string[] splitLines = line.Split(new char[] { '|' });

            // ... whatever else needs to be done.
        }
    }

     


    I've seen several problems in production code because a programmer wrote a supposedly general-purpose stream utility which worked fine up until the time someone passed it a stream that didn't support random access, at which point it blew up.

    Thursday, October 12, 2006 8:53 AM
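    The same null check can be folded into the loop condition - a common, equivalent way of writing the for (;;) form above:

    using (StreamReader sr = new StreamReader("file.txt"))
    {
        string line;

        // ReadLine() returns null at end of stream, so the loop condition
        // doubles as the end-of-input test.
        while ((line = sr.ReadLine()) != null)
        {
            string[] splitLines = line.Split(new char[] { '|' });

            // ... whatever else needs to be done.
        }
    }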
  • perhaps, but for this it's ok.
    Thursday, October 12, 2006 3:55 PM
  • When trying to insert into a DataTable, I am getting the following error:

    Input array is longer than the number of columns in this table.

    Is there something else I need to do beforehand?

    Also, besides the processing, I want to insert the fields into a SQL table once they are separated. Can I insert one record into the SQL table for every line in the text file using the DataTable?

    Thursday, October 12, 2006 7:06 PM
  • yes you can, and this is what it does... you are inserting one row of data at a time into the DataTable, so when you update the database it will be updated with those rows. Now, to me, it seems as if you have fewer columns in the SQL database or DataTable than there are columns in the text file - is this correct? As stated, make sure that the number of columns in the SQL table/DataTable MATCHES the number of columns in the text file, or, after a line is split into the string[] array, check the number of columns - say, by showing the user the contents of the split string in a MessageBox (a sketch of that check follows this post).
    Thursday, October 12, 2006 7:12 PM
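    A minimal sketch of that check, assuming a Windows Forms context for the MessageBox and that the expected field count (180, per the thread) is known up front:

    using System.Windows.Forms;

    // Inside the read loop; 'line' holds the current line of the file.
    const int expectedColumns = 180;
    string[] fields = line.Split(new char[] { '|' });
    if (fields.Length != expectedColumns)
    {
        // Surface the mismatch before attempting the DataTable insert.
        MessageBox.Show("Expected " + expectedColumns + " fields but got "
            + fields.Length + ":\n" + line);
    }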
  • After doing more research on the syntax (which is new to me, so sorry for the basic questions!), I think I need to add the actual columns to my DataTable before adding rows to it, correct? I was trying to avoid doing this, as there are 180 columns. But if that is the only way, I guess I'll have to...

    Thursday, October 12, 2006 7:13 PM
  • :-) yes you would! So you can create the columns on the fly, once, by looking at the number of items split into the array. Example:

    // while loop
       // split code here, then.....
       if (theDataTable.Columns.Count == 0)
       {
          // Add one default-named column per field in the current line.
          for (int counter = 0; counter < currentLine.Length; counter++)
          {
             theDataTable.Columns.Add();
          }
       }

       theDataTable.Rows.Add(currentLine);
       //.....
    // end while

    Thursday, October 12, 2006 7:27 PM
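    Putting the pieces together, the whole loop might look like this - a sketch only, with the file name and the untyped, default-named columns as assumptions:

    using System.Data;
    using System.IO;

    DataTable theDataTable = new DataTable();
    using (StreamReader sr = new StreamReader("file.txt"))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            object[] currentLine = line.Split(new char[] { '|' });

            // Create the columns once, from the first line's field count.
            if (theDataTable.Columns.Count == 0)
            {
                for (int counter = 0; counter < currentLine.Length; counter++)
                    theDataTable.Columns.Add();
            }

            theDataTable.Rows.Add(currentLine);
        }
    }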
  • ok - got it - thanks!

     

    Thursday, October 12, 2006 7:29 PM
  • But ahmedilyas, as you said, it's best to follow best practices - just like == and Equals(), where you argued before that it was "best practice". I don't think it should be any different for this thread.
    Thursday, October 12, 2006 9:16 PM
  • what are you referring to?
    Thursday, October 12, 2006 11:57 PM
  • Nevermind.
    Friday, October 13, 2006 1:20 AM
  •  ahmedilyas wrote:
    perhaps, but for this it's ok.


    It's important to get into the habit of doing things right all the time, not only when it's the only way the code will work!

    Besides which, using Peek() and Read() is less efficient (it does extra work which is unnecessary).
    Friday, October 13, 2006 9:12 AM
  • I know this isn't related to the original question, but which is better to use: == or Equals()?

     

    Friday, October 13, 2006 1:58 PM
  • it depends. For reference types, == checks reference identity by default, whilst Equals() is a virtual method that a type can override to compare values - and types such as string overload both to compare contents. Try doing a forum search; there are a couple of topics about this.
    Friday, October 13, 2006 2:50 PM
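    A small illustration of the difference, using string (which overloads == to compare contents):

    using System;

    string s1 = "pipe";
    string s2 = new string(new char[] { 'p', 'i', 'p', 'e' });

    Console.WriteLine(s1 == s2);      // True  - string's == compares contents
    Console.WriteLine(s1.Equals(s2)); // True  - string overrides Equals() too

    object o1 = s1;
    object o2 = s2;
    Console.WriteLine(o1 == o2);      // False - object's == compares references
    Console.WriteLine(o1.Equals(o2)); // True  - Equals() is virtual and still
                                      //         dispatches to string.Equals()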
  • I am inserting fine into tables with exactly the same number of columns as the text file. But I am getting bad data inserted into tables that have auto-incrementing identity fields. What is the best way to account for these types of fields?

    To be a little clearer, I am currently using WriteToServer because it seemed like the quickest way to insert large amounts of data into SQL tables. I was trying to avoid writing a SQL statement with all 180 fields in the insert clause, but perhaps this is what needs to be done?

    Thanks!

    Saeed

    Friday, October 13, 2006 9:58 PM
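    For the identity-column problem above: WriteToServer belongs to SqlBulkCopy, which by default maps source columns to destination columns by ordinal, so a destination identity column shifts every field over by one position. A minimal sketch of the usual fix - explicit column mappings that leave the identity column out, so SQL Server generates its values (the connection string, table name, and column names here are hypothetical):

    using System.Data;
    using System.Data.SqlClient;

    string connectionString = "..."; // hypothetical
    using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.ImportTable"; // hypothetical

        // Map each source column (by ordinal) to a destination column (by
        // name); the identity column is simply never mapped.
        string[] destColumns = { "Field1", "Field2", "Field3" /* ...all 180 */ };
        for (int i = 0; i < destColumns.Length; i++)
            bulk.ColumnMappings.Add(i, destColumns[i]);

        bulk.WriteToServer(theDataTable);
    }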