none
How to remove characters from an excel cell in C#? RRS feed

  • Question

  • I am developing an application that use SFTP to download files from a remote server to a local server, these files are then converted to .csv format which is readable by ealge.io platform. 

    However I found out that in one of the csv columns there is a character that need to be deleted to make the file readable.

    There is a T in the middle of the time cell, this T need to be deleted to suit this format YYYY-MM-DD HH:mm:ss instead of this YYYY-MM-DDTHH:mm:ss as per eagle.io requirements. Below is the sample data from the csv data.

    Local_Time	       
    2000-03-05T03:00:59	
    2000-03-05T03:01:08
    2000-03-05T03:01:59	
    2000-03-05T03:02:59
    2000-03-05T03:03:59	
    2000-03-05T03:04:53	

    This is how it should be:

    Local_Time
    2000-03-05 03:01:00
    2000-03-05 03:01:08
    2000-03-05 03:01:59
    2000-03-05 03:03:00
    2000-03-05 03:04:00
    2000-03-05 03:04:54
    

    How to do this directly from C#?


    • Edited by samiarja Tuesday, September 3, 2019 1:37 AM add the correct format of the time
    Tuesday, September 3, 2019 1:35 AM

All replies

  • Humm, just read the file line by line, then locate the nth seperator with .IndexOfAny() (in that loop, you  skip counting if you encountered ", then resume with another "), then new a StringBuilder with the line, replace the character with

    builder[10] = ' ';

    and export the line as .ToString().

    Rinse and repeat until it's the end of file.

    It's just school assignment level of string manipulation.

    Alternatively you can also read the file as binary, then do similar search and replace logic by handling newline character(s) yourself. It's more efficient this way.



    Tuesday, September 3, 2019 1:50 AM
    Answerer
  • Hi,

    Supposeing a character "T" is located at 11th in each line, you can replace it with one space.
    Firstly, define extended method on String.
    public static class String
    {
        public static string ChangeCharAt(this string str, int index, char newChar)
        {
            return str.Remove(index, 1).Insert(index, newChar.ToString());
        }
    }
    and use it like this:
      [e.g.] inputLine is '2000-03-05T03:00:59'
      [note] index starts from zero
    inputLine.ChangeCharAt(10, ' '));
    Regards,


    Ashidacchi -- https://ssl01.rocketnet.jp/hokusosha.com/default.html


    • Edited by Ashidacchi Tuesday, September 3, 2019 1:53 AM
    Tuesday, September 3, 2019 1:51 AM
  • How to do that without actually specifying the string like that "2000-03-05T03:00:59"?

    There are around 1k+ rows with this characters and over 5K+ files, they all need to be done together.

    Tuesday, September 3, 2019 1:55 AM
  • Will this work efficiently on all the .csv file? as there are over 5k+ files each with thousands of rows with this time format
    Tuesday, September 3, 2019 1:56 AM
  • As long as the format is CSV file is the same, yes.

    Just process it line by line and write the completed line into a new file, then delete the old file when done and rename it to the old name.

    Remember to use "using" statement to wrap the file open code so the handle will be disposed as soon as it exit the block, so you can rename it.

    Btw. if you count on efficiency writing this yourself would be far more efficient then using a library. The library would at best do similar works, or more probably, if you're using a library that reads CSV format, needlessly parse all the fields that you aren't using and wasting CPU time.
    Tuesday, September 3, 2019 2:03 AM
    Answerer
  • The line "one of the csv columns" in the question make me think it has more than 1 columns, and probably couldn't do it in this simplified way.

    Btw, because of string interning, the insert + remove kind of work will create 1 more copy of temporary string, and when looping files I'll probably not doing it this way.

    Tuesday, September 3, 2019 2:07 AM
    Answerer
  • Hi,

    You can make a nested loop: outer loop is for files, inner loop for rows.
    In the outer loop, file name need to be specified.  If 5K+ files are all in the same directory, you can get all file names at once.
    // get all file names in "C:\test" into string array
    string[] files = System.IO.Directory.GetFiles(
        @"C:\test", "*.CSV", System.IO.SearchOption.AllDirectories);
    In the inner loop, read at the end of rows and replace "T" with one space.

    Regards,

    P.S.  I'd like to confirm how many columns (or items) om a row, and where the date-time value is, such as in the first column, the 4th column.

    Ashidacchi -- https://ssl01.rocketnet.jp/hokusosha.com/default.html


    • Edited by Ashidacchi Tuesday, September 3, 2019 2:26 AM
    Tuesday, September 3, 2019 2:14 AM
  • Think about the requirement again, since the time is not wrapped in string delimiters, if the "string fields" are wrapped in string delimiters (I'll assume " in here), perheps the most efficient way would be like this:

    ======

    For each file

    1) Create boolean value IsInStr and set it to false.
    2) Read the file in 4096 byte length (or the cluster size of the disk) until the end of file. If no more byte to read, exit loop and go to (5).
    3) For each byte in the read buffer, do one of the following:
    - if current byte is 0x22("), and the previous character is not escape character(say, 0x5C \ ) , toogle the IsInStr variable. and add to write buffer
    - if current byte is 0x54(T) and not IsInStr, add 0x20(space) to write buffer instead
    - for other characters, just add to write buffer
    4) Write buffer to the new file, then go back to (2).
    5) Close and Dispose the handles, delete the old file and rename the new file to old file

    ======

    This assume the file is in plain ASCII character set, if there will be characters of other code page, you'll have to add more logic (say, to only do the action if previous byte is not high-ASCII)

    Tuesday, September 3, 2019 2:44 AM
    Answerer
  • Hi samiarja,

    Thank you for posting here.

    You could convert csv to datatable and change the value in datatable then you could convert datatable to csv.

    I write a simple code, you could have a look.

      static void Main(string[] args)
            {
                string filepath = @"D:\1.csv";
                DataTable table = OpenCSV(filepath);
                foreach (DataRow item in table.Rows)
                {
                    if(item["Datetime"].ToString().Contains("T"))
                    {
                        item["Datetime"] = item["Datetime"].ToString().Replace("T", " ");
                    }
                }
                SaveCSV(table, filepath);
    
    
    
    
    
            }
            public static void SaveCSV(DataTable dt, string fullPath)
            {
                FileInfo fi = new FileInfo(fullPath);
                if (!fi.Directory.Exists)
                {
                    fi.Directory.Create();
                }
                FileStream fs = new FileStream(fullPath, System.IO.FileMode.Create, System.IO.FileAccess.Write);
                StreamWriter sw = new StreamWriter(fs, System.Text.Encoding.UTF8);
                string data = "";
                for (int i = 0; i < dt.Columns.Count; i++)
                {
                    data += dt.Columns[i].ColumnName.ToString();
                    if (i < dt.Columns.Count - 1)
                    {
                        data += ",";
                    }
                }
                sw.WriteLine(data);
                for (int i = 0; i < dt.Rows.Count; i++)
                {
                    data = "";
                    for (int j = 0; j < dt.Columns.Count; j++)
                    {
                        string str = dt.Rows[i][j].ToString();
                        str = str.Replace("\"", "\"\"");
                        if (str.Contains(',') || str.Contains('"')
                            || str.Contains('\r') || str.Contains('\n')) 
                        {
                            str = string.Format("\"{0}\"", str);
                        }
    
                        data += str;
                        if (j < dt.Columns.Count - 1)
                        {
                            data += ",";
                        }
                    }
                    sw.WriteLine(data);
                }
                sw.Close();
                fs.Close();
                Console.WriteLine("success");
            }
            public static DataTable OpenCSV(string filePath)
            {
                DataTable dt = new DataTable();
                FileStream fs = new FileStream(filePath, System.IO.FileMode.Open, System.IO.FileAccess.Read);
    
                StreamReader sr = new StreamReader(fs);
                string strLine = "";
                string[] aryLine = null;
                string[] tableHead = null;
                int columnCount = 0;
                bool IsFirst = true;
    
                while ((strLine = sr.ReadLine()) != null)
                {
    
                    if (IsFirst == true)
                    {
                        tableHead = strLine.Split(',');
                        IsFirst = false;
                        columnCount = tableHead.Length;
                        for (int i = 0; i < columnCount; i++)
                        {
                            DataColumn dc = new DataColumn(tableHead[i]);
                            dt.Columns.Add(dc);
                        }
                    }
                    else
                    {
                        aryLine = strLine.Split(',');
                        DataRow dr = dt.NewRow();
                        for (int j = 0; j < columnCount; j++)
                        {
                            dr[j] = aryLine[j];
                        }
                        dt.Rows.Add(dr);
                    }
                }
                if (aryLine != null && aryLine.Length > 0)
                {
                    dt.DefaultView.Sort = tableHead[0] + " " + "asc";
                }
    
                sr.Close();
                fs.Close();
                return dt;
            }

    Best Regards,

    Jack


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, September 3, 2019 3:25 AM
    Moderator