locked
Import Clipboard from Excel File RRS feed

  • Question

  • Hi,

    I am copying with the clipboard (Ctl + C) a complete Excel Sheet to my application (to a DatagridView). I am copying from the clipboard using GetText and splitting in string arrays using "\r\n":

                    string[] cb = Clipboard.GetText(TextDataFormat.Text).Split(new string[1] { "\r\n" }, StringSplitOptions.None);
                    cb[0] = "#\t" + cb[0];
                    string[] ColHeaders = cb[0].Split('\t');
    
    
                    DataTable dTab = new DataTable();
                    foreach (string HeaderStr in ColHeaders)
                    {
                        dTab.Columns.Add(HeaderStr);
                    }
    
    

    Later I split every row to cells using "\t" separator (before I created the columns with the first row which has all column names):

                    //add rows
                    for (int i = 0; i < cb.Length - 2; i++)
                    {
                          dTab.Rows.Add(cb[i + 1].Split(new string[1] { "\t" }, StringSplitOptions.None));
                    }
    
    

    The problem is that sometimes the source excel sheets has some extra "\t" in the rows, that does not correspond to empty columns, therefore I get an error saying that the "input array is longer than the number of columns" of course, it happens only in few rows (not all rows). Excel rows in the MS Excel 2003 looks fine until I import them I find out that there are some hidden "\t" tabs.

    I cannot use "StringSplitOptions.RemoveEmptyEntries" because sometimes I have empty columns in those Excel Sheets and I get a mess.

    QUESTION:
    Is there a better way to import Excel sheet clipboard, columns, rows to a C# application? instead of using "\t", "\n" or "\r"?
     
    For example if I paste to a Word document the table is created very nicely, maybe C# clipboard has support for that kind of paste

    Thanks

    Friday, September 18, 2009 4:29 PM

Answers

  • Hi, it looks that the software that created the excel files is kind of old and it was creating excel files for MS Office 97 or 2002. Thats why these files have some kind of unicode code text "hidden" inside the cells when opened in MS office 2003.

    SOLUTION (in case that someone gets into the same problem): 
    so I needed to parse the RTF string, looking only for the table, it is to say, I am not looking for colors nor cell formats such as bold, italics or alignment attributes, etc, you may need to modify the method that I made to get those attributes for the table.

    I needed to refer to "Microsoft Office Word 2003 Rich Text Format (RTF) Specification, version 1.8" that can be downloaded from Microsoft webpage:

    So, what I basically did is to first look for rows, according to the "RTF Specification 1.8" the rows starts with \trowd command and ends with \row 
    once that I get each row I looked for cells using the \cell command and I used other method "PurgeRtfCmds" to get rid of the rest of the RTF commands.

    The method basically receives a RTF string and returns a DataTable object.

                dataGridView1.Columns.Clear();
                DataTable DtTbl = GetTableFromRtfString(Clipboard.GetData(DataFormats.Rtf).ToString());
                dataGridView1.DataSource = DtTbl;


    The GetTableFromRtfString function:

            DataTable GetTableFromRtfString(string StrRtfData)
            {
                int idxRowEnd = 0;
                int idxRowStart = 0;
                DataTable DtTbl =  new DataTable();
                bool FirstRow = true;
    
                do
                {
                    idxRowEnd = StrRtfData.IndexOf(@"\row", idxRowEnd, StringComparison.OrdinalIgnoreCase);
                    if (idxRowEnd < 0) break;
                    else if (StrRtfData[idxRowEnd - 1] == '\\') { idxRowEnd++; continue; }
                    idxRowStart = StrRtfData.LastIndexOf(@"\trowd", idxRowEnd, StringComparison.OrdinalIgnoreCase);
                    if (idxRowStart < 0) break;
                    else if (StrRtfData[idxRowStart - 1] == '\\') { idxRowEnd++; continue; }
    
                    string RowStr = StrRtfData.Substring(idxRowStart, idxRowEnd - idxRowStart);
                    idxRowEnd++;
    
                    int idxCell = 0;
                    int idxCellMem = 0;
                    List<string> MyDataRow = new List<string>();
                    do
                    {
                        idxCell = RowStr.IndexOf(@"\Cell ", idxCell, StringComparison.OrdinalIgnoreCase);
                        if (idxCell < 0) break;
                        else if (RowStr[idxCell - 1] == '\\') { idxCell++; continue; }
    
                        MyDataRow.Add(PurgeRtfCmds(RowStr.Substring(idxCellMem, idxCell - idxCellMem)));
                        idxCellMem = idxCell;
                        idxCell++;
                    }
                    while (idxCellMem > 0);
    
                    if (FirstRow)
                    {
                        FirstRow = false;
                        foreach (string ColName in MyDataRow)
                            DtTbl.Columns.Add(ColName);
                    }
                    else
                        DtTbl.Rows.Add(MyDataRow.ToArray());
                }
                while ((idxRowStart > 0) && (idxRowEnd > 0));
    
                return DtTbl;
            }
    

    And don't forget the purge method:

            string PurgeRtfCmds(string StrRtf)
            {
                int idxRtfStart = 0;
                int idxRtfEnd = 0;
    
                while (idxRtfStart < StrRtf.Length)
                {
                    idxRtfStart = StrRtf.IndexOf('\\', idxRtfStart);
                    if (idxRtfStart < 0) break;
                    if (StrRtf[idxRtfStart + 1] == '\\') 
                    { 
                        StrRtf = StrRtf.Remove(idxRtfStart, 1);   //1 offset to erase space
                        idxRtfStart++; //sckip "\"
                    }
                    else
                    {
                        idxRtfEnd = StrRtf.IndexOf(' ', idxRtfStart);
                        if (idxRtfEnd < 0)
                            if (StrRtf.Length > 0)
                                idxRtfEnd = StrRtf.Length - 1;
                            else
                                break;
                        StrRtf = StrRtf.Remove(idxRtfStart, idxRtfEnd - idxRtfStart + 1);   //1 offset to erase space
                    }
    
                }
    
                //Erase spaces at the end of the cell info.
                if (StrRtf.Length > 0)
                    while (StrRtf[StrRtf.Length - 1] == ' ')
                        StrRtf = StrRtf.Remove(StrRtf.Length - 1);
    
                //Erase spaces at the beginning of the cell info.
                if (StrRtf.Length > 0)
                    while (StrRtf[0] == ' ')
                        StrRtf = StrRtf.Substring(1, StrRtf.Length - 1);
    
                return StrRtf;
            }





    • Marked as answer by Kikeman Sunday, September 20, 2009 12:37 AM
    Sunday, September 20, 2009 12:37 AM

All replies

  • C# knows nothing about the data format with which Excel stores into the clipboard.
    You could first inspect the result of your Split method call and be sure it's the correct data you want before adding it to your data table's rows.
    Ideally however, you should use excel interop and read directly from the particular worksheet/range/cell that holds the data you need. This would give you total control.
    • Marked as answer by Kikeman Friday, September 18, 2009 6:02 PM
    • Unmarked as answer by Kikeman Friday, September 18, 2009 9:49 PM
    Friday, September 18, 2009 4:44 PM
  • yes but, with interop I think I would need to save the Excel file before, in the current scenario I have only an open unsaved Excel file, I don't want to save the Excel file, actually this is the main idea to avoid saving Excel files and only coping and pasting the info inside and get rid of those files.
    Friday, September 18, 2009 4:50 PM
  • well I have being searching around and it seems that it could be done using the rich text format:

    String tempClipboardData = Clipboard.GetData(DataFormats.Rtf).ToString();

    The string shows something funny:

    \ql \f0\fs20 \cf Order\cell \ql Cost center\cell \ql Order type\cell \ql Short text\cell

    So maybe I would need to parse this info or maybe there are some existing functions ... 

    Friday, September 18, 2009 9:53 PM
  • Hi, it looks that the software that created the excel files is kind of old and it was creating excel files for MS Office 97 or 2002. Thats why these files have some kind of unicode code text "hidden" inside the cells when opened in MS office 2003.

    SOLUTION (in case that someone gets into the same problem): 
    so I needed to parse the RTF string, looking only for the table, it is to say, I am not looking for colors nor cell formats such as bold, italics or alignment attributes, etc, you may need to modify the method that I made to get those attributes for the table.

    I needed to refer to "Microsoft Office Word 2003 Rich Text Format (RTF) Specification, version 1.8" that can be downloaded from Microsoft webpage:

    So, what I basically did is to first look for rows, according to the "RTF Specification 1.8" the rows starts with \trowd command and ends with \row 
    once that I get each row I looked for cells using the \cell command and I used other method "PurgeRtfCmds" to get rid of the rest of the RTF commands.

    The method basically receives a RTF string and returns a DataTable object.

                dataGridView1.Columns.Clear();
                DataTable DtTbl = GetTableFromRtfString(Clipboard.GetData(DataFormats.Rtf).ToString());
                dataGridView1.DataSource = DtTbl;


    The GetTableFromRtfString function:

            DataTable GetTableFromRtfString(string StrRtfData)
            {
                int idxRowEnd = 0;
                int idxRowStart = 0;
                DataTable DtTbl =  new DataTable();
                bool FirstRow = true;
    
                do
                {
                    idxRowEnd = StrRtfData.IndexOf(@"\row", idxRowEnd, StringComparison.OrdinalIgnoreCase);
                    if (idxRowEnd < 0) break;
                    else if (StrRtfData[idxRowEnd - 1] == '\\') { idxRowEnd++; continue; }
                    idxRowStart = StrRtfData.LastIndexOf(@"\trowd", idxRowEnd, StringComparison.OrdinalIgnoreCase);
                    if (idxRowStart < 0) break;
                    else if (StrRtfData[idxRowStart - 1] == '\\') { idxRowEnd++; continue; }
    
                    string RowStr = StrRtfData.Substring(idxRowStart, idxRowEnd - idxRowStart);
                    idxRowEnd++;
    
                    int idxCell = 0;
                    int idxCellMem = 0;
                    List<string> MyDataRow = new List<string>();
                    do
                    {
                        idxCell = RowStr.IndexOf(@"\Cell ", idxCell, StringComparison.OrdinalIgnoreCase);
                        if (idxCell < 0) break;
                        else if (RowStr[idxCell - 1] == '\\') { idxCell++; continue; }
    
                        MyDataRow.Add(PurgeRtfCmds(RowStr.Substring(idxCellMem, idxCell - idxCellMem)));
                        idxCellMem = idxCell;
                        idxCell++;
                    }
                    while (idxCellMem > 0);
    
                    if (FirstRow)
                    {
                        FirstRow = false;
                        foreach (string ColName in MyDataRow)
                            DtTbl.Columns.Add(ColName);
                    }
                    else
                        DtTbl.Rows.Add(MyDataRow.ToArray());
                }
                while ((idxRowStart > 0) && (idxRowEnd > 0));
    
                return DtTbl;
            }
    

    And don't forget the purge method:

            string PurgeRtfCmds(string StrRtf)
            {
                int idxRtfStart = 0;
                int idxRtfEnd = 0;
    
                while (idxRtfStart < StrRtf.Length)
                {
                    idxRtfStart = StrRtf.IndexOf('\\', idxRtfStart);
                    if (idxRtfStart < 0) break;
                    if (StrRtf[idxRtfStart + 1] == '\\') 
                    { 
                        StrRtf = StrRtf.Remove(idxRtfStart, 1);   //1 offset to erase space
                        idxRtfStart++; //sckip "\"
                    }
                    else
                    {
                        idxRtfEnd = StrRtf.IndexOf(' ', idxRtfStart);
                        if (idxRtfEnd < 0)
                            if (StrRtf.Length > 0)
                                idxRtfEnd = StrRtf.Length - 1;
                            else
                                break;
                        StrRtf = StrRtf.Remove(idxRtfStart, idxRtfEnd - idxRtfStart + 1);   //1 offset to erase space
                    }
    
                }
    
                //Erase spaces at the end of the cell info.
                if (StrRtf.Length > 0)
                    while (StrRtf[StrRtf.Length - 1] == ' ')
                        StrRtf = StrRtf.Remove(StrRtf.Length - 1);
    
                //Erase spaces at the beginning of the cell info.
                if (StrRtf.Length > 0)
                    while (StrRtf[0] == ' ')
                        StrRtf = StrRtf.Substring(1, StrRtf.Length - 1);
    
                return StrRtf;
            }





    • Marked as answer by Kikeman Sunday, September 20, 2009 12:37 AM
    Sunday, September 20, 2009 12:37 AM
  • This is excellent, and has saved me a load of time to do the same thing as you. Good work, and thanks for sharing.
    Wednesday, April 25, 2012 8:42 PM