none
ImportMixedTypes and TypeGuessRows while Importing .CSV File using ADO.NET

    Question

  • Hi,

    I'm writing client application in which i need to import .CSV file into grid. For that i need to consider all column contents as Text. To achieve this, i need to either change the registry key OR i need to generate schema file. But in both the case i need to have admin rights (ADMIN rights to modify the key OR ADMIN rights to the path where i have .CSV file). But not all the user have Admin rights.

    Is there any workaround for this.

    Thanks.

     
    • Edited by RM123 Tuesday, June 30, 2009 7:17 AM
    Monday, June 29, 2009 6:34 AM

All replies

  • Another option is to parse the file using your own code. If it is predefined structure, it might be not that difficult. You could open the file using standard .NET Framework classes, read data line-by-line and split it based on rules you have for this file


    Val Mazur (MVP) http://www.xporttools.net
    Monday, June 29, 2009 10:23 AM
    Moderator
  • Hi,
    Thanks very much for the reply.

    There are various other reasons why i go with ADO.NET.

    1. I need to validate the column from the .csv file with the predefined columns in my Application. (Say for Example, I'm importing .csv file with columns "A", "B", "D" and i have the predefined structure in my application in string array,like "A", "B", "C". In this case, i need to dispaly the error message "In Valid Column Name").

    I'm doing this as mentioned below.

    using (OleDbDataReader rdr = cmd.ExecuteReader())
    {
    string[] xlcolumns = new string[rdr.FieldCount];
    //ColSettings -> Predefined structure.
    if (defineColumnSettings(xlcolumns, ColSetting, excelProperties) == false)
    {
    //defineColumn settings tells whether xlcolumns and predefined columns are the same.
    }

    }


    2. There may the possiblity that user may import the file with different column order with the predefined structure. (Say for example, user may import .csv file with columns "C", "B", "A". In this case i need to map the column with the predefined structure like 1st column in .csv file matched to 3 rd column and so on .. This is to validate the data while importing the file).

    3. Performance

    Excel Automation can do either with Primary Interop Assemblies OR ADO.NET or simply by using File Streams to read the data.
    Since ADO.NET has better performance and that suits my requirement well, i go with ADO.NET

    Now, is there any way to say to the engine that "All columns should be considered as text."

    Note: Please note that user may not have ADMIN rights so i wont be able to create schema.ini file or i can change the registry to do so.

    Is there anything i can set in Extended propertied in the connection string so that it takes all the columns as "TEXT" instead of "MAJORITY TYPES".


    Thanks.

    Monday, June 29, 2009 12:38 PM
  • Hi,
    After googling, i found that i need to add

    MaxScanRows=0;IMEX=1; in the extended property. But still it is not working.

    Please anyone help me on this.

    Thanks in Advance..

    Monday, June 29, 2009 2:59 PM
  • Have you tried using the TextFieldParser Class?

            Dim TextFileReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\Documents and Settings\...\My Documents\My Database\Text\Orders.txt")
    
            TextFileReader.TextFieldType = FileIO.FieldType.Delimited
            TextFileReader.SetDelimiters(",")
    
            Dim CurrentRow As String()
            While Not TextFileReader.EndOfData
                Try
                    CurrentRow = TextFileReader.ReadFields()
                    'Dim CurrentField As String
                    Console.WriteLine(CurrentRow(0).ToString)
                    Console.WriteLine(CurrentRow(1).ToString)
                    Console.WriteLine(CurrentRow(2).ToString)
                    Console.WriteLine(CurrentRow(3).ToString)
                    'For Each CurrentField In CurrentRow
                    ' Console.Write(CurrentField & Space(1))
                    'Next
                Catch ex As _
                Microsoft.VisualBasic.FileIO.MalformedLineException
                    MsgBox("Line " & ex.Message & _
                    "is not valid and will be skipped.")
                End Try
                Console.WriteLine()
            End While

    Paul ~~~~ Microsoft MVP (Visual Basic)
    Monday, June 29, 2009 3:04 PM
  • Hi Paul,

    Thanks very much for the reply.

    I will try with TextFileReader. But i'm not sure about the performance when i tried to read more than 50,000 records, and i'm not sure whether i would be able to validate the columns as i mentioned in the above thread.

    Anyhow, it will be really helpful if i can do it with OLEDB connections settings?

    out of curiousity i just want to know, why Microsoft didn't expose those proerties in the OLEDB connection string if it is not possible to set these properties in the OLEDB connection string.


    It is really meaningful to rely upon Schema.ini file if i need to specify each column information (that too if there is any overload method to specify from which path the schema file needs to be considered, then i can create the schema.ini file in some temp folder in the local machine where i don need to have ADMIN rights). But to specify some common properties like "MAXSCANROWS, IMPORTMIXEDTYPE" why should i rely on schema properties?

    I almost done with importing of data (all validation of columns and data) except that i need to set the MAXSCANROWS as "0" in my connection string. how to set this?

    Thanks in Advance..

    Monday, June 29, 2009 4:35 PM
  • IMEX is for the Excel files. If you do not specify types of columns than all the values by-default treated as strings, so you just need do nothing in this case. The only thing you need to avoid using schema.ini file is to specify delimiter in your connection string. Your Extended Properties part of the connection string should look like

    Extended Properties="Text;FMT=Delimited";

    In this case you do not need Schema.ini file

    Val Mazur (MVP) http://www.xporttools.net
    Tuesday, June 30, 2009 10:22 AM
    Moderator
  • Val Mazur,

    Thanks for the reply.

    Please find the connection string i used to import .csv file.
    FileInfo file = new FileInfo(filepath);
                    //using (FileStream filestr = new FileStream(file.DirectoryName + "\\schema.ini", FileMode.Create, FileAccess.Write))
                    //{
                    //    using (StreamWriter writer = new StreamWriter(filestr))
                    //    {
                    //        writer.WriteLine("[" + file.Name + "]");
                    //        writer.WriteLine("ColNameHeader=True");
                    //        writer.WriteLine("Format=CSVDelimited");
                    //        writer.WriteLine("MaxScanRows=0");
                    //        //For loop from here.
                    //        //for (int colSett = 0; colSett < ColSetting.Length; colSett++)
                    //        //{
                    //        //    writer.WriteLine("Col" + (colSett + 1) + "=\"" + ColSetting[colSett].label + "\" Text");
                    //        //}
                    //        writer.Close();
                    //        writer.Dispose();
                    //    }
                    //    filestr.Close();
                    //    filestr.Dispose();
                    //}
    string oledbCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + file.DirectoryName + "\"; Extended Properties='text;HDR=Yes;FMT=Delimited(,)'";

    Even now it is not working. Please find the data i load mentioned below.

    1234567
    1234567
    1234567
    1234567
    1234567
    1234567
    1234567
    1234567
    1234567
    1234567
    abcdes
    1234567

    here, abcdes is not being loaded.

    Thanks

    Tuesday, June 30, 2009 11:00 AM
  • will anyone please help me on this.
    Thanks in Advance
    Wednesday, July 01, 2009 6:40 AM
  • Did my suggestion of using the TextFileReader not work out for you or did you decide to stick with using the Jet OLEDB Provider/Text ISAM and schema.ini file?

    BTW, if you need to modify your schema.ini file on-the-fly you may want to use the IniReader/Writer class:

    http://www.mentalis.org/soft/class.qpx?id=6


    Paul ~~~~ Microsoft MVP (Visual Basic)
    Wednesday, July 01, 2009 12:15 PM
  • Hi Paul,

    Thanks so much for the reply.

    I want to stick with OLEDB provider. But my problem is, i cannot create schema.ini file because many user doesn't have the permission to create file in the path where .csv file is there. Moreover, user does not have Admin rights to change the change the registry key.

    Its working perfectly allright if the user has the rights to create schema.ini file in the path where the .csv file is there. BUT WHAT IF THE USER DOESN'T HAVE ADMIN RIGHTS? (Many of my clients doent have admin rights)

    Hope you got my question

    Thanks

    Wednesday, July 01, 2009 12:41 PM
  • Can you give them permissions just to the folder where the csv file is located, or, copy the schema.ini file when the csv file is copied. Another option would be to copy the csv file to one of the user's profile folders (like My Documents).

    There is no way to supply all of these arguments in the connection string.
    Paul ~~~~ Microsoft MVP (Visual Basic)
    Wednesday, July 01, 2009 1:19 PM
  • Hi Paul,
    Thanks for the reply. I've already tried with copying files in some temporary location. But anyhow i just want to know whether it is possible to add atleast "MAXSCANROWS" in the connection string.

    Thanks once again for helping me on this.
    Wednesday, July 01, 2009 4:18 PM
  • I've never seen an instance where MAXSCANROWS actually worked from the connection string.
    Paul ~~~~ Microsoft MVP (Visual Basic)
    Wednesday, July 01, 2009 5:12 PM
  • Paul,

    Is there any specific reason why Microsoft didnt expose these properties in the connection string?
    Simply, why didn't Microsoft expose all the properties available in the Registry Key in the connection string? (Like ImportMixedType, MaxscanRows etc..)

    Thanks
    Friday, July 03, 2009 6:41 AM
  • I believe Extended Properties values in your connection string should be enclosed into double, not single quotes, someling like

    string oledbCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + file.DirectoryName + "\"; Extended Properties=\"text;HDR=Yes;FMT=Delimited(,)\"";

    It might not work because it it
    Val Mazur (MVP) http://www.xporttools.net
    Friday, July 03, 2009 11:49 AM
    Moderator
  • val mazur,
    thanks for the reply.
    I tried this but its not working.

    Paul,
    you are correct. No where its working. My question may be silly, but i'm very curious to know the reason why Microsoft didnt expose these properties in the connection string?

    Thanks

    Friday, July 03, 2009 6:44 PM
  • I don't know why Microsoft chose not to support more configuration parameters in the connection string. I can only assume it was because all of this information could be specified in a schema.ini file.
    Paul ~~~~ Microsoft MVP (Visual Basic)
    Friday, July 03, 2009 9:37 PM
  • I posted to Jet team (internally here at MSFT) to see if they have an answer for this one.  I suspect if you set MaxScanRows to 0 or 1 what will happen is the driver will just use the first value it finds and if this is a number in the column you get a numeric column.  

    Probably the reason why the Jet team decided to put all this metadata into schema.ini file is because if you inlined it into a connection string you would have a really huge nasty connection string.   Abstracting the metadata to schema.ini provides a nice clean way to provide metadata for text files without having to have a huge connection string and without having to modify the text file to add some header, etc...

    Text files simply do not have an agreed upon inline metadata schema (like dbf etc...) so the metadata has to be provided in some fashion.

    Saturday, July 04, 2009 5:18 PM
    Moderator
  • Thanks for the replies..

    Matt, I agree upon what you are saying that metadata has to be provided in some fashion. But still, please let me know when you get any response from the Jet team, any possible solution for this.

    Thanks.
    Monday, July 06, 2009 5:27 AM
  • One thing you can try that might work is setting MaxScanRows=0, according to the article below, this will force driver to scan all rows before determining data type:

    http://support.microsoft.com/kb/282263

    This is because setting MaxScanRows to zero causes the entire table to be scanned in order to determine the data type of the fields. When this happens, the data type is set according to the majority type in the field.
    Monday, July 06, 2009 5:34 PM
    Moderator
  • Matt,
    I refered the link which you provide and in that they have mentioned to create schema.ini file in the resolution section. But it wont help me.

    Here is my prob:
    I'm writing client application in which i need to import .CSV file into grid. For that i need to consider all column contents as Text. To achieve this, i need to either change the registry key OR i need to generate schema file. But in both the case i need to have admin rights (ADMIN rights to modify the key OR ADMIN rights to the path where i have .CSV file). But not all the user have Admin rights.

    What to do for that.

    [i want to show you the screen shot but i don know how to do that in this forums. please let me is that possible..]

    Thanks..
    Thursday, July 09, 2009 9:36 AM
  • Just add MaxScanRows=0 to the connection string and this should solve your problem, you don't need to modify schema.ini or change registry key.
    Thursday, July 09, 2009 6:45 PM
    Moderator
  • Matt,

    i have not created any schema file OR change the registry key. I just add "MaxScanRows=0" in the connection string. Its not working.

    Please find the data i tried to import.

    COLUMNA COUMNB







    8 734566
    9 MATFX

    "MATFX" in column B has not been imported.
    Please provide me the sample code if its working for you.

    Thanks

     

     

    Friday, July 10, 2009 9:01 AM
  • Hi RM,
    did you get an answer to this problem?

    I am currently hitting a similiar issue, my users wants to be able to import data as it is in the .csv file
    but .net continues to treat $25 as a decimal when I need it to be a string.

    I have tried everything, and the only option was to encapsulate the data for the cost column in my csv with "" ie. "$25" obviously this is a tad naff, as I cant expect the user to complete this if there are 20,000 plus rows.

    so did you get any further with your response from the Jet team, and if so would the answer help me

    regards
    Dogs_Bollox

    Monday, July 13, 2009 3:57 PM
  • You need to define a schema.ini file for your .csv file.
    Paul ~~~~ Microsoft MVP (Visual Basic)
    Monday, July 13, 2009 4:03 PM
  • Yep, I just realizedd MaxScanRows=0 will not work because it sets the type based on the majority type in the field, which in your case is numeric.  You need to create a schema.ini or use some other methodology.
    Monday, July 13, 2009 4:53 PM
    Moderator
  • Hey Matt,

    I tried with MaxScanrows=0 which finally worked for me even though majority of the field data type is numeric. Moreover, it worked without doing any registry settings.

    Regards,

    Sabya.

    Wednesday, April 17, 2013 10:15 AM