csv not being read properly

  • Question

  • User-14001990 posted

     Dear Friends,

     I have a small csv.

When it contains all text, it is read properly by OleDb,

but if it contains numbers, it is not.

    So if my csv is like this:

    OLD_CUST, NEW_CUST
    G,         H
    I,    J
    M,    N
    65464653,  644646636
    98765465,  988924212
    G,    6546546

H, J, and N are not being read.

1) How do I tackle the above problem?

2) Is there a better provider for CSV/Excel... something called ACE? If yes, please provide details.

3) How do I restrict the number of rows to be read, e.g. 5000, when reading with an OleDbDataAdapter?

    Thanks for your efforts.

     

     

    Thursday, December 9, 2010 10:08 AM

All replies

  • User269602965 posted

    Show us your load code.

Then we do not have to guess about what is wrong.

    Thursday, December 9, 2010 4:27 PM
  • User-14001990 posted

The code.

string csvconnstr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + file.DirectoryName + "\";Extended Properties=" + "\"text;HDR=Yes;FMT=Delimited(,)\";";
OleDbConnection oledbconn = new OleDbConnection(csvconnstr);
int res = 0;
DataTable tbl = new DataTable();
string onlyname = Path.GetFileName(flname);
try
{
    using (oledbconn)
    using (OleDbCommand cmd = new OleDbCommand(string.Format("SELECT * FROM [{0}]", file.Name), oledbconn))
    using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
    {
        tbl = new DataTable("CSVTable");
        adp.Fill(tbl);
        //foreach (DataRow row in tbl.Rows)
        //{
        //    // Process the current row...
        //}
    }
    res = 1;
}
catch (Exception ex)
{
    logmsgf[1] = ex.StackTrace;
    //ff.WriteToLog(iso, "CSVDBUpload/log/CSVDBUpload_log.txt", logmsgf);
    res = ex.GetHashCode();
}
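
As a side note on question 3 from the first post: one way (a minimal sketch, not from the original code) is the DbDataAdapter.Fill overload that takes a start record and a maximum record count:

// Minimal sketch: cap the number of rows materialized from the CSV.
// Uses the adp/tbl variables from the code above; 0 is the first record.
adp.Fill(0, 5000, tbl);   // reads at most 5000 rows into tbl

This only limits what lands in the DataTable; the text driver still reads the file.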

    Thursday, December 9, 2010 11:54 PM
  • User269602965 posted

    Since you are reading in the first row as a HEADER (col names)

    try this select statement:

string.Format("SELECT TRIM(OLD_CUST) AS OLD_CUST, TRIM(NEW_CUST) AS NEW_CUST FROM [{0}]", file.Name)

    Friday, December 10, 2010 7:20 PM
  • User269602965 posted

    In answer to question #2... Yes, there are better ways.

The problem with your method is, I suspect, that if the data file is large, you will run out of memory.

    I like to load flat files into Oracle as an EXTERNAL TABLE,

    and then you can do transformations on the fly using a select statement against the External Table.

    The table size can be enormous, millions of rows, and it is FAST.

You will have to set up an Oracle Directory object to tell Oracle the physical location of the file.

    External table concepts here:

    http://download.oracle.com/docs/cd/B19306_01/server.102/b14215/et_concepts.htm

     

    Friday, December 10, 2010 7:31 PM
  • User-14001990 posted

    Hi Lannie,


Although external tables are good, there is one problem.
There is no scope for validation of the data.
So if I want to check for nulls, empty strings, or the length of each column value, there is no way to do it.
Therefore I have to create another csv file based on validated data and then use that as the external table.

This is problem no. 1.

How do I make a proc which will read a specified amount of data at a time, say 5000 rows, from the external table, i.e. the csv file?

Then treat each chunk as a transaction, so that in the event of any error it can be rolled back.

The benefit of this would be that only 5000 records would need to be rolled back; otherwise the entire amount of data, i.e. about 1 million records, would have to be rolled back in the case of a single transaction.

Therefore it cannot be a single transaction when the number of records is very large.

This is problem no. 2.
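
For what problem no. 2 describes, a minimal PL/SQL sketch of the usual pattern (my illustration, using the table names that appear later in this thread) is BULK COLLECT with a LIMIT and a commit per batch:

DECLARE
  TYPE t_cust_tab IS TABLE OF EXTTABLE_LOAD_CUSTOMERS%ROWTYPE;
  l_batch t_cust_tab;
  CURSOR c_src IS SELECT * FROM EXTTABLE_LOAD_CUSTOMERS;
BEGIN
  OPEN c_src;
  LOOP
    FETCH c_src BULK COLLECT INTO l_batch LIMIT 5000;  -- at most 5000 rows per pass
    EXIT WHEN l_batch.COUNT = 0;
    FORALL i IN 1 .. l_batch.COUNT
      INSERT INTO OLD_NEW_CUSTOMERS VALUES l_batch(i);
    COMMIT;  -- each 5000-row batch is its own transaction
  END LOOP;
  CLOSE c_src;
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK;  -- only the current, uncommitted batch is rolled back
    RAISE;
END;
/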


    Sunday, December 12, 2010 12:37 PM
  • User269602965 posted

    Oracle external table load can do some of the validation work for you.

    For example, you define the data element as CHAR(20),

    but encounter a bad CHAR(30) in raw data,

or you define a data element as a DATE with a specific format,

    but encounter a bad DATE in the raw data,

Oracle external table will kick such records out into a bad record log file

as bad rows, so you can continue to load the rest of the data,

and it will alert you about what caused the problem in another log file.

You can also put in criteria to DISCARD records if you have business rules

for avoiding certain data elements that may not be logically bad,

but that you would rather discard.

    The external table is read only, so it is never changed and does not need to be rolled back.

    +++++++++++++++++

I guess an alternative is to write a CS.NET or VB.NET function to read, parse, and validate each data element,

and then pass the row into the Oracle table with ODP.NET. That is a bit slower, but if you are low on Oracle skills,

and high on CS.NET/VB.NET SYSTEM.IO skills, it works just fine.
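
A minimal sketch of that alternative (my illustration; the connection string, file name, and validation rules are assumptions, not anything from this thread):

using System.IO;
using Oracle.DataAccess.Client;  // ODP.NET

// Read, parse, and validate each row, then insert via a parameterized command.
using (OracleConnection conn = new OracleConnection("User Id=scott;Password=tiger;Data Source=orcl"))
using (OracleCommand cmd = new OracleCommand(
    "INSERT INTO OLD_NEW_CUSTOMERS (OLD_CUST, NEW_CUST) VALUES (:1, :2)", conn))
{
    conn.Open();
    cmd.Parameters.Add(":1", OracleDbType.Varchar2);
    cmd.Parameters.Add(":2", OracleDbType.Varchar2);
    string[] lines = File.ReadAllLines("OLD_NEW_CUST.csv");
    for (int i = 1; i < lines.Length; i++)  // i = 1 skips the header row
    {
        string[] parts = lines[i].Split(',');
        if (parts.Length != 2) continue;                           // reject malformed rows
        string oldCust = parts[0].Trim(), newCust = parts[1].Trim();
        if (oldCust.Length == 0 || newCust.Length == 0) continue;  // reject empties
        cmd.Parameters[0].Value = oldCust;
        cmd.Parameters[1].Value = newCust;
        cmd.ExecuteNonQuery();
    }
}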

    +++++++++++++++++

    At first, when I started using Oracle External tables, it appeared to not be very powerful,

    just from using the basic functions shown in examples,

    but as I learned its capabilities and flexibility, it became a powerhouse tool for me,

    especially when handling hundreds of millions of rows.

    Sunday, December 12, 2010 4:32 PM
  • User-14001990 posted

    The external table is read only, so it is never changed and does not need to be rolled back.

     

    Hi,

I know the external table is read only, but I am talking about rollback of the set of records that are being inserted into the normal table from the external table.

1) Are you saying that batch insertions in the form of transactions are not supported with external tables, i.e. if any error occurs in a batch of 5000 records, the entire batch will not be rolled back?

So what will happen if errors occur in multiple records during insertion?

2) If batch insertions or transactions are not supported, then it means that we cannot limit the number of records to be inserted at a time. If this is the case, it will go on and on till all the records are inserted.

Then how do I stop the process of insertion if I really want to?

3) You have not posted any code for the things you have done.

Do post it with some explanation about the task being done.

    Monday, December 13, 2010 2:17 AM
  • User-547121016 posted

Take a look at this post:

http://forums.asp.net/t/1630892.aspx

Check whether your records have the "," character within the words.

    Monday, December 13, 2010 2:21 AM
  • User269602965 posted

     If you need to stop the load completely in case of an error or violation of business rule,

    then this option gives you the greatest control and flexibility.

Write a CS.NET or VB.NET function to read, parse, and validate each data element,

and then pass the row into the Oracle table with ODP.NET.

     

If you want to load as much as you can, kicking BAD or DISCARDED records out to a log file as you go,

so you can correct errors in the bad or discarded records and load them later,

then use the External Table Oracle load. Oracle External tables can handle one or more

BAD or DISCARDED record events during a million-record load.

     

    My previous post with a link to Oracle documentation has good examples of code.

But later today, I could work up a load example for the data set in your first post.

    Monday, December 13, 2010 3:41 PM
  • User269602965 posted

    /* Create external table for CSV file in an Oracle Directory object  */
    /* that you have read and write privileges on                        */
    /* set up log file to capture the BAD records that do not match      */
    /* defined data types                                                */

    CREATE TABLE EXTTABLE_LOAD_CUSTOMERS
    (
    OLD_CUST             VARCHAR2(16),
    NEW_CUST             VARCHAR2(16)
    )
    ORGANIZATION EXTERNAL
    (
     TYPE ORACLE_LOADER
     DEFAULT DIRECTORY YOUR_ORACLE_DIR
     ACCESS PARAMETERS
      (
       RECORDS DELIMITED BY NEWLINE
       BADFILE     'BAD_LOAD_CUST.LOG'
       DISCARDFILE 'DISCARD_LOAD_CUST.LOG'
       LOGFILE     'LOG_LOAD_CUST.LOG'
   FIELDS
     TERMINATED BY ','
     MISSING FIELD VALUES ARE NULL
   (
   OLD_CUST             CHAR(16),
   NEW_CUST             CHAR(16)
   )
      )
    LOCATION ('OLD_NEW_CUST.csv')
    )
    REJECT LIMIT UNLIMITED
    /

    /* We could just create a table directly off the external table               */
    /* But since you need to insert fixed bad records later,                      */
    /* We create an empty table, then insert good records from the external table */
/* For millions of rows, we use NOLOGGING and COMPRESS BASIC                  */
    /* and on insert use the APPEND hint to speed loading                         */

CREATE TABLE OLD_NEW_CUSTOMERS
(
OLD_CUST             VARCHAR2(16),
NEW_CUST             VARCHAR2(16)
)
NOLOGGING COMPRESS BASIC
    /

INSERT /*+ APPEND */ INTO OLD_NEW_CUSTOMERS
SELECT
  TRIM(OLD_CUST) AS OLD_CUST,
  TRIM(NEW_CUST) AS NEW_CUST
FROM
  EXTTABLE_LOAD_CUSTOMERS
/

    COMMIT;
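
A quick sanity check after the load (my addition, not from the original post): compare the two counts; anything rejected will be sitting in BAD_LOAD_CUST.LOG in YOUR_ORACLE_DIR.

SELECT COUNT(*) FROM EXTTABLE_LOAD_CUSTOMERS;  -- rows the loader could parse
SELECT COUNT(*) FROM OLD_NEW_CUSTOMERS;        -- rows actually inserted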

    Monday, December 13, 2010 9:28 PM
  • User-14001990 posted

    Hi,

I have created an all-in-one stored procedure which does the following:
1) Create the external directory
2) Grant rights to the user for the above directory
3) Create the external table
4) Create the normal DB table
5) Transfer data to the DB table

but I am getting errors in the "CREATE" statements.
Are DDL statements not supported in stored procs?
If so, do I have to execute them separately?

    My proc is as below:

    Create or Replace Procedure blunders_mumbai1.sp_extupload
    (
    p_extdir IN VARCHAR2,
    p_user IN VARCHAR2,
    p_nErrNo OUT NUMBER,
    p_vErrDesc OUT VARCHAR2

    )
    is
    C_SYSTEM_ERROR CONSTANT NUMBER(1) := -1;
    Begin

    CREATE OR REPLACE DIRECTORY TEST_DIR AS 'd:\mydata'; --d:\mydata to be replaced by p_extdir
    GRANT READ, WRITE ON DIRECTORY TEST_DIR TO blunders_mumbai1; --blunders_mumbai1 to be replaced by p_user

    CREATE OR REPLACE TABLE ext_table1 (
    old_cust VARCHAR2(8),
    new_cust VARCHAR2(8)
    )
    ORGANIZATION EXTERNAL (
    TYPE oracle_loader
    DEFAULT DIRECTORY TEST_DIR
    ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE TEST_DIR:'bad-upload.bad'
    LOGFILE TEST_DIR:'log_upload.log'
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    (old_cust,new_cust))
    LOCATION ('datafile1.csv')
    )
    REJECT LIMIT UNLIMITED;

    CREATE OR REPLACE TABLE OLD_NEW_CUSTOMERS NOLOGGING COMPRESS BASIC
    (
    OLD_CUST VARCHAR2(8),
    NEW_CUST VARCHAR2(8)
    );

insert /*+ APPEND */ into OLD_NEW_CUSTOMERS /* cust_details only columns are old_cust,new_cust */
select TRIM(OLD_CUST) AS OLD_CUST,
TRIM(NEW_CUST) AS NEW_CUST
from ext_table1
where old_cust is not null /* an empty string is treated as null by Oracle */
and new_cust is not null
and length(old_cust) = 8
and length(new_cust) = 8;

    COMMIT;

     

    EXCEPTION
    WHEN OTHERS THEN
    p_nErrNo := C_SYSTEM_ERROR;
    p_vErrDesc := SQLERRM;
    ROLLBACK;

    END blunders_mumbai1.sp_extupload;

    Thanks for your reply.



    Tuesday, December 14, 2010 2:20 AM
  • User269602965 posted

    http://download.oracle.com/docs/cd/E11882_01/appdev.112/e17126.pdf

     

    Read Section 7-2 on Execute Immediate for DDL in PL/SQL 

    Tuesday, December 14, 2010 2:51 PM
  • User-14001990 posted

    Hi Lannie,

    I am still far away from having a working proc.

    My proc is below:

    PROCEDURE sp_extupload
    (
    p_extdir IN VARCHAR2,
    p_user IN VARCHAR2,
    p_nErrNo OUT NUMBER,
    p_vErrDesc OUT VARCHAR2

    )
    IS
    C_SYSTEM_ERROR CONSTANT NUMBER(1) := -1;
    BEGIN
    BEGIN
    insert into temp values('222');
    commit;
    EXECUTE IMMEDIATE 'CREATE OR REPLACE DIRECTORY TEST_DIR AS'|| p_extdir;--d:\mydata to be replaced by p_extdir

    EXECUTE IMMEDIATE 'GRANT READ, WRITE ON DIRECTORY TEST_DIR TO'|| p_user;--blunders_mumbai1 to be replaced by p_user
    insert into temp values('333');

    EXECUTE IMMEDIATE q'[CREATE OR REPLACE TABLE ext_table1 (
    old_cust VARCHAR2(8),
    new_cust VARCHAR2(8)
    )
    ORGANIZATION EXTERNAL (
    TYPE oracle_loader
    DEFAULT DIRECTORY TEST_DIR
    ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE TEST_DIR:'bad-upload.bad'
    LOGFILE TEST_DIR:'log_upload.log'
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    (old_cust,new_cust))
    LOCATION ('datafile1.csv')
    )
    REJECT LIMIT UNLIMITED]';
    insert into temp values('444');
    EXECUTE IMMEDIATE 'CREATE OR REPLACE TABLE OLD_NEW_CUSTOMERS NOLOGGING COMPRESS BASIC
    (
    OLD_CUST VARCHAR2(8),
    NEW_CUST VARCHAR2(8)
    )';
    insert into temp values('555');

EXECUTE IMMEDIATE q'[insert /*+ APPEND */ into OLD_NEW_CUSTOMERS /* cust_details only columns are old_cust,new_cust */
    (
    select old_cust,new_cust
    from ext_table1
    where old_cust is not null /* an empty string is treated as null by Oracle */
    and new_cust is not null
    and length(old_cust) = 8
    and length(new_cust) = 8
    )]';

    COMMIT;
    insert into temp values('666');
    EXCEPTION
    WHEN NO_DATA_FOUND THEN
    NULL;
    WHEN OTHERS THEN
    NULL;
    END;

    EXCEPTION
    WHEN OTHERS THEN
    p_nErrNo := C_SYSTEM_ERROR;
    p_vErrDesc := SQLERRM;
    insert into temp values('777');
    ROLLBACK;

    END sp_extupload;

Although it compiles without errors, it does not execute.

It executes only the first insert statement, i.e. 222 gets inserted into temp.
It does not go any further.
What is wrong, and where?
Is there a better way of debugging this other than using insert statements?
I am using SQL Developer for writing procs.

    Thanks for your reply.

    Wednesday, December 15, 2010 12:11 AM
  • User269602965 posted

For starters:

    EXECUTE IMMEDIATE 'CREATE OR REPLACE DIRECTORY TEST_DIR AS'|| p_extdir;

there is a missing space character after the word AS:

    EXECUTE IMMEDIATE 'CREATE OR REPLACE DIRECTORY TEST_DIR AS '|| p_extdir;

     

    Wednesday, December 15, 2010 3:31 PM
  • User269602965 posted

     http://st-curriculum.oracle.com/tutorial/SQLDeveloper/index.htm

     

    Oracle SQL Developer has PL/SQL debugger.

    See tutorial.

     

    Wednesday, December 15, 2010 5:03 PM
  • User269602965 posted

    declare
      strDDL VARCHAR2(2000);
    begin
      strDDL := 'create or replace directory TEMP_DIR as ''C:\TEMP''';
      execute immediate (strDDL);
      strDDL := 'grant read, write on directory TEMP_DIR to SCOTT';
      execute immediate (strDDL);
    end;


    PL/SQL procedure successfully completed. 

You may not grant read, write on directory objects to yourself.

Do not use EXECUTE IMMEDIATE for the INSERT INTO OLD_NEW_CUSTOMERS; plain DML does not need it.
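
In other words (a minimal sketch of that point, my illustration): once the tables exist, the INSERT can be ordinary static SQL, for example in an anonymous block run afterwards:

BEGIN
  -- static DML needs no EXECUTE IMMEDIATE; it is parsed when the block is
  -- compiled, so the tables must already exist at that point
  INSERT /*+ APPEND */ INTO OLD_NEW_CUSTOMERS
  SELECT TRIM(old_cust), TRIM(new_cust) FROM ext_table1;
  COMMIT;
END;
/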

    Wednesday, December 15, 2010 7:15 PM
  • User-14001990 posted

    Hi,

The problem of the csv not being read properly still exists, even after applying TRIM around the column names.

The string values before and after the numbers are not being read; they come back blank.

The only workaround I have found is enclosing both of them in quotes ('').

But that involves a huge overhead, especially if it is a large file created from Excel or other sources.

Any ideas??

I feel that providers like OleDb and ODBC are pretty useless at reading csv's.
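
One possible cause (hedged; the thread never confirms it): the Jet text driver guesses each column's type from the first rows it scans, and values that do not match the guessed type come back blank. A schema.ini file in the same folder as the CSV can force both columns to text, e.g. for the datafile1.csv used earlier:

[datafile1.csv]
Format=CSVDelimited
ColNameHeader=True
Col1=OLD_CUST Text
Col2=NEW_CUST Text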


    Thursday, December 16, 2010 1:56 PM
  • User269602965 posted

    I am beginning to suspect the CSV file may be contaminated with binary garbage (non-printable characters like TABS, Printer commands, or high-byte ASCII characters).

Open the CSV file with a text editor that allows switching to HEXADECIMAL view to look for binary garbage.

    I use UltraEdit version 16.30. 

    I often encounter binary garbage in exported text flat files like TXT, DAT, CSV, etc.

    In an ordinary text editor, these hidden binary garbage characters may not appear on the screen,

but they mess up the data load into Oracle and other databases.
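
If a hex-capable editor is not at hand, a throwaway scan (my sketch; the file name is an assumption) can flag the suspect bytes:

using System;
using System.IO;

class GarbageScan
{
    static void Main()
    {
        byte[] data = File.ReadAllBytes("OLD_NEW_CUST.csv");  // assumed file name
        for (int i = 0; i < data.Length; i++)
        {
            byte b = data[i];
            // allow printable ASCII plus CR/LF; flag TABs, control codes,
            // and high-byte characters, per the list above
            bool ok = (b >= 0x20 && b <= 0x7E) || b == 0x0D || b == 0x0A;
            if (!ok)
                Console.WriteLine("offset {0}: 0x{1:X2}", i, b);
        }
    }
}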

    ++++++

    I agree oledb and odbc are very slow and not useful to me.

     

     

    Thursday, December 16, 2010 7:25 PM