none
Decimal changes to string after cleansing-process

    Question

  • Yesterday, I did some cleansing-process on data, where the column "capacity" is a decimal.

    After the cleansing process, I created an excel-file and in this file the "capacity" is a string.

    Why is this so? Is there a check-box I forgot to mark?

    Best regards, fab

    Tuesday, April 24, 2012 6:54 AM

Answers

  • DQS is treating the source like a Decimal it looks like, based on the data type shown in your screenshot that part probably works like you want it to.

    During the import from Excel, it detects the type... http://msdn.microsoft.com/en-us/library/hh213033(v=SQL.110).aspx

    • In .xlsx and .xls files, the data type of the column is determined by the most prevalent data type in the first eight rows. If a cell does not conform to that data type, it will be given a null value.
    • In .csv files, the data type is determined by the most prevalent data type in the first eight rows.

    During the export to Excel, all columns are kept as string. I think this is by design. When you export to SQL table, the type is also kept as varchar(max) if I recall.

    I see how this is undesirable, so if you think the product should change, please log a request at https://connect.microsoft.com/SQLServer/Feedback 

    Thanks much, Jason


    Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation - http://support.microsoft.com/select/default.aspx?target=assistance

    Tuesday, April 24, 2012 10:09 PM
    Moderator

All replies

  • DQS is treating the source like a Decimal it looks like, based on the data type shown in your screenshot that part probably works like you want it to.

    During the import from Excel, it detects the type... http://msdn.microsoft.com/en-us/library/hh213033(v=SQL.110).aspx

    • In .xlsx and .xls files, the data type of the column is determined by the most prevalent data type in the first eight rows. If a cell does not conform to that data type, it will be given a null value.
    • In .csv files, the data type is determined by the most prevalent data type in the first eight rows.

    During the export to Excel, all columns are kept as string. I think this is by design. When you export to SQL table, the type is also kept as varchar(max) if I recall.

    I see how this is undesirable, so if you think the product should change, please log a request at https://connect.microsoft.com/SQLServer/Feedback 

    Thanks much, Jason


    Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation - http://support.microsoft.com/select/default.aspx?target=assistance

    Tuesday, April 24, 2012 10:09 PM
    Moderator
  • If you want to change the number of rows that is used to determine the datatype, set the registry setting for TypeGuessRows from 8 to 0. This will make the provider scan the first 16384 rows.

    JET provider:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel

    ACE OLE DB:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel

    I sure hope the DQS team was smart enough to put IMEX=1 in the connectionstrings to Excel :)


    MCTS, MCITP - Please mark posts as answered where appropriate.

    Wednesday, April 25, 2012 8:44 AM