Delimited text with thousand and decimal separator (e.g. 42.000,16)

  • Question

  • I have a CSV file where one of the columns contains numbers that use a dot (".") as the thousands separator and a comma (",") as the decimal separator. 

    So far I have been using the "Azure Blob Storage" connector (which I think is deprecated now), but since we now have a gateway I would like to use a "Delimited Text" connector instead. In the Blob Storage connector I defined the column like this in the dataset JSON:

    {
        "name": "Quantity",
        "type": "Decimal",
        "culture": "da"
    }

    This works just fine. But when I copy that same method into the Delimited Text dataset and run the pipeline, I get this error: "ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'quantity' contains an invalid value '42.026,000'."

    Any way to fix this?

    Wednesday, April 1, 2020 9:34 AM

Answers

  • Following up with my findings:

    It is possible to achieve what you are asking for, but it involves quite a bit of manual editing of the Data Factory JSON.

    Steps:

    1. Create the initial delimited text dataset
    2. Import the schema into the dataset
    3. Edit the dataset JSON to include the correct “type” and “culture”. (details below)
    4. Save the dataset
    5. Go to the copy activity and select the delimited text dataset you created above.
    6. Go to the copy activity mapping and import schemas
    7. Check that the source dataset’s schema was not changed and that it still has the manual corrections.
    8. Open the pipeline JSON and find the mapping section for the copy activity.  Make the manual changes for both source and sink. (see below)
    9. Do a debug run and check the results in your database.  I found that when I got the culture wrong on sink, the data loaded only as whole numbers.

    If you need to go back and make a correction to the dataset schema, you should follow it with an “Import schema” in the copy activity mapping.

    Details for step 3:

    In the schema section of the dataset JSON, change the relevant fields like:

    //BEFORE:
    {
        "name": "Local currency",
        "type": "String"
    },
    
    //AFTER:
    {
        "name": "Local currency",
        "type": "Decimal",
        "culture": "da"
    },
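
    For orientation, here is a rough sketch of where that schema fragment sits in the full dataset JSON. This is only a sketch: the dataset, linked service, container, file name, and delimiter below are placeholders (not taken from this thread), so substitute your own values.

    {
        "name": "DelimitedTextDataset1",
        "properties": {
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLinkedService1",
                "type": "LinkedServiceReference"
            },
            "type": "DelimitedText",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "input",
                    "fileName": "data.csv"
                },
                "columnDelimiter": ";",
                "firstRowAsHeader": true
            },
            "schema": [
                {
                    "name": "Local currency",
                    "type": "Decimal",
                    "culture": "da"
                }
            ]
        }
    }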
    

    Details for step 8:

    Add the appropriate culture to both the source and the sink, and change the type if needed.
    Getting the culture wrong on the source side can produce an 'unable to convert' error; getting it wrong on the sink side can result in the database interpreting the values in unexpected ways.

    //BEFORE:
    {
        "source": { "name": "Local currency", "type": "Decimal" },
        "sink": { "name": "Local currency", "type": "Decimal" }
    },

    //AFTER:
    {
        "source": { "name": "Local currency", "type": "Decimal", "culture": "da" },
        "sink": { "name": "Local currency", "type": "Decimal", "culture": "en-us" }
    },
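
    For context, these entries sit under the copy activity's mapping ("translator") section in the pipeline JSON, typically shaped roughly like the sketch below; verify the exact layout against your own pipeline JSON.

    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            {
                "source": { "name": "Local currency", "type": "Decimal", "culture": "da" },
                "sink": { "name": "Local currency", "type": "Decimal", "culture": "en-us" }
            }
        ]
    }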

    Wednesday, April 8, 2020 9:42 PM

All replies

  • Hello Morten_DK and thank you for bringing this to our attention.

    Could you tell me a little more please?

    • Is blob / delimited text your sink or your source?
    • What is the source / sink type?
    • This is using copy activity, correct?

    I have reproduced the error with a Delimited Text type dataset.
    I would like to reproduce your success with the Blob Storage dataset, but I have not been able to so far.  Would you be open to sharing more of your JSON code (dataset & copy activity)?  If you are not comfortable sharing it here, you can email me the support files of this pipeline at AzCommunity@microsoft.com.

    If you choose to email me, please include the URL of this thread, and put my username in the subject.

    Wednesday, April 1, 2020 10:00 PM
  • As per our offline communications, where you reported that this solution works without any issues, I will mark this thread as answered.
    Tuesday, April 21, 2020 5:29 PM