During a Copy Activity, how do I deal with null values?

    Question

I have an Azure Data Factory V2 pipeline that imports data from a TSV file in Data Lake Gen1 to an Azure SQL Database table.

Some columns of the TSV file may contain no value.

But the corresponding columns in the SQL Server table have a NOT NULL constraint.

I want to convert null values to 0 automatically during the Copy Activity.

    What should I do to achieve this goal?

    Or is there any other method for dealing with null values?

    Thanks.


    • Edited by QingGINQ Thursday, July 19, 2018 3:25 AM
    Thursday, July 19, 2018 3:24 AM

All replies

  • Hi QingGINQ,

You could use a U-SQL activity to first transform the data in the TSV file, and then use a Copy Activity to move the data into the Azure SQL database.

    Thanks.
    Thursday, July 19, 2018 4:49 AM
Hi Wang,

    Thanks for your reply.

I have one more question about the Copy Activity.

    Does that mean there is no way to manipulate the data during the Copy Activity itself?

    If I want even a slight change, I need to preprocess the data first, is that right?

    Thursday, July 19, 2018 6:25 AM
  • Hi QingGINQ,

If you don't want to preprocess the data, the Copy Activity provides a sqlWriterStoredProcedure option in the SQL Database sink. In the stored procedure you can define how to apply the data from the TSV file to your target table. For example, you could replace NULL values with 0 in the stored procedure, depending on your business logic.
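To make the pattern concrete, here is a minimal sketch. The table, table type, procedure, and column names (MyTable, MyTableType, spInsertMyTable, Name, Amount) are all hypothetical, invented for illustration; the Copy Activity passes the incoming rows to the procedure as a table-valued parameter, and the procedure does the actual insert:

    -- Hypothetical table type matching the source columns.
    CREATE TYPE dbo.MyTableType AS TABLE
    (
        Name   nvarchar(100),
        Amount int NULL          -- may arrive as NULL from the TSV
    );
    GO

    -- The stored procedure performs the insert, replacing NULL with 0.
    CREATE PROCEDURE dbo.spInsertMyTable
        @MyTable dbo.MyTableType READONLY
    AS
    BEGIN
        INSERT INTO dbo.MyTable (Name, Amount)
        SELECT Name, ISNULL(Amount, 0)   -- NULL-to-0 conversion happens here
        FROM @MyTable;
    END
    GO

The Copy Activity sink would then reference the procedure and table type, roughly like this (property names per the ADF SQL sink documentation):

    "sink": {
        "type": "SqlSink",
        "sqlWriterStoredProcedureName": "spInsertMyTable",
        "sqlWriterTableType": "MyTableType"
    }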




    • Edited by Wang Zhang Thursday, July 19, 2018 7:18 AM
    Thursday, July 19, 2018 6:42 AM
  • Hi Wang,

I appreciate your replies.

    About Option 1

I have tried the 'NULL Value' setting, setting its value to 0, but nothing happened; the null values are still null in the SQL table and were not replaced by 0.

    Maybe I did something wrong, I will try this method later.

    About Option 2

I have no idea how to pass the SP-processed data to the sink dataset.

    I have read these two tutorials.

    https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/data-factory/v1/data-factory-invoke-stored-procedure-from-copy-activity.md

    https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/data-factory/transform-data-using-stored-procedure.md

    But I still have no idea how to make it work.

    Would you mind talking about more details of this method?

    Many thanks to you.

    Updates:

Oh, if I use the SP during the Copy Activity, the Copy Activity will not insert the data automatically; the SP will do the insert, is that right?

    So in the SP I can control the insert process, and the Copy Activity will not do the insert anymore?

    • Edited by QingGINQ Thursday, July 19, 2018 7:27 AM
    Thursday, July 19, 2018 7:21 AM
  • Hi,

Just forget about Option 1, I made a mistake; Option 1 doesn't fit your scenario. :)

    For Option 2, the stored procedure handles the input data from the source file and copies it into the sink table based on the logic you put in it. It doesn't mean the Copy Activity will not do the insert; you use a stored procedure when the built-in copy mechanisms don't serve your purpose.

    This example gives the details you'll need; feel free to click through.

    Thursday, July 19, 2018 7:39 AM
  • Hi Wang,

    Thank you very very much.

    I will read the example later.

I just posted another question a few seconds ago, but it was identified as spam, and I can't find the post anywhere.

    So I have to ask here.

    My question is:

How do I set the CultureInfo of a DateTime-typed column in the source dataset?

    My TSV file has one column containing a time string, like "7/18/2018 8:14:27 AM".

    And the schema of the column of the source dataset is:

    {
        "name": "QueryTime",
        "type": "DateTime",
        "format": "M/dd/yyyy h:mm:ss tt"
    },

When the pipeline runs, I get this error:

    {
        "errorCode": "2200",
        "message": "ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'CreateTime' contains an invalid value '7/18/2018 8:14:27 AM'. Cannot convert '7/18/2018 8:14:27 AM' to type 'DateTime' with format 'M/dd/yyyy h:mm:ss tt'.,Source=Microsoft.DataTransfer.Common,''Type=System.FormatException,Message=String was not recognized as a valid DateTime.,Source=mscorlib,'",
        "failureType": "UserError",
        "target": "Copy_CSSaiAnalysis_h0"
    }

It claims that my time format is invalid.

    But the C# code:

    DateTime.ParseExact("7/18/2018 8:14:27 AM", "M/dd/yyyy h:mm:ss tt", CultureInfo.InvariantCulture);

    works well.

The only difference is the CultureInfo, and I don't know where to set it in the Copy Activity.

    Or is there any other method to deal with the DateTime format?

    Thanks.

    Thursday, July 19, 2018 8:13 AM
I'm still confused about the "It doesn't mean copy activity will not do the insert" part.

    Are you implying the Copy Activity will insert the data too?

    Then I would have two almost identical rows in my sink dataset.

    One inserted by the SP, and the other inserted by the Copy Activity?

    Thursday, July 19, 2018 8:21 AM
  • Hi Qing,

Where do you set the CultureInfo? You could use @formatDateTime('your datetime string','M/dd/yyyy h:mm:ss tt') to format the value.

    Thursday, July 19, 2018 8:34 AM
No, using a stored procedure on the sink side supplements the built-in Copy Activity; it just adds some extra logic to the copy, e.g. filtering NULL values during the copy. Your data will only be copied once.

    I would suggest you read the example I posted before; it will give you a clearer understanding of using a stored procedure in a Copy Activity.

    Thursday, July 19, 2018 8:46 AM
  • Hi Wang,

    I found the formatDateTime function in the link below:

    https://docs.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions

but I think the tutorial doesn't say much about where or how to use these functions.

    @formatDateTime('datetime string','M/dd/yyyy h:mm:ss tt')

Where should I place this expression to process the column containing the datetime string in my source dataset?

    And the JSON of the column of the source dataset is:

    {
        "name": "QueryTime",
        "type": "DateTime",
        "format": "M/dd/yyyy h:mm:ss tt"
    },

I can't figure out where to put the formatDateTime expression in the JSON.

    I am a complete beginner with Azure Data Factory, so I have many questions, and many thanks for your kind help.

    Thursday, July 19, 2018 9:03 AM
For culture and format, please reference this doc.

    Actually, your format looks fine to me. You could also set the culture.

    And could you share a sample of the data? Is it quoted? If it is quoted, you could try setting the right quoteChar.

    Also, what is your sink data type? If it is a string, you could just change the source data type to string as well.
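For reference, in the dataset structure the culture can sit next to the format on the column definition; this is a sketch based on the doc referenced above, reusing the column name from earlier in the thread. If the parsing failure really is culture-related, explicitly setting "culture" to "en-us" may resolve it:

    {
        "name": "QueryTime",
        "type": "Datetime",
        "culture": "en-us",
        "format": "M/dd/yyyy h:mm:ss tt"
    },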


    Thursday, July 19, 2018 11:31 AM
  • Hi QingGINQ,

Just noticed that you define the format in the column JSON, so you don't need to use formatDateTime any more.

    We're glad to help you on this case. Would you please send me (email: wangzhan@microsoft.com) your Copy Activity JSON code? I need more context on your scenario. We can talk about it offline and then post the final solution to the forum.

    Thanks.

    Thursday, July 19, 2018 11:34 AM
After an offline talk, I'm glad to say QingGINQ ultimately used a Stored Procedure to filter null values and format the DateTime.
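For readers landing here later, the final stored procedure presumably combined both fixes along these lines. This is only a sketch; the table, type, and column names (TargetTable, ImportRowType, QueryTime, Amount) are invented, and the actual conversion style should match your data:

    -- Hypothetical sink-side procedure: parses the datetime string and
    -- replaces NULL with 0 in a single insert.
    CREATE PROCEDURE dbo.spImportRows
        @Rows dbo.ImportRowType READONLY   -- hypothetical table type with string columns
    AS
    BEGIN
        INSERT INTO dbo.TargetTable (QueryTime, Amount)
        SELECT
            TRY_CONVERT(datetime2, QueryTime, 101),  -- style 101 = US mm/dd/yyyy; adjust as needed
            ISNULL(Amount, 0)                        -- NULL-to-0 conversion
        FROM @Rows;
    END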

    Thanks.

    Monday, July 23, 2018 5:26 AM