Copy Activity - Dynamic schema mapping

  • Question

  • Hello,

    I am trying to copy data from a csv file stored on blob storage to an Azure SQL table. The file has 6 columns and the table has 10 columns. I need to use dynamic schema mapping to copy only 2 columns from the file to the table, so that I can reuse the same pipeline for multiple files. I am using the following dynamic JSON for the copy activity schema mapping, which comes from a parameter:

    "translator": { "type": "TabularTranslator", "columnMappings": { "NAME": "NAME" }

    This is the copy activity code in debug:

    {
        "source": {
            "type": "DelimitedTextSource",
            "storeSettings": {
                "type": "AzureBlobStorageReadSettings",
                "recursive": false,
                "enablePartitionDiscovery": false
            },
            "formatSettings": { "type": "DelimitedTextReadSettings" }
        },
        "sink": { "type": "AzureSqlSink" },
        "enableStaging": false,
        "translator": {
            "type": "TabularTranslator",
            "columnMappings": {
                "NAME": "NAME",
                "CODE": "CODE"
            }
        }
    }

     Unfortunately I am getting an error:

    "errorCode": "2200", "message": "ErrorCode=UserErrorStructureIsRequired,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The structure/schema setting in source is required for column mapping.,Source=Microsoft.DataTransfer.ClientLibrary,'", "failureType": "UserError"

    What am I doing wrong?


    Monday, September 16, 2019 1:29 PM

All replies

  • Hi Cristian,

    An ideal way to do so would be to import schemas in the mapping tab of the copy activity. Once that's done, you can de-select the columns you don't want to be copied. 

    You can also map the columns in the csv file to the corresponding columns in the Azure SQL table.

    Here's a screenshot for your reference:

    Hope this helps.

    Tuesday, September 17, 2019 7:12 AM
    Owner
  •    Hello Mishra,

       Unfortunately, I cannot do that, because the source and sink are dynamic and I don't know the names of the columns. I have many files in a folder and from each file I need to copy 2 fields into different tables, so the source and sink are filled by a lookup that gets the data from a configuration table. So, I want to have only one pipeline (a generic one), not one pipeline per file.
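
    To be concrete, the idea is to feed the mapping into the copy activity as dynamic content, something like this (just a sketch; 'LookupConfig' and the 'ColumnMapping' column are placeholder names for my lookup activity and the configuration column that stores the mapping JSON per file):

    "translator": {
        "value": "@json(activity('LookupConfig').output.firstRow.ColumnMapping)",
        "type": "Expression"
    }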

    Thank you

    Tuesday, September 17, 2019 3:05 PM
  • Hi Cristian,

    In that case, did you have a look at the Data Flow feature of Azure Data Factory? It might be useful in your use case.

    Let us know if it helps. Else, we can gladly continue the dialogue.

    Wednesday, September 18, 2019 9:01 AM
    Owner
  • Hello,

       Unfortunately, we cannot use the Data Flow feature, as it does not support the self-hosted integration runtime. So, does the copy activity support this scenario?

    Thank you

    Wednesday, September 18, 2019 9:26 AM
  • Hi Cristian,

    In that case, you would have to go for schema mapping to include only the columns that you want.
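
    As an illustration, the parameterized translator could look something like this (a sketch only; it uses the 'mappings' array form of TabularTranslator and assumes the delimited source has first row as header so columns can be mapped by name, with NAME and CODE standing in for the two columns you want to copy):

    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            { "source": { "name": "NAME" }, "sink": { "name": "NAME" } },
            { "source": { "name": "CODE" }, "sink": { "name": "CODE" } }
        ]
    }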

    Thursday, September 19, 2019 8:46 AM
    Owner
  • Hi Cristian,

    Just wanted to check - was the above suggestion helpful to you? If yes, please consider upvoting and/or marking it as answer. This would help other community members reading this thread.
    Thursday, October 3, 2019 11:15 AM
    Owner
  • Hi Cristian,

    I haven't heard back from you in quite some time. Was the above suggestion helpful to you? If yes, please consider upvoting and/or marking it as answer. This would help other community members reading this thread.

    Wednesday, October 9, 2019 10:25 AM
    Owner
  • Hi Chirag, can you elaborate on this? Since the source and sink are both dynamic, using Import schemas will not work, so how do you do "schema mapping to include only the columns that you want"?

    Best Regards

    Asad


    Asad Khan

    Tuesday, November 5, 2019 7:51 PM
  • Hi Asad,

    If the source and sink are both dynamic (you are not specifying the file names/table names in the connection), you won't be able to import schemas.

    If, however, you have a fixed schema for the source and sink, you can do the mapping to include the columns you want, as shown in the screenshot I shared.

    If I had a fixed schema, even with a dynamic source and sink, I would set up the schema mapping first (by importing schemas from a specific table name, file name, or sample file) and then make the connection dynamic.
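
    As an illustration, once the mapping is defined you can parameterize the dataset so that the connection itself stays dynamic (a sketch only; the dataset, linked service, container, and parameter names are placeholders):

    {
        "name": "GenericDelimitedText",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": { "referenceName": "AzureBlobStorage1", "type": "LinkedServiceReference" },
            "parameters": { "FileName": { "type": "string" } },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "input",
                    "fileName": { "value": "@dataset().FileName", "type": "Expression" }
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": true
            }
        }
    }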

    Wednesday, November 6, 2019 6:55 AM
    Owner
  • Thanks Chirag. So there is no way to dynamically build the schema if the source/sink do not have a fixed schema, as it would change from source to source?

    Also, if I used your second option, to first map the schema with a defined source and sink and then change to a dynamic connection: if I have extra fields in the sink that depend on values from other activities in the pipeline, should I be able to reference those in the mapping even though they are not coming from the source?

    Best Regards 


    Asad Khan

    Thursday, November 7, 2019 2:08 PM
  • Hi Asad,

    That can be done using a custom activity or an Azure Function Activity.

    Yes, you should be able to reference the mapping, but it might throw an error. Hence, you might want to ignore that column (the one which is not coming from the source).

    Friday, November 8, 2019 11:16 AM
    Owner
  • Hi Chirag:

    The issue is not ignoring source columns; I am using all source columns. I have two additional sink columns that I need to populate based on output values from other activities, for example the Client ID for that source load. But I don't get the ClientID column from the source columns. It is derived through other activities, and I want to add it in, as I need to know which data belongs to which client because the files are processed in parallel and loaded into the same staging table.

    Basically, I am looking to add derived columns in the pipeline that can be used in the copy activity. I am trying to avoid using Data Flows due to the high cost.


    Asad Khan

    Friday, November 8, 2019 2:24 PM
  • Hi Chirag,

    If the source and sink are dynamic and the schema is not fixed, how can I add 2 additional columns for audit purposes (for example, a hardcoded value "source_name" and @utcnow()) during the copy activity from a source flat file (Gen2) to a target SQL table (SQL DB)?


    • Edited by azuser1 Friday, November 8, 2019 4:09 PM
    Friday, November 8, 2019 3:28 PM
  • Hi Asad and Syed,

    If you want to add a column, you can do so by first importing the schema from a table (which you can make dynamic later), and then adding a new column in the dataset, as shown below:

    Hope this helps.

    Monday, November 11, 2019 4:57 AM
    Owner
  • Hi Chirag:

    You are not understanding the issue.  The source and sink are dynamic.

    The source has, as an example, these fields:

    col1|col2|col3

    The sink has fields:

    col1|col2|col3|col4|col5

    When I initialize the schema, it will map the first three columns, but Col4 and Col5 do not come from the source; they come from other activities within the pipeline, as a parameter or an output from a Lookup. The issue is how to get those fields added during the Copy Activity.

    Thanks


    Asad Khan

    Tuesday, November 12, 2019 5:35 PM
  • Hi Asad,

    Thank you for clarifying. In that case, you can consider the following options:

    Hope this helps.

    Wednesday, November 13, 2019 6:14 AM
    Owner
  • Thanks Chirag.

    I am holding off on using Data Flow for now, as it is too costly for the startup stage I am at.

    But I have identified a solution using a U-SQL activity: I take the source and add the 2 new columns I need by passing them as parameters into the U-SQL activity, and the script includes them as part of the output.
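
    Roughly, the activity definition looks like this (just a sketch; the linked service names, script path, and parameter names are placeholders for my actual setup, and the U-SQL script itself simply selects the source columns plus the two parameter values):

    {
        "name": "AddAuditColumns",
        "type": "DataLakeAnalyticsU-SQL",
        "linkedServiceName": { "referenceName": "AzureDataLakeAnalyticsLS", "type": "LinkedServiceReference" },
        "typeProperties": {
            "scriptLinkedService": { "referenceName": "ScriptStorageLS", "type": "LinkedServiceReference" },
            "scriptPath": "scripts/AddAuditColumns.usql",
            "parameters": {
                "in": "/input/file1.csv",
                "out": "/output/file1.csv",
                "clientId": "@pipeline().parameters.ClientId"
            }
        }
    }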

    So we can close this thread, but I will be opening a new ticket for the U-SQL activity, as I am having a slight issue with parameter passing.


    Asad Khan

    Wednesday, November 13, 2019 3:59 PM
  • Glad to know that your issue was resolved Asad. Thanks for sharing your findings. You can also run a stored procedure to do so.
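
    For example, the SQL sink of the copy activity can call a stored procedure and pass extra values such as the client id as parameters (a sketch only; the procedure, table type, and parameter names are placeholders, and the literal client id would normally come from a pipeline parameter or lookup output via dynamic content):

    "sink": {
        "type": "AzureSqlSink",
        "sqlWriterStoredProcedureName": "spLoadStaging",
        "sqlWriterTableType": "StagingTableType",
        "storedProcedureTableTypeParameterName": "rows",
        "storedProcedureParameters": {
            "ClientId": { "value": "client-001", "type": "String" }
        }
    }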
    Thursday, November 14, 2019 4:25 AM
    Owner