Is there any way to implement an equivalent of SELECT * INTO dbo.destination from dbo.source in ADF?

    Question

  • Greetings,

    Fairly new to ADF. The task is to copy a table from Blob storage to Azure SQL, which needs both a source folder and a destination table. Since the blob CSV schema has many columns, the procedure for creating a corresponding destination table is painfully slow: identify the format of every field in the blob, run CREATE TABLE in Azure SQL with the same fields, and then link the source and destination in a pipeline.

    I am wondering if there is a way of automatically creating a new table in the destination Azure SQL database before running the pipeline, i.e. an equivalent of SELECT * INTO dbo.destination (in Azure) FROM source (Blob or another source).

    Thanks!


    • Edited by polaro Wednesday, October 17, 2018 7:27 PM
    Wednesday, October 17, 2018 6:58 PM

Answers

  • Hi Polaro,

       You can enable the firstRowAsHeader feature in the format settings of the blob dataset;

       check this doc https://docs.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs for details.

        For auto-creating the Azure SQL table, you may try the stored procedure activity: https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-stored-procedure. Pass the field names to the SP activity as parameters of the stored procedure.
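    As a rough sketch of that approach (the procedure name, parameter shape, and the blanket NVARCHAR(MAX) column typing are assumptions, not from the docs), the stored procedure could build the CREATE TABLE statement with dynamic SQL from a field-name parameter:

    ```sql
    -- Sketch: auto-create the destination table from a comma-separated list of
    -- column names passed in by the ADF stored procedure activity. The name and
    -- the blanket NVARCHAR(MAX) typing are assumptions; carrying real column
    -- types would need a richer parameter.
    CREATE PROCEDURE dbo.usp_CreateDestinationTable
        @tableName   SYSNAME,
        @columnNames NVARCHAR(MAX)   -- e.g. N'Col1,Col2,Col3'
    AS
    BEGIN
        DECLARE @cols NVARCHAR(MAX) = N'';
        SELECT @cols = @cols + QUOTENAME(value) + N' NVARCHAR(MAX) NULL,'
        FROM STRING_SPLIT(@columnNames, N',');

        DECLARE @sql NVARCHAR(MAX) =
            N'CREATE TABLE dbo.' + QUOTENAME(@tableName) + N' (' +
            LEFT(@cols, LEN(@cols) - 1) + N');';

        EXEC sp_executesql @sql;
    END;
    ```

    STRING_SPLIT requires database compatibility level 130 or higher, which Azure SQL Database meets by default.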

    Thanks,

    Li

         

    • Marked as answer by polaro Thursday, November 1, 2018 6:00 PM
    Tuesday, October 23, 2018 3:05 AM

All replies

  • Hi,

        Auto-creating a table in Azure SQL is not supported yet.

        You can use the Get Metadata activity to get the blob schema first, then construct a pre-copy script from that schema in the copy activity. Note that the pre-copy script is executed on every pipeline run.
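    For example, the constructed pre-copy script might end up looking like this (a sketch only; the table and column names are placeholders derived from the Get Metadata output, and typing every column NVARCHAR(MAX) is an assumption):

    ```sql
    -- Hypothetical pre-copy script assembled from the Get Metadata output.
    -- It runs on every pipeline execution, so it is written to be idempotent.
    IF OBJECT_ID(N'dbo.destination', N'U') IS NOT NULL
        DROP TABLE dbo.destination;

    CREATE TABLE dbo.destination (
        [Prop_0] NVARCHAR(MAX) NULL,
        [Prop_1] NVARCHAR(MAX) NULL,
        [Prop_2] NVARCHAR(MAX) NULL
    );
    ```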

    Thanks.

    Thursday, October 18, 2018 3:14 AM
  • Hi,

    An easy approach would be to create an external table over the CSV file. First upload the file into the blob, then create an external table pointing to the blob CSV. You can then load the data into the main table from the external table with a SELECT * INTO command. You can refer to the link below:

    https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
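    As a sketch of what that might look like (all names, columns, and the storage location below are hypothetical placeholders, and this assumes a PolyBase-capable engine such as SQL Server 2016+ or Azure Synapse, per the linked doc):

    ```sql
    -- Sketch: PolyBase external table over a blob CSV, then SELECT * INTO.
    -- Names, columns, and the storage URL are placeholders, not from the thread.
    CREATE EXTERNAL DATA SOURCE MyAzureBlob
    WITH (TYPE = HADOOP,
          LOCATION = 'wasbs://mycontainer@myaccount.blob.core.windows.net');

    CREATE EXTERNAL FILE FORMAT CsvFormat
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

    CREATE EXTERNAL TABLE dbo.source_ext (
        Col1 NVARCHAR(100),
        Col2 NVARCHAR(100)
    )
    WITH (LOCATION = '/data/source.csv',
          DATA_SOURCE = MyAzureBlob,
          FILE_FORMAT = CsvFormat);

    -- Materialize the external data into a regular table:
    SELECT * INTO dbo.destination FROM dbo.source_ext;
    ```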



    Cheers,

    • Proposed as answer by Bhushan Gawale Thursday, October 18, 2018 6:19 AM
    Thursday, October 18, 2018 5:30 AM
  • Thanks Li,

    Is there a walkthrough document for implementing something like that? I don't see a "Get Metadata" functionality in the portal interface, only in some scripts. How do I get the exact schema of the blob's CSV? And after obtaining the schema, where do I insert it as a pre-copy step in the pipeline?


    Thursday, October 18, 2018 3:03 PM
  • Thanks Bhushan,

    What if the blob's folder contains multiple CSV files (partitions)? How do I SELECT * INTO from them? Will a UNION work in Azure?

    Thursday, October 18, 2018 3:26 PM
  • Yes, a UNION should work.
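    For instance, with one external table per partition file (the table names here are hypothetical), the load could look like the following; note that pointing a single external table's LOCATION at the folder itself would also read all the files in it:

    ```sql
    -- Sketch: combine several partition files, one external table each
    -- (hypothetical names), and materialize the result in one shot.
    SELECT * INTO dbo.destination
    FROM (
        SELECT * FROM dbo.source_ext_part1
        UNION ALL
        SELECT * FROM dbo.source_ext_part2
    ) AS combined;
    ```

    UNION ALL avoids the duplicate-elimination pass that plain UNION performs, which is usually what you want when the partitions are disjoint.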

    Another solution is to write an event-driven Azure Function that triggers automatically whenever you upload a new file to the Azure blob.


    Cheers,

    Friday, October 19, 2018 2:11 AM
  • Hi Polaro,

         Following are the docs:

         Get metadata activity: https://docs.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity

         Azure SQL: https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database

         You can find the pre-copy script setting in the "Azure SQL as the sink" section.

         BTW, we highly recommend the visual authoring tool for ADF; you can open it by clicking Author & Monitor in your data factory in the Azure portal.

    Thanks,

    Li

    Friday, October 19, 2018 2:45 AM
  • Thanks Li,

    I have used the Get Metadata activity to form the schema script for the CSV file. The fields in the output look like this:

    "itemType": "File",
    "structure": [
        { "name": "Prop_0", "type": "String" },
        { "name": "Prop_1", "type": "String" },
        { "name": "Prop_2", "type": "String" },

    A couple of questions: can I pull the actual field names instead of Prop_1, etc.? Also, what's the best way to automatically save the output as an Azure SQL table, for instance? I appreciate your assistance.

    Friday, October 19, 2018 6:28 PM
  • Thanks. Looks promising. I'll attempt your proposed solution and get back to you
    Friday, October 19, 2018 6:30 PM