Parquet dataset schema

  • Question

  • Hi there,

    I am creating an ADF pipeline that will have Parquet as the destination dataset in ADLS Gen2. I want to keep the schema as a pipeline parameter so that we can deploy the pipeline for different schemas. The following link shows how to create the dataset, but it does not include the schema.

    https://docs.microsoft.com/en-us/azure/data-factory/format-parquet

    Can you please tell me what a sample schema looks like?

    Will it be something like this?

    [{"name": "col1","type": "String"},{"name": "col2","type": "String"},{"name": "col3","type": "String"}]

    Another question: in the ADF UI, the Parquet dataset has an option to import the schema from a sample file. What would that sample file look like, and what should its extension be?

    Thanks

    Tuesday, July 16, 2019 1:33 AM

All replies

  • Hi there,

    In Azure Data Factory, the dataset schema has been made read-only. You can therefore read the schema from a file, but you cannot set it manually.

    So when the Parquet dataset is a sink, you need to use a dynamic mapping in order to deploy it for different schemas. If you want fresh files to be written in Parquet format in the dataset, you can copy the source schema as well. Refer to the screenshot below to see how you can upload a sample file:

    To import the schema from a sample file, upload a file that is in Parquet format and has the schema you intend to maintain. It can have any extension, as long as the schema is what you want it to be.
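
    As a rough illustration of the dynamic mapping mentioned above, the column list from the question could be turned into a copy activity mapping along these lines. This is only a sketch: it assumes the explicit-mapping form (a `TabularTranslator` with `source`/`sink` entries) and assumes the source and sink columns share the same names. The helper `build_translator` is a hypothetical name used for illustration, not part of any ADF SDK.

```python
import json


def build_translator(columns):
    """Build a copy activity mapping from [{"name": ..., "type": ...}, ...].

    Assumption: source and sink columns use the same names, and the
    "type" given in the column list is applied on the sink side.
    """
    mappings = []
    for col in columns:
        mappings.append({
            "source": {"name": col["name"]},
            "sink": {"name": col["name"], "type": col["type"]},
        })
    return {"type": "TabularTranslator", "mappings": mappings}


# Column list in the shape proposed in the question.
columns = [
    {"name": "col1", "type": "String"},
    {"name": "col2", "type": "String"},
    {"name": "col3", "type": "String"},
]

translator = build_translator(columns)

# JSON text that could be supplied as a pipeline parameter value.
print(json.dumps(translator, indent=2))
```

    The printed JSON could then be passed in as a pipeline parameter (for example an object parameter named `mapping`, which is a hypothetical name) and referenced from the copy activity's mapping as dynamic content, so the same pipeline can be deployed for different schemas.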

    Hope this helps.


    Wednesday, July 17, 2019 9:37 AM
    Owner
  • Hi there,

    Just wanted to check - was your problem resolved with the above suggestion?

    Thursday, July 25, 2019 5:07 AM
    Owner