Azure Data Factory - CSV Infer Schema is shifting column values in mapping data flows, which results in wrong data

  • Question

  • Hi,

    I am trying to use the infer schema option in Azure Data Factory mapping data flows with two different CSV files located in a test folder in Azure Data Lake Storage Gen2.

    Data_1.csv (includes 4 columns)

    Data_2.csv (3 columns, with no Phone column)

    I created a dataset named "CSV_TEST" pointing to the "test" folder in ADLS Gen2 and used the preview option to confirm that the data in the folder is read correctly. Here is how it looks (the data looks perfect):

    Once the dataset was created, I added a new mapping data flow, pointed its source to the "CSV_TEST" dataset, and enabled "Allow Schema Drift" and "Infer Drifted Column Types" in the settings, as shown below:

    I then clicked the Data Preview option in the source, and here is how the data looks: the values from the Email column are shifted under the Phone column. I assume this is a bug, since it does not match what I saw in the dataset preview.

    Has anyone faced a similar issue in mapping data flows?

    Regards

    Lokesh Reddy



    Tuesday, March 10, 2020 11:24 PM

All replies

  • Hi clokeshreddy,

    Thanks for reporting this to us. We have reported this issue to the internal engineering team and will get back to you as soon as we have a response.

    Thank you

    If a post helps to resolve your issue, please click the "Mark as Answer" of that post and/or click Answered "Vote as helpful" button of that post. By marking a post as Answered and/or Helpful, you help others find the answer faster.

    Wednesday, March 11, 2020 2:26 PM
  • Hi 

    Update: I have verified with the internal team, and they have confirmed that this behavior is currently by design in mapping data flows. When "Allow schema drift" and "Infer drifted column types" are enabled, all data files read by the source should have the same schema (the same number of columns). Your two data files have different columns, which is why you are seeing this.
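
One way to picture the shift is the minimal Python sketch below. It is not ADF code and does not claim to reproduce the service's internals; it only assumes that rows from every file are mapped by position onto a single shared header. The column names `Id` and `Name` and all sample values are hypothetical, since the thread only names the Phone and Email columns.

```python
import csv
import io

# Hypothetical file contents: only "Phone" and "Email" are named in the thread.
DATA_1 = "Id,Name,Phone,Email\n1,Ann,555-0100,ann@example.com\n"
DATA_2 = "Id,Name,Email\n2,Bob,bob@example.com\n"


def read_positionally(text, header):
    """Map each data row onto `header` by position, ignoring the file's own header."""
    rows = list(csv.reader(io.StringIO(text)))[1:]  # skip the file's header line
    return [dict(zip(header, row)) for row in rows]


# Assumption: one shared header (taken from the 4-column file) is applied to every file.
shared_header = ["Id", "Name", "Phone", "Email"]
shifted = read_positionally(DATA_2, shared_header)
print(shifted)  # the email value lands under "Phone"
```

With a 3-column row and a 4-column header, the third value is paired with the third header name, so the email address ends up under Phone, which matches the symptom described in the question.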

    A possible workaround is to read the files through multiple sources, then union or join them, and build your preferred target schema in the flow with the Select and Derived Column transformations.
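
The idea behind this workaround can be sketched in plain Python (again, not ADF code): read each file with its own header, combine the rows by column name, and project them onto a target schema, filling missing columns with an empty value. The column names `Id` and `Name`, the sample values, and the helper names `read_source` and `union_by_name` are hypothetical.

```python
import csv
import io

# Hypothetical file contents: only "Phone" and "Email" are named in the thread.
DATA_1 = "Id,Name,Phone,Email\n1,Ann,555-0100,ann@example.com\n"
DATA_2 = "Id,Name,Email\n2,Bob,bob@example.com\n"

# Preferred target schema (the role played by Select / Derived Column in the flow).
TARGET_COLUMNS = ["Id", "Name", "Phone", "Email"]


def read_source(text):
    """Read one file using its own header, so values stay tied to column names."""
    return list(csv.DictReader(io.StringIO(text)))


def union_by_name(sources, columns):
    """Union rows from all sources by column name; absent columns become None."""
    return [
        {column: row.get(column) for column in columns}
        for rows in sources
        for row in rows
    ]


combined = union_by_name([read_source(DATA_1), read_source(DATA_2)], TARGET_COLUMNS)
for row in combined:
    print(row)
```

Because each file is parsed against its own header, the row from the 3-column file keeps its email under Email and simply has no Phone value.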


    If you have any feedback or suggestions regarding this feature, I recommend sharing them in the ADF UserVoice forum: https://feedback.azure.com/forums/270578-azure-data-factory
    All feedback shared on UserVoice is actively monitored and reviewed by the ADF engineering team, who will take the necessary action.

    Hope this helps. Let us know if you have any further queries.


    Thank you


    Friday, March 13, 2020 7:59 PM
  • Hi 

    Just checking to see whether the above information was helpful. If you have any further queries, do let us know.


    Thank you


    Monday, March 16, 2020 5:39 PM