Source Dataset Preview shows Strange Font

    Question

  • I set up a source dataset, and on the Connection tab there is a button for Preview data. When I select it, the preview shows the correct number of columns, but the data appears in some strange font. The connection is to a CSV file in blob storage. When I preview the file in Storage Explorer, the CSV displays fine. Is there something I need to do to view this correctly?

    On the Connection tab there are options, but I have not changed any of them, so the file format is "Text format". I tried the others in the dropdown, but they all throw an error. The test connection was successful and, as I said, the data is recognized, just in a weird font. Any help on how I can get this to display properly would be appreciated.

    Thursday, July 19, 2018 9:45 PM

Answers

  • Hi JoshJJames,

    I understand your question: the CSV file is sent from another tool with Unicode little endian encoding, and when you preview it in ADFv2 the weird font shows up.

    The root cause is that ADFv2 uses UTF-8 as the default encoding when previewing a file. To solve it, you need to specify the encoding name in your dataset definition. You could select one of the following options:

    Unicode: gets an encoding for the UTF-16 format using the little endian byte order.

    UTF32: gets an encoding for the UTF-32 format using the little endian byte order.
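
    To illustrate the mismatch described above, here is a minimal Python sketch (not ADF code) that decodes UTF-16 little endian bytes as UTF-8 and then with the matching codec. The sample row is made up for illustration and is not taken from the original file.

```python
import codecs

# Hypothetical sample content; the real file from the thread is not reproduced here.
sample = "id,name\n1,Josh\n"

# Build the bytes the way a UTF-16 LE file with a byte order mark would store them.
raw = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")

# Decoding with the wrong codec: the BOM bytes are invalid UTF-8 and become
# U+FFFD replacement characters, and every ASCII character is followed by a NUL.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the matching codec: the BOM is consumed and the text is intact.
right = raw.decode("utf-16")

print(repr(wrong))
print(repr(right))
```

    The replacement characters and NULs in the first result are what a viewer typically renders as boxes, which is probably the "strange font" seen in the preview.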

    Thanks.

    • Marked as answer by JoshJJames Saturday, July 21, 2018 4:51 PM
    Saturday, July 21, 2018 12:45 AM

All replies

  • Hi JoshJJames,

    Is your file type the 'CSV(Comma delimited, *.csv)'? If it is, the issue shouldn't show up.

    Also, I noticed that when you click Preview, you didn't select the csv file, which means you're previewing the whole folder. Please specify the file and try again.

    If the problem still exists, would you please provide a row of sample data from your csv file? It's pretty weird, and we'll investigate.

    Thanks.
    Friday, July 20, 2018 1:43 AM
  • The file is a csv. I had three files in that blob container, so I went ahead and added the individual file name. Then I clicked the box to make the first row a header row. So now the header row is populating the names, but it still has the square font everywhere. I opened the file in Notepad and it looks like csv.

    I noticed that when I download the file and then open and save it in Excel, it still says csv, but the file size is smaller than the original. I don't know why or if that matters. I moved the original file and the one that I opened and saved to OneDrive. Even the file icon preview in OneDrive shows one having data and the other not (they both have the same data).

    Files copied to OneDrive so you can see. The first one is the original download, the second one is the version saved in Excel.

    https://1drv.ms/u/s!ArBesiFzZBSDigvG9y7OIbfDKx90

    https://1drv.ms/u/s!ArBesiFzZBSDigoCjoweNF0ijl3v

    Friday, July 20, 2018 2:48 PM
  • After doing some more digging on the file, the file type is csv, but the encoding on the file is UCS-2 Little Endian. When I saved the file in Excel, it was converted to UTF-8. So apparently there is an issue with reading the encoding on the file. I am not sure if Microsoft has documentation on file encodings.

    This file is sent straight from another tool. Is there a way in Azure to change the encoding?
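
    One way to check the encoding outside Azure is to look at the file's byte order mark (BOM) and, if needed, re-encode to UTF-8 before upload. The sketch below is a local Python approach under that assumption, not an Azure feature; `sniff_encoding` and `to_utf8` are hypothetical helper names, and files without a BOM simply fall back to the default.

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return a Python codec name based on the leading byte order mark."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return default  # no BOM found; this is a guess, not a detection


def to_utf8(data: bytes) -> bytes:
    """Decode with the sniffed codec and re-encode as UTF-8 without a BOM."""
    return data.decode(sniff_encoding(data)).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample standing in for the downloaded file's bytes.
    raw = codecs.BOM_UTF16_LE + "id,name\n1,Josh\n".encode("utf-16-le")
    print(sniff_encoding(raw))
    print(to_utf8(raw))
```

    For mostly ASCII data the UTF-8 output is roughly half the size of the UTF-16 input, which would be consistent with the smaller file size noticed after saving in Excel.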

    Friday, July 20, 2018 4:42 PM
  • In Data Factory, I went to the dataset => Advanced and typed "unicode" under encoding name. Worked like a charm. Thanks so much for your help, @wangzhang!
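
    For anyone who edits the dataset definition directly rather than through the UI, the setting above probably corresponds to an `encodingName` entry in the text format section. The Python sketch below only builds and prints such a definition as JSON; the layout is an assumption based on the blob dataset text format, and the dataset name, linked service name, container, and file name are all hypothetical placeholders.

```python
import json

# Hypothetical names throughout: "SourceCsv", "MyBlobStorage",
# "mycontainer", and "input.csv" are placeholders, not values from the thread.
dataset = {
    "name": "SourceCsv",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "MyBlobStorage",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "folderPath": "mycontainer",
            "fileName": "input.csv",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
                # The value typed in the UI; assumed to select UTF-16 little endian.
                "encodingName": "unicode",
            },
        },
    },
}

print(json.dumps(dataset, indent=2))
```

    Compare the printed JSON against the Code view of your own dataset before relying on it, since the exact shape may differ between dataset types.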

    josh

    Saturday, July 21, 2018 4:53 PM
  • Glad to hear.

    Thanks.

    Monday, July 23, 2018 5:15 AM