How to read a CSV file with another encoding in Azure Databricks

  • Question

  • In Azure Databricks, I read a CSV file with multiline = 'true' and charset = 'ISO 8859-7', but some words are not displayed correctly. It seems that the charset option is being ignored: when I use the multiline option, Spark falls back to its default encoding, UTF-8, but my file is in ISO 8859-7. Is it possible to use the two options at the same time? If so, how do I implement it?
    Thursday, October 10, 2019 2:12 AM

All replies

  • Hello,

    In Azure Databricks, the CSV reader's charset option defaults to UTF-8 but can be set to any other valid charset name.

    Here is an example of an ISO8859-7 encoded signature, “Cash+β’ SignatureΒ�”, taken as the input file.

    By default, charset is set to UTF-8.

    You can change the encoding/charset to “ISO8859-7” as follows: .option("charset", "ISO8859-7") or .option("encoding", "ISO8859-7")
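    As a minimal PySpark sketch of the option above: the helper names `csv_options` and `read_csv` are hypothetical, `spark` is assumed to be an existing SparkSession (as in a Databricks notebook), and the charset is written in its canonical form "ISO-8859-7".

```python
def csv_options(encoding="ISO-8859-7", multiline=False):
    """Build the reader options for a CSV read (hypothetical helper)."""
    return {
        "encoding": encoding,  # "charset" is accepted as an alternative name
        "multiLine": "true" if multiline else "false",
    }


def read_csv(spark, path, encoding="ISO-8859-7", multiline=False):
    """Read a CSV file with the given charset.

    `spark` is assumed to be an existing SparkSession; `path` is whatever
    location your workspace uses for the file.
    """
    return spark.read.options(**csv_options(encoding, multiline)).csv(path)
```

    In a notebook this would be called as `df = read_csv(spark, path)`, with `path` pointing at the ISO8859-7 file.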

    As I observed, if you set multiline=true together with encoding/charset “ISO8859-7”, the charset setting is ignored and the output is returned in the default charset, UTF-8.
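    To illustrate why words go missing when ISO8859-7 bytes are decoded as UTF-8, here is a small plain-Python sketch that does not use Spark. The sample word is an assumption chosen for illustration, not data from the original file.

```python
# Sample Greek text (an illustrative assumption, not from the original file)
text = "Καλημέρα"

# The bytes as they would be stored in an ISO8859-7 file
raw = text.encode("iso-8859-7")

# Decoding with the correct charset recovers the text
as_greek = raw.decode("iso-8859-7")

# Decoding the same bytes as UTF-8 fails; with errors="replace" the
# undecodable bytes become U+FFFD replacement characters
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_greek == text)      # True
print("\ufffd" in as_utf8)   # True
```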

    For more details, refer to “Encoding ISO” and “Databricks – CSV Files”.

    Hope this helps.      

    Thursday, October 10, 2019 11:36 AM
    Moderator
  • Hello,

    Just checking in to see if the above answer helped. If it answers your query, please click “Mark as Answer” and upvote it. If you have any further questions, do let us know.

    Friday, October 11, 2019 8:53 AM
    Moderator
  • Hello,

    Thanks for your reply.

    I know that if I use multiline=true together with encoding/charset “ISO8859-7”, the output is returned in the default charset, UTF-8. What I want to know is how to use multiline=true and encoding/charset “ISO8859-7” at the same time. This is my issue. Please help me.


    Monday, October 14, 2019 6:17 AM
  • Hello,

    Unfortunately, you cannot use "multiline" and "charset" together. If you use them together, the encoding falls back to the default.

    For more details, refer to “Spark – Known issue”.
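    One possible workaround, not confirmed in this thread, is to convert the file to UTF-8 first and then read the converted copy with multiline=true and the default charset. The sketch below uses only the Python standard library; the function name `transcode_to_utf8` and the file names are hypothetical, and the demonstration runs against a temporary local file rather than a real Databricks path.

```python
import os
import tempfile


def transcode_to_utf8(src_path, dst_path, src_encoding="iso-8859-7",
                      chunk_chars=1024 * 1024):
    """Copy a text file, re-encoding it from src_encoding to UTF-8.

    newline="" keeps the original line endings, so line breaks inside
    quoted multiline fields are preserved unchanged.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Read in chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: src.read(chunk_chars), ""):
            dst.write(chunk)


# Local demonstration with a temporary file (sample content is made up)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "greek_iso.csv")
    dst = os.path.join(tmp, "greek_utf8.csv")
    sample = 'id,note\n1,"Καλημέρα\nκόσμε"\n'  # quoted field spans two lines
    with open(src, "w", encoding="iso-8859-7", newline="") as f:
        f.write(sample)

    transcode_to_utf8(src, dst)

    with open(dst, encoding="utf-8", newline="") as f:
        round_trip = f.read()

# The UTF-8 copy could then be read with the default charset, for example:
#   spark.read.option("multiLine", "true").csv(path_to_utf8_copy)
# where `spark` and `path_to_utf8_copy` are assumptions about your workspace.
```

    This runs on a single machine, so it suits files small enough to convert on the driver; very large files would need a different approach.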

    Hope this helps.      

    Monday, October 14, 2019 6:41 AM
    Moderator
  • Hello,

    Following up to see if the above suggestion was helpful. If you have any further questions, do let us know.

    Tuesday, November 5, 2019 6:29 AM
    Moderator