Connecting from Data Warehouse to Storage Account Blob

  • Question

  • Dear Team,

    We are trying to run a Scala command in Databricks that connects to a storage account, reads the data, and builds a DataFrame. After that, we need to load the DataFrame into SQL Data Warehouse:

    df.write.format("com.databricks.spark.sqldw").option("url", sqlDwUrlSmall).option("dbtable", "<schema.table>").option("forward_spark_azure_storage_credentials","True").option("tempdir", "wasbs://<container>@<storage account>.blob.core.windows.net/tempDirs").mode("overwrite").save() 

    When we run this, we face the following issue.

    Underlying SQLException(s):
    - com.microsoft.sqlserver.jdbc.SQLServerException: External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
    HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: StorageException: This request is not authorized to perform this operation.' [ErrorCode = 105019] [SQLState = S0001] 

    Could you please let us know what is going wrong and how to fix it?

    Regards,

    Satyajit


    Wednesday, February 12, 2020 11:49 AM

Answers

  • Thanks, Mike, for replying.

    It does not work that way either.

    I tried the following instead, and it worked:

    val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"

    // Create a Properties() object to hold the parameters.
    import java.util.Properties
    val connectionProperties = new Properties()

    connectionProperties.put("user", s"${jdbcUsername}")
    connectionProperties.put("password", s"${jdbcPassword}")

    val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    connectionProperties.setProperty("Driver", driverClass)
    import org.apache.spark.sql.SaveMode


    df.write.mode(SaveMode.Append).jdbc(jdbcUrl,"<schema.tablename>", connectionProperties)
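
    As a quick sanity check, the appended rows can be read back over the same JDBC connection (a minimal sketch; "<schema.tablename>" is the same placeholder as above):

    // Hypothetical check: read the target table back to confirm the append landed.
    val written = spark.read.jdbc(jdbcUrl, "<schema.tablename>", connectionProperties)
    println(s"Row count after append: ${written.count()}")

    Note that .jdbc() writes rows directly through the SQL endpoint rather than staging files in blob storage, which is likely why it sidesteps the storage authorization error raised by the sqldw connector.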

    I am still checking what I did wrong in the previous code.

    Regards,

    Satyajit

    Thursday, February 13, 2020 11:34 AM

All replies

  • Hi Satyajit,

    I think there is an issue with the wasbs URL. Is there a specific directory named tempDirs in the referenced storage account? If the path exists, then there is likely a permissions issue.

    HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: StorageException: This request is not authorized to perform this operation.' [ErrorCode = 105019] [SQLState = S0001] 

    Looking at the following documentation: Access Azure Blob storage using the DataFrame API

    val df = spark.read.parquet("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")
    
    dbutils.fs.ls("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")

    What are your container-name and directory-name values? If you convert the URL to https://, can you navigate to the location of your source files using the following format:

    https://<storage-account-name>.blob.core.windows.net/<container-name>/<directory-name>/file

    The correct format might be:

    wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/

    Azure Storage does not have true directories; instead, it has a concept called containers. The Databricks documentation for Azure SQL Data Warehouse references <directory-name>, but this is not necessarily correct. You cannot create nested containers, so the <directory-name> value is likely causing the issue.
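
    Either way, it is worth confirming from the notebook that the cluster itself can reach the tempdir path before attempting the sqldw write. A minimal sketch, assuming access via the storage account key (all names are placeholders):

    // Placeholder values; substitute your own account, container, and key.
    val storageAccount = "<storage-account-name>"
    val container = "<container-name>"

    // Register the account key with Spark. With
    // forward_spark_azure_storage_credentials set to true, the sqldw
    // connector forwards this credential to SQL Data Warehouse.
    spark.conf.set(
      s"fs.azure.account.key.${storageAccount}.blob.core.windows.net",
      "<storage-account-key>")

    // If this listing fails, the write's tempdir access will fail for the same reason.
    dbutils.fs.ls(s"wasbs://${container}@${storageAccount}.blob.core.windows.net/tempDirs")

    If the listing succeeds but the write still fails, the "not authorized" error may instead come from the SQL Data Warehouse side of the transfer, for example a storage firewall rule or a credential that was not forwarded.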

    Please let me know if you are still having issues; or, if you found the solution, please share the steps you took to fix it.

    Regards,

    Mike


    Wednesday, February 12, 2020 7:53 PM
  • Thank you for providing the resolution to your issue. I am marking it as the answer, as it will be very helpful to others experiencing the same issue. And, as always, please let us know if you have additional questions and we can continue the dialogue.

    Regards,

    Mike

    Thursday, February 13, 2020 5:08 PM
  • Sure. Thanks.
    Friday, February 14, 2020 11:10 AM