How to check whether a file exists in ADLS from Databricks before loading

    Question

  • How can I check whether a file exists in ADLS from Databricks (Scala) before loading it?

    val yltPaths: Array[String] = new Array[String](layerCount)

    for (i <- 0 until layerCount) {
      val layerKey = layerArr(i).getInt(0)
      yltPaths(i) = s"""adl://xxxxxxxxxxxxxxxxxxxxxxxxx/testdata/loss/13/2/dylt/loss_$layerKey.parquet"""
    }

    val fexs = yltPaths.filter(p => { <check file exists> })

    val ylt = spark.read.parquet(fexs: _*)

    Wednesday, July 25, 2018 7:21 PM

All replies

  • Please give this method a try; it should work. If you run into any error, please let me know and include the code you tried.
    Wednesday, July 25, 2018 9:22 PM
    Moderator
  • No, that doesn't work unless you've mounted the storage into DBFS, which is NOT a great idea if you care about security: all clusters will be able to bypass security and access the lake. Session-scoped data lake connections are not available in the Hadoop configuration used by the above code. I need to do something similar. I'm thinking of either an API call to the data lake or just placing a try/catch around a dbutils list call.

    EDIT: For session-scoped data lake mounts, I'm just going to do this in a function and use a try/catch.

    val files = dbutils.fs.ls("adl://MYPATH")

    Then catch this exception and return false:

    java.io.FileNotFoundException: File/Folder does not exist:


    shaun



    Tuesday, April 16, 2019 11:52 AM
  • This will work for directories. Files are a little more complicated, because you have to map the filename to a list and check that; I will post something more complete when I get to it:

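    // Returns true if `path` can be listed, false if it does not exist.
    def CheckPathExists(path: String): Boolean =
      try {
        dbutils.fs.ls(path) // throws FileNotFoundException for a missing path
        true
      } catch {
        case _: java.io.FileNotFoundException => false
      }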
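
    For the list of parquet paths in the original question, the same try/catch idea can be sketched as a filter over the candidate paths. This is only a sketch and has not been run against ADLS: `PathFilter`, `pathExists`, `existingPaths` and the `list` parameter are hypothetical names introduced for illustration. It assumes that the listing call throws java.io.FileNotFoundException for a missing path, as noted above. Whether listing a single file path behaves the same as listing a directory should be verified in your workspace. The listing function is passed in as a parameter so the logic can be tried outside a notebook, where dbutils is not available.

```scala
import java.io.FileNotFoundException
import scala.util.{Failure, Success, Try}

object PathFilter {
  // `list` stands in for a listing call such as dbutils.fs.ls (assumption:
  // it throws FileNotFoundException when the path does not exist).
  def pathExists(path: String)(list: String => Any): Boolean =
    Try(list(path)) match {
      case Success(_)                        => true
      case Failure(_: FileNotFoundException) => false
      // Anything else (permissions, network, ...) is not treated as "missing".
      case Failure(other)                    => throw other
    }

  // Keeps only the paths for which the listing call succeeds.
  def existingPaths(paths: Seq[String])(list: String => Any): Seq[String] =
    paths.filter(p => pathExists(p)(list))
}
```

    In a notebook this could be used along the lines of `val fexs = PathFilter.existingPaths(yltPaths)(p => dbutils.fs.ls(p))` followed by `spark.read.parquet(fexs: _*)`. It is probably worth checking that `fexs` is non-empty before reading.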


    shaun

    Tuesday, April 16, 2019 12:05 PM
  • Hi Shaun,

    Thanks for sharing the solution, which might be beneficial to other community members reading this thread. 

    Wednesday, April 17, 2019 4:48 AM
    Moderator