Unexpected behaviour of Python logging in Databricks

  • Question

  • We are seeing unexpected behaviour of Python logging in Databricks. When we use mode='a', it adds the log message multiple times, and when we use mode='w', it appends the logs to the file.

    Issue: the file modes are not working as expected.

    When I execute my piece of code twice with mode='a':

    Expected :

    test - this is info -INFO

    test - this is info -INFO

    Databricks behaviour:

    test - this is info -INFO

    test - this is info -INFO

    test - this is info -INFO

    When I execute my piece of code twice with mode='w':

    Expected :

    test - this is info -INFO

    Databricks behaviour:

    test - this is info -INFO

    test - this is info -INFO

    Second issue:

    For a file we created in the data lake from Databricks: after we delete it in the data lake and then check whether the file exists at that path, the check still says the file exists.

    Used code:

    import os
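
    The snippet above is truncated in the thread to a single import. A minimal sketch of the kind of existence check being described (the helper name is an assumption, not the poster's original code; the path is the example mount used later in the thread):

```python
import os

def report_exists(path):
    # Return the same kind of status the poster describes checking.
    return "file exists" if os.path.exists(path) else "file does not exist"

# The thread's example mount, addressed via the /dbfs FUSE prefix:
print(report_exists('/dbfs/mnt/myDataLake2/myContainer/myFile.txt'))
```

    If stale mount metadata is the cause of the phantom "file exists" result, this check could keep reporting the old state; `dbutils.fs.refreshMounts()`, suggested later in the thread, is one thing to try.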


    Monday, June 10, 2019 8:05 AM

All replies

  • Hello Mallaiah Somula. Could you please clarify whether you are using Data Lake Storage Gen1 or Data Lake Storage Gen2?

    Also, what version of the Databricks Runtime and which Python version are you using?
    Monday, June 10, 2019 7:05 PM
  • We are using Azure Data Lake Gen2.

    Databricks Runtime: 5.3 (includes Spark 2.4.0 and Scala 2.11)

    Python version: 3

    Tuesday, June 11, 2019 5:25 AM
  • One more issue we currently have is that we are not able to execute the lines of code below on the Databricks cluster.


    I have tried multiple times, creating multiple workspaces in the Azure portal and multiple clusters; the cluster is running fine, but we are still not able to execute the code.
    We are using the same versions as mentioned below:
    Python version: 3
    Spark runtime: 5.3

    Wednesday, June 12, 2019 7:31 AM
  • Are you using Spark calls to write to the file? That would explain why I was unable to even open a file with Python's open("/mnt/myDataLake2/myContainer/myFile.txt", "w").
    Wednesday, June 12, 2019 6:47 PM
  • 1. We are writing the code in Databricks notebooks, and these notebooks are going to be called from ADF.

    2. We are not able to read or write the file using the line below in a notebook cell

    with open('/mnt/myDataLake2/myContainer/myFile.txt','r') as file:


    until we specify the file path with 'dbfs', like '/dbfs/mnt/myDataLake2/myContainer/myFile.txt'. When we delete the file from the data lake, it still shows that the file exists when we check with the code below:

    import os
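
    The check above is again truncated to a single import. A sketch of the path distinction being described, using the thread's example mount (on a laptop this only demonstrates the prefixing; the behaviour itself is specific to a Databricks cluster):

```python
# Spark APIs address the mount directly, while local-file APIs such as
# open() and os.path.exists() see DBFS through the /dbfs FUSE mount,
# which is why the '/dbfs' prefix is needed as described above:
spark_path = '/mnt/myDataLake2/myContainer/myFile.txt'   # Spark / dbutils
local_path = '/dbfs' + spark_path                        # open(), os.path

# On the cluster, open(spark_path) fails with FileNotFoundError while
# open(local_path) works once the mount is in place.
print(local_path)
```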


    Please let us know if you need any more details.

    Thursday, June 13, 2019 6:03 AM
  • Sorry for the delay. I just got past my blocker; it seems my storage got into a weird state or something, maybe permissions. Anyway, this is what I got now.

    The cluster was running 42 GB, 5.3 (includes Apache Spark 2.4.0, Scala 2.11).

    Saturday, June 15, 2019 2:43 AM
  • Make sure you don't have any blobs mixed in with your ADLS Gen2 files. Previously, it was possible to write to your storage with either 'flavor'. That does not seem to be as easy to do any more, if it is possible at all. Data Lake methods don't play nicely with blobs.

    After some more thought, have you checked permissions on the parent folders?

    Also, it might be worth trying "dbutils.fs.refreshMounts()"

    Saturday, June 15, 2019 2:50 AM
  • We have tried all the possible options and are still stuck. Please let us know the next steps.
    Tuesday, June 18, 2019 6:36 AM
  • Sorry to hear that the issue is still ongoing.

    I see that you have multiple issues:

    1. Append & write issues: we see that Martin did try to reproduce the issue, and it works fine for him. He has also posted a script that works for him. Can you please share the scripts you are running?
    2. Data frame issue: is this issue resolved? Can you please share the error message?
    3. Files not getting deleted: I think Martin's script checks this as well.

    Which region is your Databricks running in?

    Thanks, Himanshu

    Tuesday, June 18, 2019 8:24 PM
  • The entire code I have used:

    import logging
    from datetime import datetime
    import os

    def custom_logger(path, filename):
        if not os.path.exists(path):  # original checked undefined `filename1`
            print("creating the directories")
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(message)s -%(levelname)s')
        handler = logging.FileHandler(os.path.join(path, filename), mode='a')
        handler.setFormatter(formatter)
        logger = logging.getLogger('test')  # 'test' matches the logged output
        logger.setLevel(logging.INFO)
        logger.addHandler(handler)
        return logger

    Python: when we use mode 'a' it appends the data, and when we use mode 'w' it overwrites the file content.
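
    The standard CPython file-mode semantics just described can be sketched as follows (a stand-in demonstration; the thread's own "Used Code" was not captured):

```python
import os
import tempfile

# A throwaway log file in a temporary directory for the demonstration.
log_path = os.path.join(tempfile.mkdtemp(), 'demo.log')

with open(log_path, 'w') as f:   # 'w' truncates any existing content
    f.write('first run\n')
with open(log_path, 'w') as f:   # a second 'w' run still leaves one line
    f.write('first run\n')
with open(log_path, 'a') as f:   # 'a' preserves content and appends
    f.write('second run\n')

with open(log_path) as f:
    print(f.read())              # two lines total: first run, second run
```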

    Used Code:



    Databricks: for an existing file, when we use mode 'a' it multiplies the data. E.g., if you execute the same logger object repeatedly, the file accumulates the same statement in the pattern 1, 3, 6, 12, ...

    When we use mode 'w', instead of overwriting it appends the content, and it updates the previous log lines' timestamps to the latest timestamp.
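
    The multiplying growth reported above is consistent with handlers accumulating: in a long-lived notebook process, each run of the logger setup attaches another FileHandler to the same named logger, so run n writes n copies of each message. A sketch of a guard that avoids this (this diagnosis is an assumption from the symptoms, not a confirmed Databricks fix; the function name is hypothetical):

```python
import logging

def get_logger(name, logfile, mode='a'):
    logger = logging.getLogger(name)   # same logger object on every call
    logger.setLevel(logging.INFO)
    if not logger.handlers:            # attach the handler only once
        handler = logging.FileHandler(logfile, mode=mode)
        handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(message)s -%(levelname)s'))
        logger.addHandler(handler)
    return logger
```

    With the guard in place, re-running the cell reuses the single handler, so mode='a' appends exactly one line per call, and mode='w' is only honoured when the handler is first created.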

    Python: when we use this statement in Python, it should create the file at the given path if the file doesn't exist.

    Used Code:

    import logging


    Databricks: the above code is not working in Databricks, i.e., the file is not getting created.

    Our cluster is running in "uksouth".

    Thursday, June 20, 2019 5:12 AM
  • Hi Mallaiah,

    I came across this after trying to use logging in Databricks as well. It still seems that logging.FileHandler either fails when creating the file, or doesn't fail but also doesn't write anything to the file. Did you manage to get a fully working version where we could call something like logger.info('my message')?

    Thanks, Chris

    Wednesday, October 16, 2019 1:54 PM