What happens to data values in my python script?

  • Question

  • I have a one-line piece of code that detects null/NaN values in a dataframe, something like this: df.isnull().sum()

    In a Jupyter notebook it works well, but when I add it to ML Studio's Python script it doesn't work anymore.

    If I look at the data in Studio's Visualization, there are missing values. In the Jupyter notebook the missing values are NaN.
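One way to compare the two environments is to print each column's dtype next to its null count, since `isnull()` only flags values that pandas itself treats as missing. The sketch below is a minimal diagnostic; the helper name `describe_missing` and the sample columns are hypothetical and not part of the original script.

```python
import numpy as np
import pandas as pd

def describe_missing(df):
    """Return each column's dtype and null count side by side."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "nulls": df.isnull().sum(),
    })

# Hypothetical sample data; the column names are illustrative only.
sample = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, None]})
report = describe_missing(sample)
print(report)
```

Running the same helper in the notebook and in the Studio script would show whether the column arrives with a different dtype, which would in turn change what `isnull()` can detect.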

    Jenni

    Wednesday, January 23, 2019 11:01 AM

All replies

  • Hi,

    In Azure Machine Learning Studio, you can clean missing values by following this guide: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data

    Regards,

    Yutong

    Wednesday, January 23, 2019 6:07 PM
  • Hi,

    In my case I have to remove NaNs under certain conditions, not all NaN values from that column. Sounds weird, but the data is messy.

    I noticed that for some reason ML Studio is converting my NaN/missing values to the minimum int value in the Python script module. If I print the values in the module like this: print(values), all missing values are printed as the int32 minimum value: -2147483648.

    I tried to change the data type of the column, but the only allowed data types were string and int32. If I convert the values to string, they are '-2147483648' in the dataframe Visualization. But if I open the same data in a Jupyter notebook, the NaN values are NaN. If I convert the data type to string there, they are still NaN in the notebook.

    I can fix that problem in my code, but that doesn't tell me what went wrong.
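A code-side fix of the kind mentioned above might look like the following. It is only a workaround sketch and does not explain why the conversion happens. It assumes the int32 minimum never occurs as a genuine value in the column; the helper name `restore_missing` and the sample column `value` are hypothetical.

```python
import numpy as np
import pandas as pd

# Smallest value an int32 can hold: -2147483648.
INT32_MIN = np.iinfo(np.int32).min

def restore_missing(df, columns):
    """Return a copy where INT32_MIN in the given columns becomes NaN."""
    out = df.copy()
    for col in columns:
        # An integer column cannot hold NaN, so convert to float first.
        as_float = out[col].astype("float64")
        # mask() replaces entries where the condition is True with NaN.
        out[col] = as_float.mask(as_float == INT32_MIN)
    return out

# Hypothetical data mimicking what the thread describes seeing in the module.
sample = pd.DataFrame({"value": np.array([5, INT32_MIN, 7], dtype="int32")})
fixed = restore_missing(sample, ["value"])
print(fixed["value"].isnull().sum())
```

After this step, `df.isnull().sum()` and any conditional NaN handling can run as they do in the notebook. If the column may legitimately contain -2147483648, this approach would wrongly mark those rows as missing.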

    Jenni


    Thursday, January 24, 2019 7:20 AM
  • Just as a starting point, what version of Python are you using in AML Studio?
    Friday, January 25, 2019 1:08 PM
  • I tested both Anaconda 4.0/Python 2.7.11 and Anaconda 4.0/Python 3.5.
    Monday, January 28, 2019 7:43 AM