How to convert SourceDataset to pandas dataframe?

  • Question

  • Hi,

    I did some data processing in Azure ML Studio and saved the resulting dataset.

    Now I'm trying to load that dataset in a Jupyter notebook (to explore the data).

    I use the following code:

    ds = ws.datasets['Dress clusters']
    print(type(ds))
    ds.to_dataframe()

    but get the following error:

    AttributeError: 'SourceDataset' object has no attribute 'to_dataframe'

    Any idea how to solve that?

    Thanks

    Lucas


    • Edited by lugro Tuesday, November 27, 2018 12:46 PM formatting
    Tuesday, November 27, 2018 12:45 PM

All replies

  • Hi, try this:

    ds = ws.datasets['Dress cluster']
    frame = ds.to_dataframe()
    type(frame)
    Tuesday, November 27, 2018 3:29 PM
  • Hi Lee, thanks for the rescue.

    Doesn't work. I tried the following:

    ds = ws.datasets['Dress clusters']
    print(type(ds))
    frame = ds.to_dataframe()
    print(type(frame))

    The first print returns:
    <class 'azureml.SourceDataset'>

    But the second fails:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-56-be1822c93492> in <module>()
          7 ds = ws.datasets['Dress clusters']
          8 print(type(ds))
    ----> 9 frame = ds.to_dataframe()
         10 print(type(frame))
    
    AttributeError: 'SourceDataset' object has no attribute 'to_dataframe'
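
The traceback above shows that this particular `SourceDataset` object simply lacks the method, so a defensive check can at least turn the failure into a more informative message. The following is a minimal sketch, not a confirmed fix. It assumes the dataset object may carry a `data_type_id` attribute describing its stored format. One possible cause, stated here as an assumption, is that the client only offers `to_dataframe()` for formats it can parse (such as CSV or TSV), in which case re-saving the dataset as CSV in Studio might help. `load_frame`, `CsvLikeDataset` and `OpaqueDataset` are hypothetical names used only for illustration.

```python
def load_frame(ds):
    """Call ds.to_dataframe() if it exists; otherwise raise an error naming the stored format."""
    to_df = getattr(ds, "to_dataframe", None)
    if callable(to_df):
        return to_df()
    # Assumption: the dataset may expose its format as data_type_id.
    fmt = getattr(ds, "data_type_id", "unknown")
    raise TypeError(
        "{} has no to_dataframe(); stored format: {}".format(type(ds).__name__, fmt)
    )


# Hypothetical stand-ins so the sketch runs without a workspace.
class CsvLikeDataset(object):
    data_type_id = "GenericCSV"

    def to_dataframe(self):
        # A real dataset would return a pandas DataFrame here.
        return [{"cluster": 0}]


class OpaqueDataset(object):
    data_type_id = "Dataset"


print(load_frame(CsvLikeDataset()))
try:
    load_frame(OpaqueDataset())
except TypeError as err:
    print(err)
```

In the notebook, `load_frame(ws.datasets['Dress clusters'])` would then either return the frame or report which format to check.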


    • Edited by lugro Tuesday, November 27, 2018 3:55 PM formatting
    Tuesday, November 27, 2018 3:55 PM
  • Hi, as you're using notebooks in Azure ML, you need to connect to your storage account first.

    If you select the dataset in Azure ML within your experiment, simply right-click it and select open in notebooks (choose your Python version).

    A new notebook will be created and added to the notebook section. This adds a data connector to your Azure Storage account so you can access the data.

    In the notebook, the first code cell will have the connection code, for example:

    from azureml import Workspace
    ws = Workspace()
    ds = ws.datasets['MetaAnalytics.Test.GlobalDataset.IntegerCSVFile']
    frame = ds.to_dataframe()


    The next cell will simply display the dataframe:

    frame

    Sorry, this was my misreading of the problem. See the following walkthrough:
    https://blogs.msdn.microsoft.com/uk_faculty_connection/2016/05/05/jupyter-notebooks-in-azure-machine-learning-studio-the-perfect-tool-for-academics-and-students/ 



    Tuesday, November 27, 2018 6:59 PM