Execute Python Script Error (Anaconda 4/Python 3.5)

  • Question

  • I got a "MemoryError" running a Python script in a Predictive Experiment.

    What are we supposed to do in these circumstances?

    Thanks

    Here's the error log:

    Error 0085: The following error occurred during script evaluation, please view the output log for more information:
    ---------- Start of error message from Python interpreter ----------
    Caught exception while executing function: Traceback (most recent call last):
      File "C:\server\invokepy.py", line 192, in batch
        idfs = [parameter for infile in infiles
      File "C:\server\invokepy.py", line 194, in <listcomp>
        infile, is_buffer=False)]
      File "C:\server\XDRReader\xdrutils.py", line 45, in XDRToPyObjects
        attrList = xdrreader.read_attribute_list()
      File "C:\server\XDRReader\xdrreader3.py", line 39, in read_attribute_list
        car = self.read_object()
      File "C:\server\XDRReader\xdrreader3.py", line 140, in read_object
        vectorObject[i] = self.read_object()
      File "C:\server\XDRReader\xdrreader3.py", line 107, in read_object
        values = self.reader.ReadDoubleArray(length)
      File "C:\server\XDRReader\BinaryIO\binaryreader.py", line 33, in ReadDoubleArray
        return self.GetDataArr(self.DoubleArrFmt(length), 8 * length)
      File "C:\server\XDRReader\BinaryIO\binaryreader.py", line 27, in GetDataArr
        return np.array(self.GetDataTuple(struct_obj,size), copy = False)
      File "C:\server\XDRReader\BinaryIO\binaryreader.py", line 21, in GetDataTuple
        return struct_obj.unpack(self.stream.read(size))
    MemoryError
    Process returned with non-zero exit code 1

    ---------- End of error message from Python interpreter ----------
    Start time: UTC 01/20/2017 00:37:17
    End time: UTC 01/20/2017 01:11:30

    Here's my Python code:

    # The script MUST contain a function named azureml_main
    # which is the entry point for this module.

    import pandas as pd
    import numpy as np

    # The entry point function can contain up to two input arguments:
    #   Param<dataframe1>: a pandas.DataFrame
    #   Param<dataframe2>: a pandas.DataFrame
    def azureml_main(df=None, df2=None):

        # Row-wise maximum across the per-class probability columns
        df['Scored Label Probability'] = df.filter(like='Scored Probabilities for Class').max(axis=1)

        # Return value must be a sequence of pandas.DataFrame objects
        return df
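
    For reference, here's a minimal local sanity check of what that filter/max line does (the column names below are hypothetical, following Azure ML's "Scored Probabilities for Class ..." naming convention):

    sample = pd.DataFrame({
        'Scored Probabilities for Class 0': [0.1, 0.7],
        'Scored Probabilities for Class 1': [0.9, 0.3],
    })
    out = azureml_main(sample)
    print(out['Scored Label Probability'])  # 0.9, then 0.7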

    Friday, January 20, 2017 5:29 PM

Answers

  • OK. I have a work-around.

    I've realized that for a Predictive Experiment I already have my trained model, so I don't need the full dataset. Cutting my dataset (using "Split Data") from 3 million records down to 30,000 was enough to prevent the MemoryError from occurring.

    It does remain a valid question, though: what do we do when we get a MemoryError? How do we "increase memory" on the Azure resource?

    • Marked as answer by Andrew R Abel Friday, January 20, 2017 7:13 PM
    Friday, January 20, 2017 7:13 PM

All replies

  • You should cut it down to one row if you really want to speed it up; the service just needs the schema.

    Unfortunately, you cannot increase the memory limit. You will have to get creative (downsampling, etc.).
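
    Note that in the log above, the MemoryError is raised inside the XDRReader while the input dataset is still being deserialized, before azureml_main ever runs, so the cut usually has to happen upstream (e.g. with "Split Data"). Once the data does fit through the reader, a minimal in-script downsampling sketch might look like this (DataFrame.sample exists in pandas >= 0.16.1; the 30,000-row cap is just an illustration):

    import pandas as pd

    def azureml_main(df=None, df2=None):
        # Keep at most 30,000 rows; a fixed seed makes reruns reproducible
        if len(df) > 30000:
            df = df.sample(n=30000, random_state=42)
        return df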

    Monday, January 23, 2017 4:08 PM