Help Needed: Error 0085 in Azure Machine Learning Studio ("Process returned with non-zero exit code 1")

  • Question

  • Hi,

    I'm trying to use Azure Machine Learning Studio (AMLS) and Python to process text strings into n-grams and bigrams. The code works perfectly in other environments such as Jupyter notebooks and Komodo, but when I run it in AMLS I get Error 0085:

    TypeError: 'numpy.int64' object is not iterable
    Process returned with non-zero exit code 1

    The module is set to Python 3.5, and the input data is only a small sample used to test the module, so I cannot figure out where this error is coming from.

    I've seen a lot of compatibility issues with AMLS reported in forums, so I'm presuming that's the cause.

    Could someone help me try and figure this out please?

    import pandas as pd
    import numpy as np
    import csv
    import nltk
    from nltk import sent_tokenize
    from nltk.tokenize import word_tokenize
    from nltk.corpus import stopwords
    
    # The entry point function can contain up to two input arguments:
    #   Param<dataframe1>: a pandas.DataFrame
    #   Param<dataframe2>: a pandas.DataFrame
    def azureml_main(dataframe1 = None, dataframe2 = None):
    
        # Execution logic goes here
        print('Input pandas.DataFrame #1:\r\n\r\n{0}'.format(dataframe1))
    
        stop_words1 = stopwords.words('english')
        stop_words1.sort()
        colToTokenise = "TextString"
        listOfColumnsdf = [['Metric', 'sum']]
        dataframe1 = dataframe1[['TextString', 'Metric']]
        sortBy = "Metric"
    
        columnsdf = pd.DataFrame(listOfColumnsdf, columns=['Metric', 'Aggregation'])
    
        for index, coln in enumerate(columnsdf['Metric']):
            dataframe1[coln] = dataframe1[coln].apply(pd.to_numeric, errors='ignore')
        
        aggregations = {}
        for index, col in columnsdf.iterrows():
            aggregations[col[0]] = col[1]
        
        dataframe1 = dataframe1.groupby(colToTokenise).agg(aggregations)
        dataframe1 = dataframe1.reset_index()
        for coln in dataframe1: 
            coln = coln.strip()
            
        words = [word_tokenize(dataframe1[colToTokenise][i]) for i in range(len(dataframe1[colToTokenise]))]
        everygrams = [list(nltk.everygrams(words[i], 1, 3)) for i in range(len(words))]
    
        listOfColumnNamesRaw = list(columnsdf['Metric'])
        listOfColumnNamesNew = listOfColumnNamesRaw
        listOfColumnNamesNew[0:0] = ["Token"]
    
        metricsByToken = pd.DataFrame(list(zip(everygrams, *[dataframe1[cols] for cols in columnsdf['Metric']])), columns=listOfColumnNamesNew)
        
        metricsByToken = metricsByToken.sort_values(by=[sortBy], ascending=False)
        metricsByToken = metricsByToken.reset_index(drop=True)
        metricsByToken2 = metricsByToken[:]
        
        TokensTable = metricsByToken2.apply(lambda x: pd.Series(x['Token']), axis=1).stack().reset_index(level=1, drop=True)
        TokensTable.name = 'Token'
        TokensTable = metricsByToken2.drop('Token', axis=1).join(TokensTable)
        TokensTable = TokensTable.reset_index(drop=True)
        
        TokensTable = TokensTable.groupby('Token').agg(aggregations)
        TokensTable = TokensTable.sort_values(by=[sortBy], ascending=False)
        TokensTable = TokensTable.reset_index()
        
        # Intended to drop rows containing stop words or non-alphabetic words
        for row in TokensTable.index:
            for word in row:
                if word.isalpha():
                    word = word.lower()
                    if word in stop_words1:
                        TokensTable.drop(row, inplace=True, errors='ignore')
                else:
                    TokensTable.drop(row, inplace=True, errors='ignore')
         
        TokensTable = TokensTable.reset_index()
        return TokensTable,
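    One observation from narrowing this down: the final stop-word loop does `for row in TokensTable.index:` and then `for word in row:`. After `reset_index()` the index holds integer positions rather than tokens, so each `row` would be a `numpy.int64`, and iterating one raises exactly the message above. A standalone sketch of just that failure (plain numpy, nothing AMLS-specific, names purely illustrative):

```python
import numpy as np

# A pandas integer index yields numpy.int64 values when iterated;
# trying to iterate one of those values reproduces the error above.
row = np.int64(0)
try:
    for word in row:
        pass
except TypeError as exc:
    print(exc)  # 'numpy.int64' object is not iterable
```

    If that is the culprit, iterating over the `Token` column values rather than the index should avoid it.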



    • Edited by Murtagh8 Monday, July 29, 2019 10:45 AM
    Monday, July 29, 2019 10:05 AM

All replies

  • Hi Murtagh,

    Could you please share a sample of your input data with us, if it is not confidential? It would help us troubleshoot the issue.

    Regards,

    Yutong

    Tuesday, July 30, 2019 10:03 PM
    Moderator
  • Hi Murtagh,

    Do you have any updates on this thread? I hope you have solved your issue.

    Regards,

    Yutong

    Friday, August 2, 2019 3:10 PM
    Moderator
  • Hi Yutong,

    Thanks for the reply and sorry for the delayed response!

    Unfortunately I can't share the data, but I do preprocess it before pushing it to the Python module.

    So there are only ever two input columns; in the example above they are named TextString and Metric. TextString contains purely string values with no unusual symbols or numbers, and Metric contains purely whole numbers.
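    For illustration only, input of that shape would look something like this (column names as described above; the rows are made up, not real data):

```python
import pandas as pd

# Hypothetical sample matching the schema described above:
# TextString holds plain strings, Metric holds whole numbers.
dataframe1 = pd.DataFrame({
    'TextString': ['the quick brown fox', 'jumped over the lazy dog'],
    'Metric': [12, 7],
})
print(dataframe1.dtypes)  # TextString: object, Metric: int64
```

    Note that pandas stores the whole-number column as int64, which may be relevant given the numpy.int64 in the error message.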

    Monday, August 12, 2019 8:30 AM