Azure ML - Execute R Script - Values in output data set appear to be an aggregate and are the same for each row

  • Question

  • I have an R script that scores text against a list of key words. When I run the script in RStudio it works perfectly, returning the score for each text on a set of categories. When I integrate it into Azure ML Studio it doesn't raise an error, but the behaviour is strange and I cannot fathom why. I've followed the guidelines from MS for adding R scripts, and if I pass a single row from a data set the score is returned OK. However, if I pass more than one row, it totals the score over every row for each category and returns the same aggregate value for each row.
    Please see below the parent R script as it appears in Azure ML Studio, and beneath it the RFunctions.R file, which is held in a .zip file and attached as a dataset.

    The WordScore_List is an input from the data set available in the following location, created by Finn Årup Nielsen in 2009-2011
    http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010

    The file I used was AFINN-111.txt
    I added column names Word and Score

    Please let me know if I need to clarify anything else. Any help would be greatly appreciated.

    Thank you,
    John Shiangoli

    #****************************************************************************************************************
    #Below is the R script as it appears in ML Studio Execute R Script code window
    #****************************************************************************************************************

    # Map 1-based optional input ports to variables
    dataset1 <- maml.mapInputPort(1) # class: data.frame
    # get the text columns from the input data set
    text_column <- dataset1[["text_column"]]

    WordScore_List <- NULL
    result <- tryCatch({
      dataset2 <- maml.mapInputPort(2) # class: data.frame
    #get the WordScore_List list from the second input data set
      WordScore_List <- dataset2
    }, warning = function(war) {
    # warning handler 
      print(paste("WARNING: ", war))
    }, error = function(err) {
    #error handler
      print(paste("ERROR: ", err))
      WordScore_List <- NULL
    }, finally = {})

    # Load the R script from the Zip port in ./src/
    source("src/RFunctions.R");
    # Call the function in the above to return the text score
    Vector_TextScore <- SentimentScore(text = text_column, 
                                  vectorTextScore_list = WordScore_List)

    vPosMatches <- NULL
    posMatches <- NULL
    vNegMatches <- NULL
    negMatches <- NULL  

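    # SentimentScore returns c(vNegMatches, negMatches, posMatches, vPosMatches),
    # so the indices below follow that order
    vNegMatches <- Vector_TextScore[1]
    negMatches <- Vector_TextScore[2]
    posMatches <- Vector_TextScore[3]
    vPosMatches <- Vector_TextScore[4]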


    data.set <- data.frame(
    #  label_column,
      text_column,
      vPosMatches,
      posMatches,
      vNegMatches,
      negMatches,
      stringsAsFactors = FALSE 
    )    

    # Select data.frame to be sent to the output Dataset port
    maml.mapOutputPort("data.set")


    #****************************************************************************************************************
    #Below is the RFunctions.R which is saved in a .zip and added as a data set then used in the 3rd input
    #****************************************************************************************************************
    SentimentScore <- function(text, 
                               vectorTextScore_list = NULL)
    {
    library(plyr)
    library(stringr)


    vectorTextScore_list$Word <- tolower(vectorTextScore_list$Word)

    #categorize words as very negative to very positive 
    vNegTerms <- vectorTextScore_list$Word[vectorTextScore_list$Score==-5 | vectorTextScore_list$Score==-4]
    negTerms <- vectorTextScore_list$Word[vectorTextScore_list$Score==-3 | vectorTextScore_list$Score==-2 | vectorTextScore_list$Score==-1]
    posTerms <- vectorTextScore_list$Word[vectorTextScore_list$Score==3 | vectorTextScore_list$Score==2 | vectorTextScore_list$Score==1]
    vPosTerms <- vectorTextScore_list$Word[vectorTextScore_list$Score==5 | vectorTextScore_list$Score==4]
    sentence <- text

    wordList <- str_split(sentence, '\\s+')
    words <- unlist(wordList)
    #build vector with matches between sentence and each category

    vNegMatches <- NULL
    negMatches <- NULL
    posMatches <- NULL
    vPosMatches <- NULL

    vPosMatches <- match(words, vPosTerms)
    posMatches <- match(words, posTerms)
    vNegMatches <- match(words, vNegTerms)
    negMatches <- match(words, negTerms)

    #sum up number of words in each category
    vPosMatches <- sum(!is.na(vPosMatches))
    posMatches <- sum(!is.na(posMatches))
    vNegMatches <- sum(!is.na(vNegMatches))
    negMatches <- sum(!is.na(negMatches))


    score <- c(vNegMatches, negMatches, posMatches, vPosMatches)

    return(score)
    }


    Friday, November 27, 2015 4:38 PM

Answers

  • Hey John,

    Could you clarify how your code works locally in RStudio? When I run your code to test it, the aggregation appears to come from the "unlist" call in your SentimentScore function: words <- unlist(wordList). This flattens the words from every row into a single vector, so the rest of the code counts matches across all rows at once.
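
    To illustrate the effect of unlist on a multi-row input, here is a minimal sketch. It assumes base R's strsplit in place of stringr's str_split (both return one list element per input string), and the two-row text_column and the one-word pos_terms are made-up toy data, not taken from your experiment.

```r
# Toy input: two rows of text (hypothetical data for illustration only)
text_column <- c("good good bad", "bad")

# Splitting keeps one list element per row
word_list <- strsplit(text_column, "\\s+")

# unlist() merges the words of every row into a single character vector
words <- unlist(word_list)

# Hypothetical one-word "positive" list
pos_terms <- c("good")

# Matching against the flattened vector counts hits across ALL rows at once
total_hits <- sum(!is.na(match(words, pos_terms)))

# Counting inside each list element keeps the rows separate
per_row_hits <- vapply(
  word_list,
  function(w) sum(!is.na(match(w, pos_terms))),
  integer(1)
)
```

    With a single input row the two counts coincide, which would be consistent with the one-row case looking correct.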

    If you don't want to change your SentimentScore, a naïve fix would be to change the calling code to be a loop over your text vector:

    outl <- lapply(text_column, function(t) SentimentScore(t, WordScore_List))

    and then flatten the resulting list via some means of row concatenation. One possibility is

    data.frame(do.call("rbind", outl))
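
    Putting the two steps together, a rough end-to-end sketch could look like the following. Note that score_one is a simplified, hypothetical stand-in for your SentimentScore (base R only, same category order in the return value), and word_scores and text_column are invented toy data rather than AFINN-111 or your real input.

```r
# Hypothetical stand-in for SentimentScore: scores ONE text and returns the
# counts in the order very negative, negative, positive, very positive
score_one <- function(text, word_scores) {
  words <- unlist(strsplit(tolower(text), "\\s+"))
  # Count the words whose score falls in the closed interval [lo, hi]
  count_in <- function(lo, hi) {
    terms <- word_scores$Word[word_scores$Score >= lo & word_scores$Score <= hi]
    sum(words %in% terms)
  }
  c(vNeg = count_in(-5, -4), neg = count_in(-3, -1),
    pos = count_in(1, 3), vPos = count_in(4, 5))
}

# Toy word list with the Word and Score columns described in the question
word_scores <- data.frame(
  Word = c("awful", "bad", "good", "superb"),
  Score = c(-4, -2, 3, 5),
  stringsAsFactors = FALSE
)
text_column <- c("good good bad", "superb", "awful bad day")

# One call per row, so each row gets its own score vector
outl <- lapply(text_column, function(t) score_one(t, word_scores))

# Stack the per-row vectors into one data frame (one row per text)
scores <- data.frame(do.call("rbind", outl))

# Combine the text with its scores; this is the frame that would be sent
# to the output port
data.set <- data.frame(text_column, scores, stringsAsFactors = FALSE)
```

    Because the per-row vectors are named, the columns of the resulting data frame carry the category names, which avoids having to index the result by position afterwards.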

    Hope that helps!

    Regards,

    AK

    • Proposed as answer by neerajkh_MSFT Tuesday, December 15, 2015 5:12 PM
    • Marked as answer by neerajkh_MSFT Monday, December 21, 2015 8:03 AM
    Saturday, December 12, 2015 11:18 PM