Predict result is different in R script in Azure ML Studio vs R Studio

    Question

  • We have a saved, trained XGBoost model. To save time on training, we decided to train locally and load the model into Azure ML Studio for prediction and for potential API calls.

    Since Azure ML Studio currently supports MRO 3.4.4, I made sure my RStudio uses the same R version.

    I'm using the same datasets and the same scripts in both Azure ML Studio and RStudio, except for the part of the R script that pulls in the input data.

    I then load the xgb model the same way on both ends, but when I run predict I get different results.

    Does anybody know why?

    I don't think the library versions should matter, because I'm not training anything. I'm just using the default predict() function provided by the R environment, without needing to call out the library explicitly.

    Please advise. 

    Friday, November 2, 2018 6:00 PM

All replies

  • Hi,

    Could you please share the versions of the libraries and RStudio you are using? Also, how large is the difference?

    Regards,

    Yutong

    Friday, November 2, 2018 7:33 PM
    Moderator
  • The versions in R Studio:

    sessionInfo()
    R version 3.4.4 (2018-03-15)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     

    other attached packages:
    [1] xgboost_0.6.4.1  dplyr_0.7.4      RevoUtils_10.0.9

    loaded via a namespace (and not attached):
     [1] Rcpp_0.12.19        lattice_0.20-35     assertthat_0.2.0    R6_2.3.0            grid_3.4.4         
     [6] gtable_0.2.0        magrittr_1.5        pillar_1.0.1        rlang_0.3.0.1       stringi_1.1.6      
    [11] data.table_1.10.4-3 bindrcpp_0.2.2      Matrix_1.2-12       tools_3.4.4         glue_1.2.0         
    [16] yaml_2.1.16         compiler_3.4.4      pkgconfig_2.0.1     bindr_0.1.1         tibble_1.4.1       

    RStudio predicted values for the top 10 rows of the same dataset with the same model:

    1        64.04490
    2        66.34637
    3        62.47011
    4        63.04543
    5        53.50597
    6        70.89996
    7        71.27137
    8        62.28897
    9        51.83804
    10       59.49471

    Azure ML Studio predicted values for the top 10 rows:

    1        64.348785
    2        68.361885
    3        63.168068
    4        66.251572
    5        50.753403
    6        70.750008
    7        66.660683
    8        61.739677
    9        52.319084
    10       60.339882

    Also, just so you know: since Azure ML Studio's R script doesn't come with xgboost or magrittr by default, I uploaded them as a zipped file and loaded them from there.

    The uploaded packages are the following versions:

    xgboost_0.71.2

    magrittr_1.5
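
    The xgboost version here (0.71.2) differs from the one in the RStudio session above (0.6.4.1). One way to compare the two environments side by side is to run the same short check in each and diff the output. This is only a sketch: `pkg_version_or_na` is a hypothetical helper name, and the package list is just a selection of the packages mentioned in this thread.

```r
# Hypothetical helper: return a package's installed version as text,
# or NA if the package is not installed in this environment.
pkg_version_or_na <- function(pkg) {
  tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
}

# Packages mentioned in this thread; adjust the list as needed.
pkgs <- c("xgboost", "magrittr", "Matrix", "data.table")
versions <- vapply(pkgs, pkg_version_or_na, character(1))

# Print the R version and the named vector of package versions.
cat(R.version.string, "\n")
print(versions)
```

    Running this in both RStudio and the Azure ML Studio R script shows at a glance which packages differ between the two.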

    Monday, November 5, 2018 11:17 PM
  • Any news on this one?

    We need help ASAP. 

    Wednesday, November 7, 2018 5:08 PM
  • Any update on this? 
    Friday, November 9, 2018 6:51 PM
  • Hi,

    Sorry for the delayed response. I have forwarded this issue to the product team for further investigation. I will get back to you once I have an update. Thanks!

    Regards,

    Yutong

    Tuesday, November 20, 2018 7:54 AM
    Moderator
  • Thanks.

    Any update on this one?

    Tuesday, November 27, 2018 9:38 PM
  • Not yet; we are still tracking it. I will get back to you as soon as possible.

    Thanks,

    Yutong

    Friday, November 30, 2018 3:54 PM
    Moderator
  • Hi Yutong,

    Any update on this one?

    Friday, December 7, 2018 8:40 PM
  • I recently had the same issue, and here is what caused it for me. Are you using the Execute R Script module? Check the data frame (the test data) that is fed as input to your R script. The order of its columns should be exactly the same as the order of the columns you used when training the model locally in RStudio. If the order is not the same, you will get different results. It seems that the xgb model in R cares about column order as well.
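
    A minimal sketch of that check in base R, assuming the training column names were saved somewhere at training time. The helper `align_columns`, the `train_cols` vector, and the example data frame are all made up for illustration:

```r
# Hypothetical helper: put the columns of `newdata` in the order used at
# training time. Stops if a training column is missing; extra columns
# that were not used in training are dropped.
align_columns <- function(newdata, train_cols) {
  missing_cols <- setdiff(train_cols, names(newdata))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  newdata[, train_cols, drop = FALSE]
}

# Illustrative data only: the scoring input arrives in a different order.
train_cols <- c("age", "income", "score")
scoring_df <- data.frame(
  score  = c(0.1, 0.2),
  age    = c(30, 41),
  income = c(52000, 61000)
)

aligned <- align_columns(scoring_df, train_cols)
print(names(aligned))
```

    The aligned data frame can then be converted with as.matrix() before it is passed to predict(), so the columns sit in the same positions as they did during training.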
    Thursday, December 13, 2018 3:19 PM
    Hi MJ Jung,

    Sorry about the issue you are having and the long wait. If the dataset is not confidential, would you be able to publish it to the gallery in unlisted mode? We could then take a look.

    Thank you, Ilya

    Monday, January 7, 2019 9:24 PM
  • Hi Ilya,

    The database is indeed confidential, so I can't publish it publicly.

    I will go through the example again and see if the issue persists.

    Tuesday, January 8, 2019 4:39 PM