Retail Forecasting example in Azure ML/AI gallery seems to have an error in 6A RRS feed

  • Question

  • I was looking at the Retail Forecasting example in the Azure ML/AI gallery. The two deployment scenarios are as follows:

    https://gallery.azure.ai/Experiment/370c80490e774a6cb26edba69c583c9b

    https://gallery.azure.ai/Experiment/bef6f84ac80d4625891f9f0ae768b356

    6A is the time series ARIMA model and 6B is the regression model. The input data is obviously the same regardless of which model you choose.

    However, if you look at the web services, 6A and 6B expect different input fields. 6A expects only ID1 and ID2, whereas 6B expects ID1, ID2, Time, and Value.

    Attached are screenshots that illustrate my point. Something is clearly wrong in 6A, because with only the IDs there is no data for prediction; you need all four fields.
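    For concreteness, the two request shapes look roughly like this in R (illustrative values, using the ColumnNames/Values layout from the sample code Azure ML Studio generates; the exact field names come from the experiment's schema):

    # 6A as currently published: the web service asks only for the IDs
    input_6a <- list("ColumnNames" = list("ID1", "ID2"),
                     "Values"      = list(list("2", "1")))

    # 6B: the web service asks for the IDs plus the time series fields
    input_6b <- list("ColumnNames" = list("ID1", "ID2", "Time", "Value"),
                     "Values"      = list(list("2", "1", "2019-09-01", "125.0")))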

    Please advise.

    Thanks a million,

    Fred


    FL

    Friday, September 20, 2019 10:50 PM

All replies

  • Hi,

     

    Thank you for raising your question. Please change the web service input connection to "Dataset2" as shown below. That should resolve the issue. Let me know if you have further questions. Thanks.

     

     

    Regards,

    Azure CXP Community.

    If a post helps to resolve your issue, please click "Mark as Answer" and/or "Vote as helpful". By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.

    Tuesday, September 24, 2019 4:55 PM
    Moderator
  • Hello, thank you for your response. However, the web service input is already connected to Dataset2. Please take a look at my first screenshot above. Did I miss something?

    Thanks a million,

    Fred


    FL

    Tuesday, September 24, 2019 5:49 PM
  • Hi,

     

    I am referring to the predictive experiment in 6B. The screenshot you shared shows the web service input connected to "Dataset1". If you look at the instructions, the web service input for both the time series and regression models points to "Dataset2". Therefore, ensure that in both models the web service input is connected to "Dataset2". Please let me know if you need further clarification. Thanks.

     

     

    Regards,

    Azure CXP Community.

    If a post helps to resolve your issue, please click "Mark as Answer" and/or "Vote as helpful". By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.

    Tuesday, September 24, 2019 10:42 PM
    Moderator
  • Hi, 

    Thank you for your response. However, connecting the web service input to Dataset2 does not seem to work. Please take a look at my screenshot for predictive experiment 6B of 6, which was generated by Azure ML Studio. The web service input is connected to Dataset1 (not Dataset2). In fact, if you look at the code in the Execute R Script module beneath it, titled "Extract a time series by IDs", it clearly expects the data for prediction to come from Dataset1, while Dataset2 only provides the IDs for filtering the time series (ID1=2 and ID2=1).
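    For context, a rough sketch of what that extraction step amounts to (not the gallery's exact code; the column names are illustrative, and it assumes Dataset1 carries the full time series while Dataset2 carries the IDs to keep):

    # Sketch of an "extract a time series by IDs" Execute R Script module
    dataset1 <- maml.mapInputPort(1)  # full history: ID1, ID2, time, value
    dataset2 <- maml.mapInputPort(2)  # IDs of the series to forecast: ID1, ID2

    # Keep only the rows of the time series that match the requested ID pair(s)
    data.set <- merge(dataset1, dataset2, by = c("ID1", "ID2"))

    # Send the filtered series to the output Dataset port
    maml.mapOutputPort("data.set")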

    After I changed the web service input connection in 6A to Dataset1, it actually worked as expected. If I leave it connected to Dataset2, I get an error.

    Please let me know what you think.

    Cheers,

    Fred


    FL

    Thursday, October 3, 2019 7:23 PM
  • Hi,

     

    Thank you for following up. For deploying the web service, the document shows the web service input connecting to Dataset2 (ID1 and ID2 as the IDs to forecast). Please open and run new experiments (for 6A and 6B) from the gallery. The web service input for both experiments should already be connected to Dataset2. Please use the "Test preview" option with "Enable sample data" to test the web service. Please let me know if you're still experiencing issues after following these steps. Thanks.

     

    Regards,

    GiftA-MSFT.

     

    If a post helps to resolve your issue, please click “Mark as Answer” and/or “Vote as helpful”. By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.



    Friday, October 4, 2019 5:35 PM
    Moderator
  • Hi, thank you for your reply. Unfortunately, using only ID1 and ID2 as input does not give the model any data for prediction. The only input that makes sense is ID1, ID2, time, and value. So for training data, time goes from t1 to t2; then in deployment, you get a forecast of "value" for t2+1 to t3 for the same ID1 and ID2.
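    For illustration, a toy sketch of that train-then-forecast framing (synthetic weekly data and the forecast package; this is not the gallery experiment's code):

    library(forecast)

    # Train on values observed over t1..t2, then forecast the next horizon
    # t2+1..t3 for a single (ID1, ID2) series. The data here is made up.
    set.seed(1)
    train <- ts(100 + cumsum(rnorm(104)), frequency = 52)  # ~2 years of weekly values
    fit <- auto.arima(train)
    fc  <- forecast(fit, h = 26)                           # forecast the following 26 weeks
    print(fc$mean)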

    As I said in my previous message, if you look at the code in the Execute R Script module beneath it, titled "Extract a time series by IDs", it clearly expects the data for prediction to come from Dataset1, while Dataset2 only provides the IDs for filtering the time series (ID1=2 and ID2=1).

    Sincerely,

    Fred

    FL

    Sunday, October 6, 2019 6:31 AM
  • Hi,

     

    I think there’s been some misunderstanding about how this model works. I encourage you to review the introduction and the prior steps (particularly the Parallelization Consideration, Step 1, and Step 4 sections, as shown below), because each step feeds into the next. Several design considerations went into creating this experiment; hence, only the IDs are used as the forecasting input, to keep the model efficient.

     

    Furthermore, in Step 1 of 6 (data preprocessing), you can see that we load the time series data as input (step 1.1) and specify the modeling parameters (step 1.2). Then in step 1.3 we select time series based on pre-defined business rules, and so on. Also, in steps 4.1 and 4.2, we remove the time column because it is not a feature in the regression model.

     

    Please go through each step to fully understand how the model works. This experiment is meant to be a template; however, you can customize it to suit your requirements. Please try to run new experiments using the examples for each step in the gallery, and follow the steps outlined in my previous comment for testing the web service. Please let me know if you experience any issues (Note: I was able to run the experiment(s) successfully). Hope this helps, and as always please feel free to reach out if you have further questions. Thanks.

     

     

    [Screenshot: Introduction]

    [Screenshot: Step 1]

    [Screenshot: Step 4]

    Regards,

    GiftA-MSFT.

     

    If a post helps to resolve your issue, please click “Mark as Answer” and/or “Vote as helpful”. By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.






    Monday, October 7, 2019 3:14 PM
    Moderator
  • Hello, thank you for your prompt reply and for pointing out the underlined text. I have actually run every step of the experiment and examined the code carefully, to the point that I think I understand how it works, but of course I still might have missed something. My problem is that during deployment, using ID1 and ID2 as input does not seem to work, because the input data has to be a time series. It looks like ID1 and ID2 are just filter parameters that give the model a specific time series for prediction during deployment. So where does the time series data come from if only ID1 and ID2 are the input data? The following is the result of running the request/response code when the web service input is connected to Dataset1 of the Execute R Script module. The input only asks for ID1 and ID2, and I got an error as expected.

    ----------------------------------------------

    > library("RCurl")
    > library("rjson")

    > # Accept SSL certificates issued by public Certificate Authorities
    > options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

    > h = basicTextGatherer()
    > hdr = basicHeaderGatherer()


    > req = list(
    +   
    +   Inputs = list(
    +     
    +     
    +     "input1" = list(
    +       "ColumnNames" = list("ID1", "ID2"),
    +       "Values" = list( list( "1", "2" ),  list( "3", "1" )  )
    +     )                ),
    +   GlobalParameters = setNames(fromJSON('{}'), character(0))
    + )

    > body = enc2utf8(toJSON(req))
    > api_key = "Yv72E0hLkBbS31nUxjucDOD5SGbVK5i2e7W0Mz7mhIYqVP4DObf0fHw3Dx/l3NoelgZQq2WgyiiZU+MPixgRmA==" # Replace this with the API key for the web service
    > authz_hdr = paste('Bearer', api_key, sep=' ')

    > h$reset()
    > curlPerform(url = "https://ussouthcentral.services.azureml.net/workspaces/ae24c20fadf145f58c9547b6ee5591a9/services/c24b64315747475086b9327c2fb7f0c1/execute?api-version=2.0&details=true",
    +             httpheader=c('Content-Type' = "application/json", 'Authorization' = authz_hdr),
    +             postfields=body,
    +             writefunction = h$update,
    +             headerfunction = hdr$update,
    +             verbose = TRUE
    + )
    *   Trying 13.65.94.53...
    * Connected to ussouthcentral.services.azureml.net (13.65.94.53) port 443 (#0)
    * successfully set certificate verify locations:
    *   CAfile: C:/Users/FL/Documents/R/win-library/3.4/RCurl/CurlSSL/cacert.pem
      CApath: none
    * SSL connection using TLSv1.0 / ECDHE-RSA-AES256-SHA
    * Server certificate:
    * subject: CN=ussouthcentral.services.azureml.net
    * start date: 2018-06-20 19:10:54 GMT
    * expire date: 2020-06-20 19:10:54 GMT
    * subjectAltName: ussouthcentral.services.azureml.net matched
    * issuer: C=US; ST=Washington; L=Redmond; O=Microsoft Corporation; OU=Microsoft IT; CN=Microsoft IT TLS CA 1
    * SSL certificate verify ok.
    > POST /workspaces/ae24c20fadf145f58c9547b6ee5591a9/services/c24b64315747475086b9327c2fb7f0c1/execute?api-version=2.0&details=true HTTP/1.1
    Host: ussouthcentral.services.azureml.net
    Accept: */*
    Content-Type: application/json
    Authorization: Bearer Yv72E0hLkBbS31nUxjucDOD5SGbVK5i2e7W0Mz7mhIYqVP4DObf0fHw3Dx/l3NoelgZQq2WgyiiZU+MPixgRmA==
    Content-Length: 104

    * upload completely sent off: 104 out of 104 bytes
    < HTTP/1.1 400 Bad Request
    < Content-Length: 251
    < Content-Type: application/json; charset=utf-8
    < ETag: "1d416774886c49c8b89905ae307c185f"
    < Server: Microsoft-HTTPAPI/2.0
    < x-ms-request-id: 1bb4a25f-73d3-4071-95a3-02e65bcd36e7
    < Date: Mon, 07 Oct 2019 19:53:17 GMT

    * Connection #0 to host ussouthcentral.services.azureml.net left intact
    OK 
     0 

    > headers = hdr$value()
    > httpStatus = headers["status"]
    > if (httpStatus >= 400)
    + {
    +   print(paste("The request failed with status code:", httpStatus, sep=" "))
    +   
    +   # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    +   print(headers)
    + }
    [1] "The request failed with status code: 400"
                            Content-Length                           Content-Type 
                                     "251"      "application/json; charset=utf-8" 
                                      ETag                                 Server 
    "\"1d416774886c49c8b89905ae307c185f\""                "Microsoft-HTTPAPI/2.0" 
                           x-ms-request-id                                   Date 
    "1bb4a25f-73d3-4071-95a3-02e65bcd36e7"        "Mon, 07 Oct 2019 19:53:17 GMT" 
                                    status                          statusMessage 
                                     "400"                          "Bad Request" 

    > print("Result:")
    [1] "Result:"
    > result = h$value()
    > print(fromJSON(result))
    $error
    $error$code
    [1] "ModuleExecutionError"

    $error$message
    [1] "Module execution encountered an error."

    $error$details
    $error$details[[1]]
    $error$details[[1]]$code
    [1] "21"

    $error$details[[1]]$target
    [1] "Train Model"

    $error$details[[1]]$message
    [1] "Error 0021: Number of rows in input dataset \"Dataset\" is less than allowed minimum of 1 row(s)."


    FL

    Monday, October 7, 2019 8:10 PM
  • Hi,

     

     

    The time series data comes from the input dataset (which includes the IDs) and is fed into the next step, where we extract the time series by IDs, with the expectation that your data would be updated frequently to enable real-time updates in your solution. Furthermore, I think the issue you're experiencing is because you created a predictive experiment, which automatically sets the input source to Dataset1. The tutorial doesn't illustrate how to set up the predictive experiment for this particular use case; hence, you would need to define where the web service will accept input and where it will generate output. For the purposes of this experiment, I suggest that you connect the web service input to Dataset2 when deploying the model and use the output schema (presented in the tutorial) to define the output that will be generated. I noticed that deploying from the training experiment (i.e., without creating a predictive experiment) produces the expected results for this use case; however, you may need to modify the experiment when deploying it as a predictive experiment. Please feel free to review the following resources on setting up predictive experiments (document1, document2, document3). Hope this helps. Thanks.

     

     

    Regards,

    GiftA-MSFT.

     

    If a post helps to resolve your issue, please click “Mark as Answer” and/or “Vote as helpful”. By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.


    Tuesday, October 8, 2019 8:54 PM
    Moderator
  • Hi, thank you for your reply. In step 6A of 6, the web service input is connected to Dataset2. The problem is that if you try to run the request/response deployment code, it expects only ID1 and ID2 as input data, but in a deployment scenario you also need the time series data. So my question is: how/where do we provide the time series data as input? As you can see in the code of the Execute R Script module (below) that connects to Dataset2, ID1 and ID2 are hard-coded. An additional question: how do we make them dynamic based on some input? (See the sketch after the code.)

    Thanks a million,

    Fred

    ## ------- User-Defined Parameters ------ ##
    ID1 <- 2
    ID2 <- 1
    ## ----------------------------------------- ##

    IDinput<- data.frame(ID1 = ID1, ID2 = ID2, stringsAsFactors = FALSE)

    # Select data.frame to be sent to the output Dataset port
    maml.mapOutputPort("IDinput");
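    For reference, a minimal sketch of one way to make the IDs dynamic: read them from an input port rather than hard-coding them (hypothetical wiring; in the gallery template, the web service input connected to Dataset2 supplies these values instead):

    # Hypothetical variant: take ID1/ID2 from whatever is wired to input port 1
    # (for example, the web service input) instead of hard-coding them.
    idinput <- maml.mapInputPort(1)   # expected columns: ID1, ID2

    IDinput <- data.frame(ID1 = idinput$ID1,
                          ID2 = idinput$ID2,
                          stringsAsFactors = FALSE)

    # Select data.frame to be sent to the output Dataset port
    maml.mapOutputPort("IDinput");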

    Thanks


    FL

    Wednesday, October 9, 2019 1:07 AM
  • Hi,

     

    As I mentioned earlier, in this scenario the time series data is also provided as input and fed into the next step, where we extract the time series by IDs. How you make it dynamic really depends on your use case and how you plan to set up your data pipeline. However, using online data sources would enable real-time updates, thereby providing new data for prediction, and using cloud storage would ensure that intermediate datasets shared between experiments are always updated with the newest version. Please read the data pipeline section of the document for more details. Hope this helps. Thanks.

     

     

     

    Regards,

    GiftA-MSFT.

     

    If a post helps to resolve your issue, please click “Mark as Answer” and/or “Vote as helpful”. By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.


    Wednesday, October 9, 2019 2:22 PM
    Moderator
  • Hi, thank you again for your fast response and your patience. I think what you are saying is that the time series data as well as the IDs are provided as input. However, based on the 6A of 6 experiment as you showed here, the following is the code generated for deploying the web service in R. My question is: it only takes ID1 and ID2 as input, so how do we feed the time series data as input? Please show me the correct code. Thanks a million.

    -----------------------------------------

    library("RCurl")
    library("rjson")
    
    # Accept SSL certificates issued by public Certificate Authorities
    options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
    
    h = basicTextGatherer()
    hdr = basicHeaderGatherer()
    
    
    req = list(
    
            Inputs = list(
    
     
                "input1" = list(
                    "ColumnNames" = list("ID1", "ID2"),
                    "Values" = list( list( "0", "0" ),  list( "0", "0" )  )
                )                ),
            GlobalParameters = setNames(fromJSON('{}'), character(0))
    )
    
    body = enc2utf8(toJSON(req))
    api_key = "abc123" # Replace this with the API key for the web service
    authz_hdr = paste('Bearer', api_key, sep=' ')
    
    h$reset()
    curlPerform(url = "https://ussouthcentral.services.azureml.net/workspaces/22e33c1c14754eb195bbc15f032c0949/services/2ee0111f8d89465a910e8a83573daad8/execute?api-version=2.0&details=true",
                httpheader=c('Content-Type' = "application/json", 'Authorization' = authz_hdr),
                postfields=body,
                writefunction = h$update,
                headerfunction = hdr$update,
                verbose = TRUE
                )
    
    headers = hdr$value()
    httpStatus = headers["status"]
    if (httpStatus >= 400)
    {
        print(paste("The request failed with status code:", httpStatus, sep=" "))
    
        # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
        print(headers)
    }
    
    print("Result:")
    result = h$value()
    print(fromJSON(result))
     
    


    FL

    Wednesday, October 9, 2019 7:09 PM
  • Hi,

     

    Based on this tutorial, you feed the time series data using the "Import Data" module, as shown in my previous comment. If you change the web service input source to Dataset1, you'll notice that the sample code also changes. The sample code is there for you to use when connecting to the machine learning web service from any programming language that supports HTTP requests and responses. Hope this helps. Thanks.

     

    Regards,

    GiftA-MSFT.

     

    If a post helps to resolve your issue, please click “Mark as Answer” and/or “Vote as helpful”. By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.

    Wednesday, October 9, 2019 10:02 PM
    Moderator
  • Hi, thanks again for your prompt reply. So in order to provide time series data for deployment, you agree that the web service input needs to connect to Dataset1 of the Execute R Script module, right? If that is the case, I think that answers my original question. In fact, with the web service input connected to Dataset2 of the Execute R Script module, you cannot supply time series data as input for deployment.

    Kind Regards,

    Fred


    FL

    Thursday, October 10, 2019 1:17 AM
  • Hi,

     

    I was only saying that to explain the purpose of the sample code. You can connect to Dataset1 based on your requirements and data pipeline, but for this tutorial, we are expected to connect the web service input to Dataset2. Hope this helps. Thanks.

     

    Regards,

    GiftA-MSFT.

     

    If a post helps to resolve your issue, please click “Mark as Answer” and/or “Vote as helpful”. By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.

    Thursday, October 10, 2019 5:31 PM
    Moderator
  • Hi, I understand what you are saying. However, if you connect the web service input to Dataset2, it expects ID1 and ID2 as the input data, and that actually does not work, as you can see in the result of running the deployment code in my previous message.

    Regards,

    Fred


    FL

    Thursday, October 10, 2019 8:18 PM
  • Hi,

     

    Are you getting errors for 6a and/or 6b when web service input is connected to Dataset2? Here are my results from running and deploying the experiments. Thanks.

     

    [Screenshot: 6a Results]

    [Screenshot: 6b Results]

    Regards,

    GiftA-MSFT.

     

    If a post helps to resolve your issue, please click “Mark as Answer” and/or “Vote as helpful”. By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.



    Friday, October 11, 2019 2:24 PM
    Moderator
  • Hi, thank you for your reply. Yes, I got the error when the web service input was connected to Dataset2. I don't know where you were running the web service, but please try to run the code generated by the Request/Response tab (please see the screenshot) as shown below, inserting your API key and the correct URL. Thanks a million.

    ----------------------------

    
    library("RCurl")
    library("rjson")
    
    # Accept SSL certificates issued by public Certificate Authorities
    options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
    
    h = basicTextGatherer()
    hdr = basicHeaderGatherer()
    
    
    req = list(
    
            Inputs = list(
    
     
                "input1" = list(
                    "ColumnNames" = list("ID1", "ID2"),
                    "Values" = list( list( "0", "0" ),  list( "0", "0" )  )
                )                ),
            GlobalParameters = setNames(fromJSON('{}'), character(0))
    )
    
    body = enc2utf8(toJSON(req))
    api_key = "abc123" # Replace this with the API key for the web service
    authz_hdr = paste('Bearer', api_key, sep=' ')
    
    h$reset()
    curlPerform(url = "https://ussouthcentral.services.azureml.net/workspaces/22e33c1c14754eb195bbc15f032c0949/services/2ee0111f8d89465a910e8a83573daad8/execute?api-version=2.0&details=true",
                httpheader=c('Content-Type' = "application/json", 'Authorization' = authz_hdr),
                postfields=body,
                writefunction = h$update,
                headerfunction = hdr$update,
                verbose = TRUE
                )
    
    headers = hdr$value()
    httpStatus = headers["status"]
    if (httpStatus >= 400)
    {
        print(paste("The request failed with status code:", httpStatus, sep=" "))
    
        # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
        print(headers)
    }
    
    print("Result:")
    result = h$value()
    print(fromJSON(result))
     
    


    FL

    Friday, October 11, 2019 9:06 PM
  • Hi,

     

    The error you're receiving is the result of entering incorrect input and an invalid URL and/or API key. You can test the web service using the “Test preview” link next to the "Test" button with sample data enabled. Alternatively, to consume the web service using R, please follow these steps:

     

    1. Go to https://services.azureml.net/classicWebservices/
    2. Select the web service you are working on
    3. Select the endpoint (default endpoint)
    4. Click the “Consume” tab. Here you will find your API key as well as the sample R code (with the correct input and URL)
    5. Update the code with your API key

     

    # This is what your input should look like for this tutorial
    Inputs = list("input1" = list(list('ID1' = "2", 'ID2' = "1")))
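    For illustration, a minimal sketch of building that request body in R (the values are the tutorial's sample IDs; the API key and URL still come from the Consume tab):

    library("rjson")

    # Build the corrected request body: input1 as a list of named records
    # (ID1, ID2), matching the Consume-tab sample shown above.
    req <- list(
      Inputs = list(
        "input1" = list(
          list('ID1' = "2", 'ID2' = "1")
        )
      ),
      GlobalParameters = setNames(fromJSON('{}'), character(0))
    )

    body <- enc2utf8(toJSON(req))
    cat(body)   # POST this body with the API key and URL from the Consume tab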

     

    Note: I was able to test the web service successfully in my R environment. Hopefully this guide helps to resolve the issue you're experiencing. Please review the following documentation on How to Test Your Classic Web Service and How to consume an Azure Machine Learning Studio Web Service as you continue to explore the Azure ML service. Thanks.

     

     

    Regards,

    GiftA-MSFT.

     

    If a post helps to resolve your issue, please click “Mark as Answer” and/or “Vote as helpful”. By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.


    Saturday, October 12, 2019 12:03 AM
    Moderator
  • Hi, I think I see what you are saying. In this deployment of the web service, the time series data is actually still coming from Azure Blob Storage automatically, i.e., from Dataset1. The web service input connected to Dataset2 provides the values for ID1 and ID2. In fact, the values from the web service input override the output of the Execute R Script module that also feeds into Dataset2. This is the only explanation that makes sense.

    So, if I have my own time series data and I want to deploy the web service as is, I have to change the data source for Dataset1; e.g., if I want to read data from Azure SQL, I have to change the "Import Data" module.

    I read some documentation in the past saying that Dataset1 only provides the input data schema. It looks to me like that's not the case; here it actually reads the input data for deployment. Maybe there are other deployment scenarios where the data for deployment is not provided through Dataset1? If so, how do you control the input data requirements, i.e., when should you actually expect data from Dataset1 versus only its schema? What about Dataset2? Do you always require data from Dataset2?

    Thanks,

    Fred


    FL


    • Edited by FL_QB Tuesday, October 15, 2019 11:12 PM
    Tuesday, October 15, 2019 11:10 PM