Retraining

  • Question

  • Hi, I've got a problem with retraining my model. I've been following this article:

    https://azure.microsoft.com/en-gb/documentation/articles/machine-learning-retrain-models-programmatically/

    and have performed the following:

    1. Created a model
    2. Created a predictive experiment and published it as a web service
    3. Created a retraining experiment and published it as a web service

    I did not change the web service endpoint names or anything; I left everything at the defaults.

    I selected the retraining web service and then BATCH EXECUTION.

    Using Fiddler I constructed a POST request with the body described on the BATCH EXECUTION page (modified to point at my storage account and with output2results.csv changed to a .ilearner suffix). I was able to submit a job and retrieve the job id. I then started the job and retrieved the job's status (roughly the flow sketched at the end of this post).

    I get this:

    {"StatusCode":"Failed","Results":null,"Details":"Process exited with error code -2\nRecord Starts at UTC 08/26/2015 09:15:46:\n\n[ModuleOutput] Error: Sorry, it seems that you have encountered an internal system error. Please contact amlforum@microsoft.com with the full URL in the browser and the time you experienced the failure. We can locate this error with your help and investigate further. Thank you.\n\nRecord Ends at UTC 08/26/2015 09:15:46.\n\n"}

    The guide also mentions the need to create an updatable endpoint using the portal.

    I navigated to the appropriate experiment/web service and selected ADD ENDPOINT. I get:

    Please try again. If the problem persists, contact support.

    When I view the overview page, the number of endpoints has increased; however, no new endpoint appears in the details page.

    Has something changed from the doc in how retraining is accomplished?

    Wednesday, August 26, 2015 9:43 AM
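
    For reference, the submit/start/poll flow described above can be sketched in Python with the requests library. This is only a sketch, not the documented client: the base URL pattern and api-version follow the BATCH EXECUTION help page linked above, and the workspace id, service id, API key, and storage values are placeholders to be replaced with the values shown for your own retraining service. The payload mirrors the one described in the question, including the input blob without an extension, which the accepted answer below identifies as the cause of the failure.

    import time
    import requests

    # Placeholders -- take the real values from the retraining web service's
    # BATCH EXECUTION help page and from your storage account.
    BASE_URL = ("https://ussouthcentral.services.azureml.net/workspaces/"
                "<workspace-id>/services/<service-id>/jobs")
    API_KEY = "<api-key>"
    HEADERS = {"Authorization": "Bearer " + API_KEY,
               "Content-Type": "application/json"}

    payload = {
        "GlobalParameters": {},
        "Input": {
            "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=<name>;AccountKey=<key>",
            # NOTE: no extension on the input blob -- the accepted answer
            # below explains that this is why the job fails.
            "RelativeLocation": "uploadedresources/part-00000",
        },
        "Outputs": {
            "output2": {
                "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=<name>;AccountKey=<key>",
                "RelativeLocation": "uploadedresources/output2results.ilearner",
            },
        },
    }

    # 1. Submit the job; the service returns the job id.
    job_id = requests.post(BASE_URL + "?api-version=2.0",
                           headers=HEADERS, json=payload).json()

    # 2. Start the job.
    requests.post("{0}/{1}/start?api-version=2.0".format(BASE_URL, job_id),
                  headers=HEADERS)

    # 3. Poll the job status until it reaches a terminal state.
    while True:
        status = requests.get("{0}/{1}?api-version=2.0".format(BASE_URL, job_id),
                              headers=HEADERS).json()
        if status.get("StatusCode") in ("Finished", "Failed", "Cancelled"):
            print(status)
            break
        time.sleep(10)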

Answers

  • Hi David,

    Looking at the sample request you have shared, the issue seems to be with the definition of the input blob. We expect that blob to be specified with a supported extension. In this case, since the content is a trained model, the expected extension is .ilearner, so you should specify that blob as uploadedresources/part-00000.ilearner (a corrected request body is sketched at the end of this answer).

    For datasets, we allow .csv, .arff and .tsv.

    The fact that this does not fail with a BadRequest at job submission is a bug; we are fixing it and the fix will ship with the next service release.

    Please let us know if the problem persists after making this fix in the request.

    Thanks,
    Tudor

    • Proposed as answer by neerajkh_MSFT Sunday, September 13, 2015 4:42 PM
    • Marked as answer by neerajkh_MSFT Tuesday, September 15, 2015 6:06 AM
    Friday, September 11, 2015 6:44 PM
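
    For reference, a corrected request body along the lines described above might look like the following sketch (a Python dict mirroring the JSON, reusing the redacted AccountName/AccountKey placeholders and the prod/retrain output names from the payload David quotes in the replies below). The only change is the .ilearner extension on the input blob.

    # Corrected BES request body: the input blob is a trained model, so it
    # must carry the .ilearner extension. Names and keys are placeholders.
    corrected_body = {
        "GlobalParameters": {},
        "Input": {
            "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=name;AccountKey=key",
            "RelativeLocation": "uploadedresources/part-00000.ilearner",  # was part-00000
        },
        "Outputs": {
            "prod": {
                "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=name;AccountKey=key",
                "RelativeLocation": "uploadedresources/prodresults.csv",
            },
            "retrain": {
                "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=name;AccountKey=key",
                "RelativeLocation": "uploadedresources/retrainresults.ilearner",
            },
        },
    }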

All replies

  • Hi David,

    Could you tell me your workspace Id?

    Wednesday, August 26, 2015 6:30 PM
  • Hi David,

    Does the input file passed to your BES service have a file extension? Could you share your request payload to BES? (Please remove the storage key field from the payload.)

    Thanks,

    Hongye

    Wednesday, August 26, 2015 6:50 PM
  • Hi, thanks for the reply.

    I've retried the same thing in a few workspaces.

    In this workspace I did succeed in retraining, but failed to add the updatable endpoint:

    4f831125daa54c3e9612910e11f32045

    In these workspaces I could not get it to retrain at all, nor add endpoints:

    3de3a4b39acc4b6f807ae3024fd7598d
    9e63d30ec5324e95b25efe13a140b1f6

    I changed workspaces to see whether I could get the endpoints to work if I started from scratch. The second of these was created after trying to create endpoints in the first, after which I could no longer delete the web services; I got the error:

    Service call failed. Error 500 (Internal Service Error.) when requesting /webservicegroups/3bb8929021d0418ca5bb3253ec00d4a8

    Regarding the BES payload: in the first workspace I had renamed the web service endpoints to retrain/prod (these I created manually before deploying the web service), so the request body was:


    "GlobalParameters": {}, 
    "Input": { "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=name;AccountKey=key", "RelativeLocation": "uploadedresources/part-00000" }, 
    "Outputs": { "prod": { "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=AccountName=name;AccountKey=key", "RelativeLocation": "uploadedresources/prodresults.csv" }, 
    "retrain": { "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=name;AccountKey=key", "RelativeLocation": "uploadedresources/retrainresults.ilearner" } } }

    and the headers were as follows (Fiddler worked out the Content-Length and added it automatically, so that was set in the request):

    User-Agent: Fiddler
    Authorization: Bearer xxxx
    Host: ussouthcentral.services.azureml.net
    Content-Type: application/json

    I posted to the URI detailed on the BATCH EXECUTION page.

    In my other tests, I explicitly selected to create a retraining web service. I created the web service inputs/outputs and left the naming of the outputs at the default. I copied the sample payload from the BATCH EXECUTION page and renamed the output from .csv to .ilearner; everything else was as described above.

    Regards
    David

    Thursday, August 27, 2015 8:45 AM
  • Hi Hongye

    Just wondering if you investigated this further?

    Thanks

    David

    Wednesday, September 2, 2015 12:39 PM
  • Pretty sure this hasn't been answered yet :)
    Friday, September 4, 2015 9:16 AM