Errors when registering a Dataset to workspace

  • Question

  • Hi

    I am experimenting with Azure ML services (azureml-sdk==1.0.62).

    What I am doing is creating a pipeline that contains data preprocessing, training, testing, deploying and monitoring. Everything worked fine when I ran it locally.

    Now I have committed the code to a repo, and the pipeline is triggered whenever the code changes. In CI/CD, the same pipeline is executed 4 times simultaneously to test stability. It worked fine in the beginning, but in the second run I got

    Version 7 has been registered as iris_dataset:3 (name:version).

    So I deleted the dataset and repeated the run. I observed a similar outcome, i.e., the first time is okay and the second time I got a similar error

    Version 3 has been registered as iris_dataset:1 (name:version).

    When I checked the first run, all the pipelines had registered to dataset:1. I expected something like dataset:1, dataset:2, ...

    Does anyone know what happened?

    The experiment failed. Finalizing run...
    Logging experiment finalizing status in history service.
    Cleaning up all outstanding Run operations, waiting 300.0 seconds
    2 items cleaning up...
    Cleanup took 0.0015828609466552734 seconds
    Traceback (most recent call last):
      File "data_preparation.py", line 34, in <module>
      File "data_preparation.py", line 22, in register_data
      File "/opt/hostedtoolcache/Python/3.6.9/x64/lib/python3.6/site-packages/azureml/data/_loggerfactory.py", line 77, in wrapper
        return func(*args, **kwargs)
      File "/opt/hostedtoolcache/Python/3.6.9/x64/lib/python3.6/site-packages/azureml/data/_dataset.py", line 277, in register
        raise e  # TODO: log unknown exception
      File "/opt/hostedtoolcache/Python/3.6.9/x64/lib/python3.6/site-packages/azureml/data/_dataset.py", line 256, in register
      File "/opt/hostedtoolcache/Python/3.6.9/x64/lib/python3.6/site-packages/azureml/_restclient/operations/dataset_operations.py", line 1729, in register
        raise HttpOperationError(self._deserialize, response)
    msrest.exceptions.HttpOperationError: Operation returned an invalid status code 'Version 7 has been registered as iris_dataset:3 (name:version).'

    • Edited by Chengyu Liu Friday, November 15, 2019 2:28 PM
    Friday, November 15, 2019 2:20 PM

All replies

  • I continued investigating.

    What I did was run the same pipeline locally, with everything the same: the workspace, dataset name, etc.

    I get the same error. Somehow the registered dataset is in an incorrect state.

    Friday, November 15, 2019 2:28 PM
  • Hello Chengyu,

    Could you please let us know if you are following any documentation link to run the above pipeline experiment to replicate the scenario you are facing?

    On a high level it looks like the previous runs registered the dataset and the newer runs are failing with this step. 
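
    If the failing step only needs to know which registration already exists, the message in the traceback carries that information. Below is a minimal sketch, assuming the message keeps the exact wording shown in the question ("... has been registered as name:version"); the helper name parse_already_registered is hypothetical and is not part of azureml-sdk.

```python
import re

# Matches the wording seen in the traceback above, e.g.
# "Version 7 has been registered as iris_dataset:3 (name:version)."
# Assumption: the service keeps this wording; it is not a documented contract.
_ALREADY_REGISTERED = re.compile(
    r"has been registered as (?P<name>[^:\s]+):(?P<version>\d+)"
)


def parse_already_registered(message):
    """Return (name, version) if the message reports an existing
    registration, otherwise None. Hypothetical helper, not an SDK call."""
    match = _ALREADY_REGISTERED.search(message)
    if match is None:
        return None
    return match.group("name"), int(match.group("version"))
```

    A pipeline step could catch msrest.exceptions.HttpOperationError (the exception type in the traceback), pass str(e) to this helper, and reuse the reported name and version instead of failing. Whether that is appropriate depends on why the duplicate registration happens in the first place.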


    Monday, November 18, 2019 12:44 PM
  • Hi Chengyu,

    Does data_preparation.py in those pipeline runs always write data to fixed locations, so that the dataset being registered may reference the same data location that was already registered in another dataset?

    If so, that would explain an error like "has been registered as...". This behavior of the dataset register API is not intuitive:

    When calling ds.register('iris_dataset', create_new_version=True), if another registered dataset is identical to "ds", the call fails with "has been registered".

    As for the version number sometimes not being bumped, that may be because the dataset being registered is identical to the latest version.
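
    To make the behavior described above concrete, here is a toy in-memory model of it. This is only a sketch of the semantics as I described them, not the service's actual implementation: the ToyRegistry class, the AlreadyRegistered exception, and the idea of comparing datasets by a single "definition" value (for example a data path) are all assumptions made for illustration.

```python
class AlreadyRegistered(Exception):
    """Raised when an identical dataset definition is already registered."""


class ToyRegistry:
    """Toy model only: a new version is created unless an identical
    definition already exists under any name."""

    def __init__(self):
        # name -> list of definitions; list index + 1 is the version number
        self._versions = {}

    def register(self, name, definition):
        # Reject the call if any existing registration is identical.
        for other_name, definitions in self._versions.items():
            for version, existing in enumerate(definitions, start=1):
                if existing == definition:
                    raise AlreadyRegistered(
                        "has been registered as "
                        f"{other_name}:{version} (name:version)."
                    )
        # Otherwise append a new version under this name.
        self._versions.setdefault(name, []).append(definition)
        return len(self._versions[name])
```

    In this model, four simultaneous runs that all write to the same fixed location would produce one successful registration and three failures that point back at the first one, which is consistent with what you are seeing.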

    Could you share the workflow/code in data_preparation.py? Thanks.

    Wednesday, December 18, 2019 6:52 AM