Deployment of an Azure ML model in ACI (container instances) vs. AKS (Kubernetes)

  • Question

  • Hello Colleagues, 

    I am able to deploy a fairly standard trained ML model to ACI with the following standard config:

    from azureml.core.conda_dependencies import CondaDependencies

    myenv = CondaDependencies()
    myenv.add_conda_package("scikit-learn")
    myenv.add_pip_package("azureml-defaults")
    myenv.add_pip_package("fasttext")

    with open("myenv.yml", "w") as f:
        f.write(myenv.serialize_to_string())

    %%time
    from azureml.core.webservice import Webservice
    from azureml.core.model import InferenceConfig
    from azureml.core.environment import Environment
    from azureml.core.model import Model

    from azureml.core.webservice import AciWebservice

    aciconfig = AciWebservice.deploy_configuration(auth_enabled=True)

    myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
    inference_config = InferenceConfig(entry_script='score.py', environment=myenv, source_directory=".")

    service = Model.deploy(workspace=ws,
                           name='myserv',
                           models=[model1],
                           inference_config=inference_config,
                           deployment_config=aciconfig)
    service.wait_for_deployment(show_output=True)

    It starts successfully and everything is OK.
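    Since the ACI service is deployed with auth_enabled=True, requests to its scoring URI have to carry a key. A minimal standard-library sketch of building such a request; the URI and key below are placeholders only, and on a real service they would come from service.scoring_uri and service.get_keys():

```python
import json
import urllib.request

def build_scoring_request(uri, key, records):
    """Build an authenticated request for an auth-enabled web service.

    On a real deployment, `uri` would be service.scoring_uri and `key`
    one of the values returned by service.get_keys().
    """
    body = json.dumps({"data": records}).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
    )

# Placeholder URI and key, for illustration only:
req = build_scoring_request("http://example.invalid/score", "dummy-key", ["some text"])
print(req.get_header("Authorization"))  # → Bearer dummy-key
```

    Passing the resulting request to urllib.request.urlopen() would send it; without the Authorization header, an auth-enabled service returns 401.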

    However, when I try to deploy the same ML model to AKS:

    %%time
    from azureml.core.webservice import Webservice
    from azureml.core.model import InferenceConfig
    from azureml.core.environment import Environment
    from azureml.core.webservice import AksWebservice  # new
    from azureml.core.model import Model
    from azureml.core.compute import AksCompute, ComputeTarget

    myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
    inference_config = InferenceConfig(entry_script='score.py', environment=myenv, source_directory=".")

    aks_target = AksCompute(ws, "endpointfigl")
    deployment_config = AksWebservice.deploy_configuration(cpu_cores=2, memory_gb=2)  # 2 nodes (Standard D3 v2) running
    service = Model.deploy(ws, "myservice", [model1], inference_config, deployment_config, aks_target)
    service.wait_for_deployment(show_output=True)
    print(service.state)
    print(service.get_logs())

    it fails with a 504 (gateway timeout) error, with the following output from print(service.get_logs()):

    Invoking user's init function

    2020-10-01 00:29:22,677 | azureml.core.run | DEBUG | Could not load run context RunEnvironmentException:

           Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run.

           InnerException None

           ErrorResponse

    {

        "error": {

            "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run."

        }

    }, switching offline: False

    2020-10-01 00:29:22,677 | azureml.core.run | DEBUG | Could not load the run context and allow_offline set to False

    2020-10-01 00:29:22,677 | azureml.core.model | DEBUG | version is None. Latest version is 22

    2020-10-01 00:29:22,677 | azureml.core.model | DEBUG | Found model path at azureml-models/my_model/22/finalized_model.sav

    Users's init has completed successfully

    /azureml-envs/azureml_0429a163f5fb308289e31ffe48160e35/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.

      warnings.warn(msg, category=FutureWarning)

    /azureml-envs/azureml_0429a163f5fb308289e31ffe48160e35/lib/python3.6/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator CountVectorizer from version 0.20.3 when using version 0.22.2.post1. This might lead to breaking code or invalid results. Use at your own risk.

      UserWarning)

    /azureml-envs/azureml_0429a163f5fb308289e31ffe48160e35/lib/python3.6/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator TfidfTransformer from version 0.20.3 when using version 0.22.2.post1. This might lead to breaking code or invalid results. Use at your own risk.

      UserWarning)

    /azureml-envs/azureml_0429a163f5fb308289e31ffe48160e35/lib/python3.6/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator ComplementNB from version 0.20.3 when using version 0.22.2.post1. This might lead to breaking code or invalid results. Use at your own risk.

      UserWarning)

    /azureml-envs/azureml_0429a163f5fb308289e31ffe48160e35/lib/python3.6/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator Pipeline from version 0.20.3 when using version 0.22.2.post1. This might lead to breaking code or invalid results. Use at your own risk.

      UserWarning)

    Scoring timeout setting is not found. Use default timeout: 3600000 ms
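    The UserWarning lines above indicate the pickled estimators were trained with scikit-learn 0.20.3 but are being unpickled under 0.22.2; pinning scikit-learn to 0.20.3 in myenv.yml (or re-serializing the model under the newer version) would remove that risk. As a sketch, a hypothetical guard one could run at startup to catch such a mismatch early:

```python
# Hypothetical startup check: fail fast if the serving environment's
# scikit-learn differs from the version the model was pickled with.
# "0.20.3" is taken from the UserWarnings in the log above.
TRAINED_WITH = "0.20.3"

def version_mismatch(current: str, trained: str = TRAINED_WITH) -> bool:
    # Compare major.minor only; patch-level differences are usually benign.
    return current.split(".")[:2] != trained.split(".")[:2]

# In init() one could then do:
#   import sklearn
#   if version_mismatch(sklearn.__version__):
#       raise RuntimeError("model/runtime scikit-learn version mismatch")
print(version_mismatch("0.22.2"))  # → True (risky unpickle)
```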

    Please help: what am I doing wrong? I already import joblib directly in my init() (import joblib), as suggested by the first deprecation warning.
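    For reference, the init()/run() shape of an entry script like the one described can be sketched as below. A stub stands in for the real joblib-loaded model so the structure is runnable on its own; the model name my_model is taken from the log above:

```python
# Sketch of a score.py entry script, assuming the model was saved with
# joblib (finalized_model.sav, per the log). A stub replaces the real
# model here; the real init() is shown in comments.
import json

model = None

class _StubModel:
    """Placeholder for the unpickled sklearn Pipeline."""
    def predict(self, texts):
        return ["label"] * len(texts)

def init():
    # Real version (hypothetical paths, v1 azureml SDK):
    #   import joblib
    #   from azureml.core.model import Model
    #   global model
    #   model = joblib.load(Model.get_model_path("my_model"))
    global model
    model = _StubModel()

def run(raw_data):
    try:
        data = json.loads(raw_data)["data"]
        return json.dumps({"predictions": model.predict(data)})
    except Exception as exc:
        # Returning the error as JSON avoids opaque 5xx responses.
        return json.dumps({"error": str(exc)})

init()
print(run(json.dumps({"data": ["some text"]})))  # → {"predictions": ["label"]}
```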

    Thursday, October 1, 2020 1:00 AM

All replies

  • I changed the "myenv.yml" file a little:

    from azureml.core.conda_dependencies import CondaDependencies

    myenv = CondaDependencies()
    myenv.add_conda_package("scikit-learn")
    myenv.add_pip_package("azureml-defaults>=1.0.45")
    myenv.add_pip_package("joblib")

    with open("myenv.yml", "w") as f:
        f.write(myenv.serialize_to_string())

    But I am still getting the following error when deploying to AKS:

    /bin/bash: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libtinfo.so.5: no version information available (required by /bin/bash)
    /bin/bash: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libtinfo.so.5: no version information available (required by /bin/bash)
    /bin/bash: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libtinfo.so.5: no version information available (required by /bin/bash)
    2020-10-03T14:42:40,050107619+00:00 - rsyslog/run
    /bin/bash: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libtinfo.so.5: no version information available (required by /bin/bash)
    bash: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libtinfo.so.5: no version information available (required by bash)
    2020-10-03T14:42:40,051966626+00:00 - iot-server/run
    /usr/sbin/nginx: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
    /usr/sbin/nginx: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
    /usr/sbin/nginx: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
    /usr/sbin/nginx: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
    /usr/sbin/nginx: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
    2020-10-03T14:42:40,053661433+00:00 - gunicorn/run
    2020-10-03T14:42:40,053839634+00:00 - nginx/run
    EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
    /bin/bash: /azureml-envs/azureml_70d609b171aa9165982084311f9a2a21/lib/libtinfo.so.5: no version information available (required by /bin/bash)
    2020-10-03T14:42:40,127924025+00:00 - iot-server/finish 1 0
    2020-10-03T14:42:40,129168630+00:00 - Exit code 1 is normal. Not restarting iot-server.
    Starting gunicorn 19.9.0
    Listening at: http://127.0.0.1:31311 (10)
    Using worker: sync
    worker timeout is set to 300
    Booting worker with pid: 39

    Help, please


    • Edited by Misha123457 Saturday, October 3, 2020 2:49 PM correction
    Saturday, October 3, 2020 2:48 PM
  • If I deploy this model locally, it starts successfully. I guess the problem is with resource allocation...

    from azureml.core.environment import Environment
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import LocalWebservice


    # Create inference configuration based on the environment definition and the entry script
    myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
    inference_config = InferenceConfig(entry_script="score.py", environment=myenv, source_directory=".")
    # Create a local deployment, using port 8890 for the web service endpoint
    deployment_config = LocalWebservice.deploy_configuration(port=8890)
    # Deploy the service
    service = Model.deploy(
        ws, "mymodel", [model3], inference_config, deployment_config)
    # Wait for the deployment to complete
    service.wait_for_deployment(True)
    # Display the port that the web service is available on
    print(service.port)

    this code works fine...
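    Since the same image and entry script start cleanly both locally and on ACI, a 504 from AKS often points at request timeouts or resource limits rather than the code. In the v1 SDK, AksWebservice.deploy_configuration exposes knobs for this; the parameter names below are real, but the values are only illustrative guesses, not a verified fix:

```python
# Illustrative AKS deployment settings (values are guesses, not a fix).
# scoring_timeout_ms and num_replicas are parameters of
# AksWebservice.deploy_configuration in the v1 azureml SDK.
aks_kwargs = dict(
    cpu_cores=2,
    memory_gb=4,                # fasttext + sklearn pipelines can be RAM-hungry
    scoring_timeout_ms=300000,  # raise the per-request scoring timeout
    num_replicas=2,             # pin the replica count instead of autoscaling
)
# deployment_config = AksWebservice.deploy_configuration(**aks_kwargs)
print(sorted(aks_kwargs))  # → ['cpu_cores', 'memory_gb', 'num_replicas', 'scoring_timeout_ms']
```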

    Monday, October 5, 2020 12:08 AM