Custom Docker image build failed, conda not found

  • Question

  • I'm trying to run an experiment on my compute target using GPUs. I submit the estimator below and the run fails with a "Run failed. Image build failed" error. The error log is included below, and "conda not found" looks like the issue.

    from azureml.train.estimator import Estimator

    script_params = {
        '--data-folder': ds.as_mount()
    }

    est = Estimator(source_directory='.',
                    script_params=script_params,
                    # (remaining arguments omitted in the original post)
                    )


    error log:

    2019/10/31 20:16:35 Downloading source code...
    2019/10/31 20:16:37 Finished downloading source code
    2019/10/31 20:16:40 Creating Docker network: acb_default_network, driver: 'bridge'
    2019/10/31 20:16:42 Successfully set up Docker network: acb_default_network
    2019/10/31 20:16:42 Setting up Docker configuration...
    2019/10/31 20:16:42 Successfully set up Docker configuration
    2019/10/31 20:16:42 Logging in to registry: foodmlwsfaster2491836064.azurecr.io
    2019/10/31 20:16:44 Successfully logged into foodmlwsfaster2491836064.azurecr.io
    2019/10/31 20:16:44 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Network: 'acb_default_network'
    2019/10/31 20:16:44 Scanning for dependencies...
    2019/10/31 20:16:44 Successfully scanned dependencies
    2019/10/31 20:16:44 Launching container with name: acb_step_0
    Sending build context to Docker daemon  59.39kB

    Step 1/14 : FROM tensorflow/tensorflow:1.12.0-gpu-py3@sha256:84f0820e151b129c63ac15c6d9c1c5336a834070dca22a271c7de091d490a17f
    sha256:84f0820e151b129c63ac15c6d9c1c5336a834070dca22a271c7de091d490a17f: Pulling from tensorflow/tensorflow
    [... image layer pull and download progress elided ...]
    Digest: sha256:84f0820e151b129c63ac15c6d9c1c5336a834070dca22a271c7de091d490a17f
    Status: Downloaded newer image for tensorflow/tensorflow:1.12.0-gpu-py3@sha256:84f0820e151b129c63ac15c6d9c1c5336a834070dca22a271c7de091d490a17f
     ---> 413b9533f92a
    Step 2/14 : USER root
     ---> Running in abffe211e7a7
    Removing intermediate container abffe211e7a7
     ---> 7010ebf72961
    Step 3/14 : RUN mkdir -p $HOME/.cache
     ---> Running in 941da7a1a380
    Removing intermediate container 941da7a1a380
     ---> aa7d13d1851a
    Step 4/14 : WORKDIR /
     ---> Running in 411eabab64d4
    Removing intermediate container 411eabab64d4
     ---> 5d8b8b7c3066
    Step 5/14 : COPY azureml-environment-setup/99brokenproxy /etc/apt/apt.conf.d/
     ---> 29b3f77c11af
    Step 6/14 : RUN if dpkg --compare-versions `conda --version | grep -oE '[^ ]+$'` lt 4.4.11; then conda install conda==4.4.11; fi
     ---> Running in b7001a082671
    /bin/sh: 1: conda: not found
    dpkg: error: --compare-versions takes three arguments: <version> <relation> <version>

    Type dpkg --help for help about installing and deinstalling packages [*];
    Use 'apt' or 'aptitude' for user-friendly package management;
    Type dpkg -Dhelp for a list of dpkg debug flag values;
    Type dpkg --force-help for a list of forcing options;
    Type dpkg-deb --help for help about manipulating *.deb files;

    Options marked [*] produce a lot of output - pipe it through 'less' or 'more' !
    Removing intermediate container b7001a082671
     ---> f4aa2a6d589d
    Step 7/14 : COPY azureml-environment-setup/mutated_conda_dependencies.yml azureml-environment-setup/mutated_conda_dependencies.yml
     ---> afa1dc0accd2
    Step 8/14 : RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_e1ed510e22efc0d217a9550a38dbf97c -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig
     ---> Running in 6eba33d5cc83
    /bin/sh: 1: conda: not found
    The command '/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_e1ed510e22efc0d217a9550a38dbf97c -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig' returned a non-zero code: 127
    2019/10/31 20:18:36 Container failed during run: acb_step_0. No retries remaining.
    failed to run step ID: acb_step_0: exit status 127

    Run ID: cd1h failed after 2m3s. Error: failed during run, err: exit status 1
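For context on the failure at step 6: the RUN command pipes `conda --version` through `grep -oE '[^ ]+$'` to extract the version token, then passes that token to `dpkg --compare-versions`. When `conda` is not on the PATH, the command substitution yields nothing, so dpkg receives only two of its three required arguments and errors out. A small Python sketch of that extraction (the regex is the same one used in the generated Dockerfile; the sample version strings are illustrative):

```python
import re

def conda_version_token(version_output: str) -> str:
    """Mimic `grep -oE '[^ ]+$'`: grab the trailing space-free token."""
    match = re.search(r"[^ ]+$", version_output.strip())
    return match.group(0) if match else ""

# With conda installed, `conda --version` prints e.g. "conda 4.4.10":
print(conda_version_token("conda 4.4.10"))  # -> 4.4.10

# With conda missing, /bin/sh prints "conda: not found" and the command
# substitution is empty, so dpkg receives only two of its three arguments.
print(conda_version_token(""))  # -> "" (empty string)
```

This is why the log shows both `conda: not found` and the separate dpkg usage error: the missing binary breaks the version check before any comparison happens.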

    Thursday, October 31, 2019 8:54 PM

All replies

  • Hi,
    Can you please run the code below and share the output so we can check?
    Also, if possible, please share a link to the sample that you are trying.

    Sunday, November 3, 2019 4:03 AM
  • Hi @Ram-msft,

    Thanks! The documentation I'm following is:


    Output from est.run_config is:

        "script": "mbnet_finetuning/test_bottlenecktensors_training.py",
        "arguments": [
        "target": "gpu",
        "framework": "Python",
        "communicator": "None",
        "maxRunDurationSeconds": null,
        "nodeCount": 1,
        "environment": {
            "name": null,
            "version": null,
            "environmentVariables": {
                "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
            "python": {
                "userManagedDependencies": false,
                "interpreterPath": "python",
                "condaDependenciesFile": null,
                "baseCondaEnvironment": null,
                "condaDependencies": {
                    "name": "project_environment",
                    "dependencies": [
                            "pip": [
                    "channels": [
            "docker": {
                "enabled": true,
                "baseImage": "tensorflow/tensorflow:1.12.0-gpu-py3",
                "baseDockerfile": null,
                "sharedVolumes": true,
                "gpuSupport": true,
                "shmSize": "2g",
                "arguments": [],
                "baseImageRegistry": {
                    "address": null,
                    "username": null,
                    "password": null
            "spark": {
                "repositories": [],
                "packages": [],
                "precachePackages": false
            "databricks": {
                "mavenLibraries": [],
                "pypiLibraries": [],
                "rcranLibraries": [],
                "jarLibraries": [],
                "eggLibraries": []
            "inferencingStackVersion": null
        "history": {
            "outputCollection": true,
            "snapshotProject": true,
            "directoriesToWatch": [
        "spark": {
            "configuration": {
                "spark.app.name": "Azure ML Experiment",
                "spark.yarn.maxAppAttempts": 1
        "hdi": {
            "yarnDeployMode": "cluster"
        "tensorflow": {
            "workerCount": 1,
            "parameterServerCount": 1
        "mpi": {
            "processCountPerNode": 1
        "dataReferences": {
            "workspacefilestore": {
                "dataStoreName": "workspacefilestore",
                "pathOnDataStore": null,
                "mode": "mount",
                "overwrite": false,
                "pathOnCompute": null
        "sourceDirectoryDataStore": null,
        "amlcompute": {
            "vmSize": null,
            "vmPriority": null,
            "retainCluster": false,
            "name": null,
            "clusterMaxNodeCount": 1

    • Edited by alpaca2000 Monday, November 4, 2019 9:47 PM
    Monday, November 4, 2019 9:47 PM
  • Hi,

    It looks like the TensorFlow base image doesn't include conda. When extra packages are added, Azure ML uses conda to create a new environment and install those packages, which is why the image build fails without it.

    If you want TensorFlow, you can use the TensorFlow estimator instead.
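    For example, a minimal configuration sketch (not runnable outside an Azure ML workspace; `compute_target` and `'train.py'` are illustrative placeholders, not values from this thread):

    ```python
    from azureml.train.dnn import TensorFlow

    # Sketch: let the curated TensorFlow estimator build the environment
    # instead of a custom base image that lacks conda.
    script_params = {
        '--data-folder': ds.as_mount()
    }

    est = TensorFlow(source_directory='.',
                     script_params=script_params,
                     compute_target=compute_target,  # placeholder
                     entry_script='train.py',        # placeholder
                     use_gpu=True)
    ```

    The estimator's curated base image ships with conda, so the environment-setup steps that failed above can run.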



    Tuesday, November 5, 2019 6:07 AM
  • Thanks. I had previously tried the TensorFlow estimator, but it didn't work either. I ran it again; below is the estimator I used. The experiment failed with "Run failed. AzureML compute job failed. Failed starting container...". I've attached the relevant parts of the error log.


    from azureml.train.dnn import TensorFlow

    script_params = {
        '--data-folder': ds.as_mount()
    }

    est = TensorFlow(source_directory='.',
                     script_params=script_params,
                     # (remaining arguments omitted in the original post)
                     )



    error: "None of TensorFlow, PyTorch, or MXNet plugins were built"

    docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=410,driver<411 --pid=17365 /mnt/docker/overlay2/60d1c3d76508bd126b3db600de343896e617237a086fdf622da23f20a07eb506/merged]\\\\nnvidia-container-cli: requirement error: unsatisfied condition: driver >= 410\\\\n\\\"\"": unknown.
    2019-11-05T21:35:23Z Job environment preparation failed on Output: 
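    The key line here is `nvidia-container-cli: requirement error: unsatisfied condition: driver >= 410`: the CUDA 10.0 build that this image requires needs an NVIDIA driver of at least version 410, and the compute nodes have an older one. A tiny sketch of that check (the real matching logic lives in nvidia-container-cli; the 396.x figure below is only an illustrative older driver version):

    ```python
    def driver_satisfies(driver_version: float, minimum: float = 410.0) -> bool:
        """Sketch of the `driver >= 410` constraint from the error above."""
        return driver_version >= minimum

    print(driver_satisfies(396.26))  # False: an older driver cannot start the container
    print(driver_satisfies(418.87))  # True: a newer driver satisfies cuda>=10.0 images
    ```

    In other words, the container itself is fine; the host's driver has to be upgraded (or an image built against an older CUDA used) before it can start.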

    Tuesday, November 5, 2019 10:00 PM
    • Proposed as answer by AzureML1256 Tuesday, November 19, 2019 3:23 AM
    Tuesday, November 19, 2019 3:23 AM