Get Metadata in a foreach loop

  • Question

  • Hi,

     

    I'm trying to pull in a lot of our IoT log file data and put it into a searchable format. I think this should be possible, but I've hit a blocker with the Get Metadata activity when it's inside a ForEach loop. All the log files are stored in blob storage and each device stores its logs in a 'folder': the folder name is the device's serial number, and there are hundreds of files within each folder.

     

    My first step is to run Get Metadata on the root folder, asking it to list all child items, then use a ForEach loop to call a second pipeline, passing the serial number in as a pipeline parameter. The second pipeline has another blob storage dataset with a parameter for the serial number and a file path of dcis/logs/@dataset().serialNumber. The first step in that pipeline is another Get Metadata to list all the files; a ForEach loop then works through the list, using Get Metadata again to get each file's last modified date before copying the data into either Cosmos DB or Azure SQL.
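    In ADF JSON terms, the parent pipeline wiring looks roughly like this (a rough sketch only; the activity, dataset, and pipeline names here are illustrative, not the real ones):

        "activities": [
            {
                "name": "GetSerialNumberFolders",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": { "referenceName": "rootLogsFolder", "type": "DatasetReference" },
                    "fieldList": [ "childItems" ]
                }
            },
            {
                "name": "ForEachSerialNumber",
                "type": "ForEach",
                "dependsOn": [
                    { "activity": "GetSerialNumberFolders", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('GetSerialNumberFolders').output.childItems",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "ProcessOneDevice",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": { "referenceName": "deviceLogsPipeline", "type": "PipelineReference" },
                                "parameters": {
                                    "serialNumber": {
                                        "value": "@item().name",
                                        "type": "Expression"
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        ]

    Each entry in the childItems output has a name and a type, which is why @item().name yields the serial-number folder.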

     

    The problem is with the first Get Metadata in the second pipeline: it fails with "Wildcard in path is not supported in GetMetadata." Clearly I've done something stupid, I just haven't a clue what it is.....

     

    This is what the dataset looks like:

    {
        "name": "oldWorldSerialNumber",
        "properties": {
            "linkedServiceName": {
                "referenceName": "OldWorldBlob",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "serialNumber": {
                    "type": "string"
                }
            },
            "annotations": [],
            "type": "AzureBlob",
            "typeProperties": {
                "format": {
                    "type": "TextFormat",
                    "columnDelimiter": "\t",
                    "rowDelimiter": "",
                    "treatEmptyAsNull": true,
                    "skipLineCount": 0,
                    "firstRowAsHeader": false
                },
                "fileName": "",
                "folderPath": {
                    "value": "dcis/logs/dataset().serialNumber",
                    "type": "Expression"
                }
            }
        }
    }

    • Edited by RichJoiner Tuesday, July 23, 2019 8:34 AM
    Tuesday, July 23, 2019 8:29 AM

Answers

  • I tried this out, and I was able to iterate the subfolder from the child pipeline. The child pipeline in this case is the pipeline which looks for the files in the sub-folder. When I call the Execute Pipeline activity, I just pass the subfolder name as an input to the child pipeline. My AzureBlobStorage14 dataset points to the log container, and I pass only the subfolder (in your case it would be the serial number), and it works.

    Please do let me know how it goes.



        "name": "DelimitedText7",
        "properties": {
            "linkedServiceName": {
                "referenceName": "AzureBlobStorage14",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "childfolder": {
                    "type": "string"
                }
            },
            "annotations": [],
            "type": "DelimitedText",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "folderPath": {
                        "value": "@dataset().childfolder",
                        "type": "Expression"
                    },
                    "container": "log"
                },
                "columnDelimiter": ",",
                "escapeChar": "\\",
                "firstRowAsHeader": true,
                "quoteChar": "\""
            },
            "schema": []
        }
    }
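
    In the child pipeline, the Get Metadata activity that lists the files can then reference this dataset and pass the pipeline parameter through to childfolder. A minimal sketch (the GetFileList activity name and the serialNumber pipeline parameter are my assumptions, not from the post):

        {
            "name": "GetFileList",
            "type": "GetMetadata",
            "typeProperties": {
                "dataset": {
                    "referenceName": "DelimitedText7",
                    "type": "DatasetReference",
                    "parameters": {
                        "childfolder": {
                            "value": "@pipeline().parameters.serialNumber",
                            "type": "Expression"
                        }
                    }
                },
                "fieldList": [ "childItems" ]
            }
        }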

    Thanks Himanshu

    • Marked as answer by RichJoiner Thursday, August 1, 2019 11:33 AM
    Wednesday, July 31, 2019 6:08 AM

All replies

  • Hello Rich,

    You have a similar scenario to the one mentioned here; you just have one extra Execute Pipeline activity.
    Please do let us know if this helps.


    Thanks Himanshu

    Wednesday, July 24, 2019 9:48 PM
  • Sadly this doesn't help me. My first Get Metadata activity, on the root folder in the container, works fine and uses the child items field; it lists all of my serial numbers and iterates through them as expected in the ForEach loop. It's the next bit, where I have a Get Metadata that uses a dataset with a parameter in the file path, that seems to get me into trouble.

    The error seems to imply that it thinks dcis/logs/dataset().serialNumber is a wildcard. This step should also output the child items field, listing all the blob files in the serial number's folder.


    • Edited by RichJoiner Thursday, July 25, 2019 12:12 PM
    Thursday, July 25, 2019 7:30 AM
  • Thanks.

    This helped me work out that the dataset doesn't like concatenating a string and a parameter in the path. I created a variable in the pipeline and set it using a concat function that joins '/dcis/log/' with the pipeline parameter for the serial number. This variable is then used to set the complete path on the dataset, and it works.
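
    A minimal sketch of that variable approach (assuming a string variable named logPath declared on the pipeline, illustrative activity names, and the dataset's folderPath reduced to just @dataset().serialNumber so the parameter now carries the complete path):

        {
            "name": "SetLogPath",
            "type": "SetVariable",
            "typeProperties": {
                "variableName": "logPath",
                "value": {
                    "value": "@concat('/dcis/log/', pipeline().parameters.serialNumber)",
                    "type": "Expression"
                }
            }
        },
        {
            "name": "GetFileList",
            "type": "GetMetadata",
            "dependsOn": [
                { "activity": "SetLogPath", "dependencyConditions": [ "Succeeded" ] }
            ],
            "typeProperties": {
                "dataset": {
                    "referenceName": "oldWorldSerialNumber",
                    "type": "DatasetReference",
                    "parameters": {
                        "serialNumber": {
                            "value": "@variables('logPath')",
                            "type": "Expression"
                        }
                    }
                },
                "fieldList": [ "childItems" ]
            }
        }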

    Thanks
    Thursday, August 1, 2019 11:38 AM