Pipeline Dependencies and different frequencies

  • Question

  • Hi all,

    I’ve been stuck since yesterday on one simple test that I need to do in order to understand how to handle time-slice dependencies and different frequencies. Running on a deadline here.

    Basically, I need to align a daily frequency with a monthly frequency in the following way:

    - The pipeline runs and produces output daily.

    - There are two input datasets: one with daily availability, the other with monthly availability.

    - I use the following combination of ADF functions to align the time slices:

        "name": "ds_SearchLogMensual",
        "startTime": "Date.AddDays(SliceStart, -Date.Day(SliceStart)+1)",
        "endTime": "Date.AddMonths(Date.AddDays(SliceEnd, -Date.Day(SliceEnd)+1),1)"

    So, for the start time I intend to get the 1st day of the month, and for the end time, the 1st day of the next month.
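    To sanity-check the window arithmetic, here is a small sketch in Python (my own illustration, not ADF code). It mirrors what the two expressions above are meant to compute, assuming SliceStart is a midnight-aligned datetime:

    ```python
    from datetime import datetime

    def month_window(slice_start: datetime) -> tuple:
        """Return (first day of slice's month, first day of the next month)."""
        # Mirrors Date.AddDays(SliceStart, -Date.Day(SliceStart)+1):
        # stepping back (day - 1) days lands on the 1st of the month.
        start = slice_start.replace(day=1, hour=0, minute=0,
                                    second=0, microsecond=0)
        # Mirrors the Date.AddMonths(..., 1) wrapper: roll forward one month,
        # handling the December -> January year boundary.
        if start.month == 12:
            end = start.replace(year=start.year + 1, month=1)
        else:
            end = start.replace(month=start.month + 1)
        return start, end
    ```

    For example, a slice starting 2016-09-13 yields the window 2016-09-01 to 2016-10-01, which is what the monthly dataset dependency should cover.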

    Now, when the pipeline runs, it gets stuck forever. Also, I can’t tell whether the function expressions are correct or not (probably not). Is there a debugging tool I could use?

    Attaching JSONs.

    Input Data Set 1:

    {
        "name": "ds_SearchLogDiario",
        "properties": {
            "published": false,
            "type": "AzureDataLakeStore",
            "linkedServiceName": "AzureDataLakeStoreLinkedService",
            "typeProperties": {
                "fileName": "SearchLogDiario.tsv",
                "folderPath": "datalake/input",
                "format": {
                    "type": "TextFormat",
                    "rowDelimiter": "\n",
                    "columnDelimiter": "\t"
                }
            },
            "availability": {
                "frequency": "Day",
                "interval": 1
            },
            "external": true,
            "policy": {}
        }
    }

    Input Data Set 2:

    {
        "name": "ds_SearchLogMensual",
        "properties": {
            "published": false,
            "type": "AzureDataLakeStore",
            "linkedServiceName": "AzureDataLakeStoreLinkedService",
            "typeProperties": {
                "fileName": "SearchLogMensual.tsv",
                "folderPath": "datalake/input",
                "format": {
                    "type": "TextFormat",
                    "rowDelimiter": "\n",
                    "columnDelimiter": "\t"
                }
            },
            "availability": {
                "frequency": "Month",
                "interval": 1
            },
            "external": true,
            "policy": {}
        }
    }

    Pipeline:

    {
        "name": "pl_JoinDiarioMensual",
        "id": "/subscriptions/a5c2196c-6a56-445b-881f-59ca6fd8c7ae/resourcegroups/DaViv-RG/providers/Microsoft.DataFactory/datafactories/ADFDaviv/datapipelines/pl_JoinDiarioMensual",
        "properties": {
            "description": "Join diario con dependencia fuentes diaria y mensual",
            "activities": [
                {
                    "type": "DataLakeAnalyticsU-SQL",
                    "typeProperties": {
                        "scriptPath": "scripts\\kona\\SearchLogProcessing.usql",
                        "scriptLinkedService": "StorageLinkedService",
                        "degreeOfParallelism": 10,
                        "priority": 100,
                        "parameters": {
                            "in": "/datalake/input/SearchLog.tsv",
                            "out": "/datalake/output/Result.tsv"
                        }
                    },
                    "inputs": [
                        {
                            "name": "ds_SearchLogDiario"
                        },
                        {
                            "name": "ds_SearchLogMensual",
                            "startTime": "Date.AddDays(SliceStart, -Date.Day(SliceStart)+1)",
                            "endTime": "Date.AddMonths(Date.AddDays(SliceEnd, -Date.Day(SliceEnd)+1),1)"
                        }
                    ],
                    "outputs": [
                        {
                            "name": "ds_EventsByRegionTable"
                        }
                    ],
                    "scheduler": {
                        "frequency": "Day",
                        "interval": 1
                    },
                    "name": "DataLakeAnalyticsUSqlActivityTemplate",
                    "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
                }
            ],
            "start": "2016-09-13T00:00:00Z",
            "end": "2016-09-13T01:00:00Z",
            "isPaused": false,
            "runtimeInfo": {
                "deploymentTime": "2016-09-16T17:00:48.7074122Z",
                "activePeriodSetTime": "2016-09-16T17:00:46.9186326Z",
                "pipelineState": "Running",
                "activityPeriods": {
                    "dataLakeAnalyticsUSqlActivityTemplate": {
                        "start": "2016-09-13T00:00:00Z",
                        "end": "2016-09-14T00:00:00Z"
                    }
                }
            },
            "id": "fb1a0959-98ed-4844-9915-b7344a0012a8",
            "provisioningState": "Succeeded",
            "hubName": "adfdaviv_hub",
            "pipelineMode": "Scheduled",
            "expirationTime": "5.00:00:00"
        }
    }

    Thank you in advance for your help!

    Friday, September 16, 2016 9:51 PM

Answers

  • I think I found the solution. I just needed to add the property "style": "StartOfInterval" (I had assumed that was the default, but I was wrong) to the availability sections of all dataset JSONs, and also to the scheduler section of the activity JSON.
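    For reference, a minimal sketch of what the corrected availability section would look like in the monthly dataset above (same property names as the original JSONs, with only "style" added):

    ```json
    "availability": {
        "frequency": "Month",
        "interval": 1,
        "style": "StartOfInterval"
    }
    ```

    The scheduler section of the activity gets the same "style" property alongside its existing "frequency" and "interval".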

    • Marked as answer by amihanov Saturday, September 17, 2016 9:58 PM