none
PARTITIONED BY (event_date) NOT working...

    Question

  • I'm trying to use partitions and buckets by a column which is DateTime.

    DECLARE @partition1 DateTime = new DateTime(2016, 11, 01, 00,00,00,00, DateTimeKind.Utc);
    DECLARE @partition2 DateTime = @partition1.AddDays(1);

    I Have this partitions, but ADL is only inserting data for data with event_date==2016-11-01 00:00:00.000

    The rest of data is ignored....

    Such as event_date==2016-11-01 01:02:00.000

    Also when I query the Table. The event_date column doesn't have any data...

    Trying to run the samples:

    Ambulance-3-2-CreatePartitionedTable

    Doesn't work local and on Azure...

    Is there any problem on my side? Or is it ADL Problem with current release?

    thank you in advance!

    Miguel

    Tuesday, January 31, 2017 11:48 AM

All replies

  • Can you please post comprehensive information about the queries you're running?
    Thursday, February 2, 2017 9:17 AM
  • Example:

     

    CREATE TABLE IF NOT EXISTS dbo.ReceivedData
    (
        IDReceivedData long?,
        IMEI string,
        IDLoadUnit int?,
        DTSensor DateTime?,
        Lat double?,
        Long double?,
        Speed double?,

        INDEX idx CLUSTERED (IDLoadUnit ASC) PARTITIONED BY (DTSensor) DISTRIBUTED BY HASH (IDLoadUnit)

    );

    ImportData:

    DECLARE @partition1 DateTime = new DateTime(2016, 11, 01, 00,00,00,00, DateTimeKind.Utc);
    DECLARE @partition2 DateTime = @partition1.AddDays(1);
    DECLARE @partition3 DateTime = @partition1.AddDays(2);
    DECLARE @partition4 DateTime = @partition1.AddDays(3);
    DECLARE @partition5 DateTime = @partition1.AddDays(4);
    DECLARE @partition6 DateTime = @partition1.AddDays(5);
    DECLARE @partition7 DateTime = @partition1.AddDays(6);
    DECLARE @partition8 DateTime = @partition1.AddDays(7);
    DECLARE @partition9 DateTime = @partition1.AddDays(8);
    DECLARE @partition10 DateTime = @partition1.AddDays(9);
    DECLARE @partition11 DateTime = @partition1.AddDays(10);
    DECLARE @partition12 DateTime = @partition1.AddDays(11);
    DECLARE @partition13 DateTime = @partition1.AddDays(12);
    DECLARE @partition14 DateTime = @partition1.AddDays(13);
    DECLARE @partition15 DateTime = @partition1.AddDays(14);
    DECLARE @partition16 DateTime = @partition1.AddDays(15);
    DECLARE @partition17 DateTime = @partition1.AddDays(16);
    DECLARE @partition18 DateTime = @partition1.AddDays(17);
    DECLARE @partition19 DateTime = @partition1.AddDays(18);
    DECLARE @partition20 DateTime = @partition1.AddDays(19);
    DECLARE @partition21 DateTime = @partition1.AddDays(20);
    DECLARE @partition22 DateTime = @partition1.AddDays(21);
    DECLARE @partition23 DateTime = @partition1.AddDays(22);
    DECLARE @partition24 DateTime = @partition1.AddDays(23);
    DECLARE @partition25 DateTime = @partition1.AddDays(24);
    DECLARE @partition26 DateTime = @partition1.AddDays(25);
    DECLARE @partition27 DateTime = @partition1.AddDays(26);
    DECLARE @partition28 DateTime = @partition1.AddDays(27);
    DECLARE @partition29 DateTime = @partition1.AddDays(28);
    DECLARE @partition30 DateTime = @partition1.AddDays(29);
    DECLARE @partition31 DateTime = @partition1.AddDays(30);
    ALTER TABLE Test.dbo.ReceivedData
    ADD IF NOT EXISTS PARTITION(@partition1), PARTITION(@partition2), PARTITION(@partition3), PARTITION(@partition4), PARTITION(@partition5), PARTITION(@partition6), PARTITION(@partition7), PARTITION(@partition8), PARTITION(@partition9)
    ,PARTITION(@partition10), PARTITION(@partition11), PARTITION(@partition12), PARTITION(@partition13), PARTITION(@partition14), PARTITION(@partition15), PARTITION(@partition16), PARTITION(@partition17), PARTITION(@partition18), PARTITION(@partition19),
    PARTITION(@partition20), PARTITION(@partition21), PARTITION(@partition22), PARTITION(@partition23), PARTITION(@partition24), PARTITION(@partition25), PARTITION(@partition26), PARTITION(@partition27), PARTITION(@partition28), PARTITION(@partition29);



    INSERT INTO TestDB.dbo.ReceivedData
    ON INTEGRITY VIOLATION IGNORE
    SELECT *
        FROM EXTERNAL TestDB.Archive201611 EXECUTE @"
                  SELECT TOP (10000) [IDReceivedData]
          ,[IMEI]
          ,[IDLoadUnit]
          ,[DTSensor]
          ,[Lat]
          ,[Long]
          ,[Speed]
      FROM [dbo].[ReceivedData_ARC]";

    Thursday, February 2, 2017 11:48 AM
  • Do you think this could be related to the use of an federated query? What if you copy the data to an intermediate ADL table first?
    Saturday, February 4, 2017 5:56 PM
  • Based on the information you have provided so far, the result appears to be by design.

    1) You have created a partition where DTSensor = x and attempt to insert data where DTSensor != x.
    2) You then instruct DL to ignore (ON INTEGRITY VIOLATION IGNORE) records where DTSensor != x.
    Thus you should not expect to see any records where DTSensor does not equal one of the partitions you have defined.

    Did you want to insert a record where DTSensor does not equal a partition you have defined?
    1) Consider using ON INTEGRITY VIOLATION MOVE TO PARTITION to move all such records to a "dumping" partition.
    2) Consider adding partitions that meet your expected data, for example, DECLARE @partition1 DateTime = new DateTime(2016, 11, 01, 01,02,00,00, DateTimeKind.Utc);
    3) Consider stripping the hour and minutes from DTSensor.

    Check out INSERT Statement (U-SQL) which contains an example for using ON INTEGRITY VIOLATION MOVE TO PARTITION.

    Wednesday, February 8, 2017 8:09 PM
  • In case you need to both preserve the full original timestamp AND to have daily partitions of your data, you can get the desired behavior by:

    - Creating the table with one additional column, which will be used for partitioning, e.g.

    CREATE TABLE IF NOT EXISTS dbo.ReceivedData

    (

        IDReceivedData long?,

        IMEI string,

        IDLoadUnit int?,

        DTSensor DateTime?,

        Lat double?,

        Long double?,

        Speed double?,

        SensorDate string,

        INDEX idx CLUSTERED (IDLoadUnit ASC)

        PARTITIONED BY (SensorDate) DISTRIBUTED BY  HASH (IDLoadUnit)

    );

    - Adding the partitions with string values which will represent a day, e.g.

    ALTER TABLE Test.dbo.ReceivedData
    ADD IF NOT EXISTS PARTITION("2016-11-01"), PARTITION("2016-11-02"), …

    - Populating the partitioning column during insert, e.g.

    @result = SELECT *
        FROM EXTERNAL TestDB.Archive201611 EXECUTE @"
                  SELECT TOP (10000) [IDReceivedData]
          ,[IMEI]
          ,[IDLoadUnit]
          ,[DTSensor]
          ,[Lat]
          ,[Long]
          ,[Speed]
      FROM [dbo].[ReceivedData_ARC]";

    INSERT INTO TestDB.dbo.ReceivedData ON INTEGRITY VIOLATION IGNORE

         SELECT *,

        (DTSensor).ToString("yyyy-MM-dd") AS SensorDate

    FROM @result

    ;

    Thursday, February 9, 2017 3:32 PM