none
Regarding Partitions in Data lake Analytics Catalog tables

    Question

  • I am working on a scenario where I put same batch data into a partition named by value of the batch id. If there is failed scenario at anytime then i have remove all those partitions named with batch id value.

    For one particular audit table for which insert command is present in all my scripts that run in parallel so i have used the below commands in all scripts and then insert.

    Following creation script is executed once.

    CREATE TABLE IF NOT EXISTS dbo.audit_log (
        SurrKey        long,
        audit_Value          string,
        Batch_ID             string,
        Inserted_TS   DateTime,
    INDEX dboaudit
    CLUSTERED(SurrKey ASC)
    PARTITIONED BY (Batch_ID)
    DISTRIBUTED BY ROUND ROBIN
    ); 

    "ALTER TABLE dbo.audit_log ADD PARTITION IF NOT EXISTS PARTITION(@BatchID);"

    Following insert statements are present in all scripts.

    INSERT INTO dbo.audit_log
    ON INTEGRITY VIOLATION MOVE TO PARTITION ("00_Default_Partition")
    SELECT DISTINCT SurrKey,
        audit_Value,
        Batch_ID ,
        Inserted_TS
    FROM @ins;

    In my failed scenario when I drop the partitions all named with my current value of Batch_ID this particular table still have the data with that Batch_ID which is strange.

    When i check the view usql.partitions i see that this table has many partitions but that view provides only the guid so it is hard to get the value it is using.

    Then i tried "ALTER TABLE dbo.audit_log DROP PARTITION IF EXISTS PARTITION(@BatchID);" multiple times with same batch Id value and check usql partitions view I can see that partitions are actually getting dropped one after the other. My conclusion was the partitions were created properly with the value of Batch_ID but the questions are

    1. Why are there so many partitions?

    2. Why do I need to execute the same command multiple times to get rid on one partition I intended to drop.

    3. What is happening in the back ground in data lake analytics that i am missing here.

    Please help me fix this issue.

    Thanks,

    Varun

    Monday, August 13, 2018 10:35 AM

All replies

  • Hi Varun,

        As this needs some direct trouble shooting to identify the issue, you need to create support ticket with azure support engineers to take a look. 

    Monday, August 27, 2018 6:22 PM
    Moderator