none
External table partition by year and month RRS feed

  • Question

  • Can  someone confirm that this scenario is not supported by ADX? 

    I have a data lake folder structure that has folders named as follows:

    MyTimestamp_YYYY-MM=2000-01

    MyTimestamp_YYYY-MM=2000-02

    MyTimestamp_YYYY-MM=2000-03

    etc

    I want to create an external table like this:

    .create external table mytable (
    ['ID']:long,
    ['CITY']:string,
    ['STATE_CD']:string,
    ['POSTAL_CD']:string,
    ['CREATED_DT']:datetime,
    ['MODIFIED_DT']:datetime,
    ['MyTimestamp']:datetime
    )
    kind=adl
    partition by format_datetime="'MyTimestamp_YYYY_MM='yyyy'-'MM'" bin(['MyTimestamp'],1d) 
    //  bin(startofmonth(['MyTimestamp']),1 )
    dataformat=csv
    ( 
       h@'abfss://main@zzzzdatalake.dfs.core.windows.net/inbound/myStuff;****'   
    )
    

    The documentation says that the bin portion of the partition by clause must have a column name, and a timespan literal - I don't see any documentation for what those literals can be, but from what I can surmise, it is 1d, 1h, 1m (or any numeral), for binning by day, hour, or minute (not MONTH).  When I try 1M I get a parsing error. 

    This expression would give me a bin for each month, but it isn't allowed.

    bin(startofmonth(['MyTimestamp']),1 )

    Is it correct that the largest partition unit on a datetime column is the day? Are there plans to allow for partitioning at larger units? Or supporting more options for the bin function in the .create external table statement?

    Mike

    Tuesday, November 19, 2019 9:51 PM

All replies

  • Hi Mike,

    Apologies for the delay in responding to this forum post but I am in the process of finding an answer for you.

    Regards,

    Mike

    Monday, November 25, 2019 6:04 PM
    Owner
  • This Stack Overflow post should address your question: Azure Application Insights - Summarize by part of timestamp

    In Application Insights Analytics:
    
    By hour:
    
    requests 
     | summarize count() by bin(timestamp, 1h) 
    By day:
    
    requests 
     | summarize count() by bin(timestamp, 1d) 
    By month
    
    requests 
      | summarize count()  by bin(datepart("Month", timestamp), 1) 
    By year
    
    requests 
      | summarize count()  by bin(datepart("Year", timestamp), 1)

    Please let me know if this does not address your question.

    Regards,

    Mike

    Monday, November 25, 2019 6:33 PM
    Owner
  • No, this answer is not applicable to my question. This answer refers to using the bin function in a Kusto query, not for creating an external table.

    My question is specifically about creating external tables in ADX, and the "partition by" clause. 

    It appears, by the documentation, that the partition by clause only supports a bin function with a column name and a duration literal. 

    Wednesday, November 27, 2019 3:16 PM
  • Thank you for the feedback, Mike. Can you use the following as what you are looking for is supported.

    partition by format_datetime="yyyy/MM" bin(1d)

    The one Example in the documentation does iterate on the day of month + year but the above will also work.

    It has been explained to me as the following:

    "Even though "bin" function is required by the "partition by" clause, it's not really used in queries. This is mostly needed by the export scenario. To sum up - you can specify any bin, what's important is the datetime format."

    Please let me know if you have additional questions.

    Regards,

    Mike


    Wednesday, November 27, 2019 7:03 PM
    Owner
  • So you're saying I should try to use the format_datetime pattern for the month granularity, and specify the bin duration as 1d? 

    partition by format_datetime="yyyy/MM" bin(myTimestamp,1d)

    My original question was whether there was going to be support for bin sizes larger than the day grain. 

    "Is it correct that the largest partition unit on a datetime column is the day? Are there plans to allow for partitioning at larger units? Or supporting more options for the bin function in the .create external table statement?"

    Your suggestion above seems counter-intuitive, but I'll see if it works. 

    Mike

    Wednesday, November 27, 2019 7:20 PM
  • Hi Mike,

    Did this resolve your issue? 

    Thanks,

    Mike

    Friday, December 6, 2019 3:44 AM
    Owner