none
Dynamic Pivot in U-Sql

    Question

  • I have been over the example code and would like to know the best way to achieve the following dynamic pivot. The spread columns are not known at runtime. All distinct tag_names need to be included as columns in final result. I am looking at using a custom Reducer or Processor. Any suggestions?

    //Simple example data

    @t = SELECT *
         FROM(
            VALUES
            ( 1, "2010-01-01","TagName1", 30 ),
            ( 2, "2010-01-01","TagName2", 5 ),
            ( 2, "2010-01-02","TagName2", 7 ),
            ( 2, "2010-01-02","TagName3", 6 ),
            ( 3, "2010-01-03","TagName2", 15 ),
            ( 1, "2010-01-01","TagName2", 2 ),
            ( 1, "2010-01-01","TagName3", 1),
            ( 3, "2010-01-04","TagName1", 2 ),
            ( 3, "2010-01-04","TagName2", 4 )
         ) AS T(DeviceID, Date, TagName, dv);

    //GROUPING on deviceid,date

    //SPREAD on Tagname

    //AGG on dv

    //Desired output

    DeviceID Date TagName1 TagName2 TagName3
    1 2010-01-01 agg
    agg agg
    2 2010-01-02 null agg agg
    2 2010-01-01 null agg null
    3 2010-01-03 null agg null
    3 2010-01-04 agg agg null

    Kyle Clubb




    • Edited by Kyle Clubb Monday, September 19, 2016 5:54 PM
    Monday, September 19, 2016 5:46 PM

Answers

  • You can use SqlMap<string, string> as a way to handle unknown key value pairs.

    Here is a sample I recently wrote to answer a similar question:

    @d = EXTRACT time string
               , device_id string
               , tag string
               , value string // could be int, long or decimal if values are always numeric
    FROM "/data.csv"
    USING Extractors.Csv(skipFirstNRows:1);
    
    @res = SELECT time, device_id, MAP_AGG(tag.Trim(), value) AS properties FROM @d GROUP BY time, device_id;
    
    // Now add cycle and pivot known/frequently queried properties out
    // you could do this with a processor if you want to "hide" the removal code from the properties.
    // see MapPivoter in https://github.com/Azure/usql/blob/master/Examples/AmbulanceDemos/AmbulanceDemos/3-Ambulance-User%20Code/3.2-Ambulance-UseUDOs.usql 
    // and https://github.com/Azure/usql/blob/master/Examples/AmbulanceDemos/AmbulanceDemoCode/Class1.cs
    
    @res =
        SELECT time
             , properties["RPM"] AS [RPM]
             , properties["TEMP"] AS [TEMP]
             , new SqlMap<string, string>(properties.Where(x => x.Key != "RPM" && x.Key != "TEMP")) AS properties
        FROM @res;
    
    // Now format the map for output into a CSV representation (note output cannot create a header row with keys from map, would need a custom outputter for that)
    // Assume no need to quote keys or values
    @res =
        SELECT time,
               [RPM],
               [TEMP],
               String.Join(",", properties.Select( p => p.Key + ":" + p.Value)) AS properties
        FROM @res;
    
    OUTPUT @res
    TO "/result.csv"
    USING Outputters.Csv(quoting:false);
    


    Michael Rys

    Wednesday, October 19, 2016 10:55 PM
    Moderator
  • The U-SQL language must know about all columns & schemas at compile time. As a result it's not possible to dynamically create columns like this where the columns would only be known by examining the data.

    The workaround is to split it into two scripts and use the SDK. First, one script outputs a file that contains the columns that are needed. Then a tool using the ADL SDK reads the file and then dynamically generates a new script that has the needed column names and then submits the second script.

    We realize this is more complicated than is desired and we are looking long-term into how to accomplish scenarios like this.

    Saturday, October 1, 2016 5:40 PM
    Moderator

All replies

  • The U-SQL language must know about all columns & schemas at compile time. As a result it's not possible to dynamically create columns like this where the columns would only be known by examining the data.

    The workaround is to split it into two scripts and use the SDK. First, one script outputs a file that contains the columns that are needed. Then a tool using the ADL SDK reads the file and then dynamically generates a new script that has the needed column names and then submits the second script.

    We realize this is more complicated than is desired and we are looking long-term into how to accomplish scenarios like this.

    Saturday, October 1, 2016 5:40 PM
    Moderator
  • You can use SqlMap<string, string> as a way to handle unknown key value pairs.

    Here is a sample I recently wrote to answer a similar question:

    @d = EXTRACT time string
               , device_id string
               , tag string
               , value string // could be int, long or decimal if values are always numeric
    FROM "/data.csv"
    USING Extractors.Csv(skipFirstNRows:1);
    
    @res = SELECT time, device_id, MAP_AGG(tag.Trim(), value) AS properties FROM @d GROUP BY time, device_id;
    
    // Now add cycle and pivot known/frequently queried properties out
    // you could do this with a processor if you want to "hide" the removal code from the properties.
    // see MapPivoter in https://github.com/Azure/usql/blob/master/Examples/AmbulanceDemos/AmbulanceDemos/3-Ambulance-User%20Code/3.2-Ambulance-UseUDOs.usql 
    // and https://github.com/Azure/usql/blob/master/Examples/AmbulanceDemos/AmbulanceDemoCode/Class1.cs
    
    @res =
        SELECT time
             , properties["RPM"] AS [RPM]
             , properties["TEMP"] AS [TEMP]
             , new SqlMap<string, string>(properties.Where(x => x.Key != "RPM" && x.Key != "TEMP")) AS properties
        FROM @res;
    
    // Now format the map for output into a CSV representation (note output cannot create a header row with keys from map, would need a custom outputter for that)
    // Assume no need to quote keys or values
    @res =
        SELECT time,
               [RPM],
               [TEMP],
               String.Join(",", properties.Select( p => p.Key + ":" + p.Value)) AS properties
        FROM @res;
    
    OUTPUT @res
    TO "/result.csv"
    USING Outputters.Csv(quoting:false);
    


    Michael Rys

    Wednesday, October 19, 2016 10:55 PM
    Moderator
  • I ran into issues when using the SqlMap function. Is there anything that I need to set up before using this function?

    C# error CS0308: The non-generic type 'Microsoft.Analytics.Types.Sql.SqlMap' cannot be used with type arguments

    Thank you so much!!

    Monday, December 19, 2016 9:43 PM
  • Michael or Saveen, any update to this question and response from last year on being able to dynamically construct the PIVOT?

    Thanks!


    Bill Blakey

    Monday, November 13, 2017 6:35 PM
  • I need to pivot some data where I will not know the pivot columns at run time. Has there been anymore progress in this area for USQL PIVOT? I need to pivot on Age in the following query: SELECT mbr_id, Age, Target_Pred FROM @predictions; Any help would be appreciated.

    Brian L Custer

    Thursday, November 30, 2017 2:00 PM